Caching is coming together nicely, as well as the new nids system...or

whatever it'll be called later...
author: Mike Buland <eichlan@xagasoft.com> 2008-07-02 03:12:36 +0000
committer: Mike Buland <eichlan@xagasoft.com> 2008-07-02 03:12:36 +0000
commit: aa6471979556621151592e147be81ce940558e55 (patch)
tree: 487b0fade53903d32a6780fe285caa5de463a9eb /misc/raa_format.txt
parent: a153962ffe93e70f2419efeab904b515c99c2eda (diff)
download: libbu++-aa6471979556621151592e147be81ce940558e55.tar.gz
libbu++-aa6471979556621151592e147be81ce940558e55.tar.bz2
libbu++-aa6471979556621151592e147be81ce940558e55.tar.xz
libbu++-aa6471979556621151592e147be81ce940558e55.zip
1 files changed, 71 insertions, 0 deletions
diff --git a/misc/raa_format.txt b/misc/raa_format.txt
new file mode 100644
index 0000000..1b2c9f8
--- /dev/null
+++ b/misc/raa_format.txt
@@ -0,0 +1,71 @@
+Random Access Archive Format
+----------------------------
+This is the basic archive format for libbu++'s random access archive system.
+Unlike the traditional archive, given a unique key any object may be accessed
+at any time, and hopefully very quickly.
+To make this as portable as possible the basic interface to the RAA is very
+simple and seperated from any IO or formatting directly, any number af backends
+could be constructed quite simply, this is a description of the first of these
+formats.
+In order to make the system handle objects that are any size, and grow quickly
+and easily, I believe we should resort to a simple block allocation mechanism
+with uniform block sizes linked in chains, effectively accessed via "inodes."
+Therefore given a blocksize and a Table of Contents (TOC) any object should be
+easy to find.  The TOC can be implemented as an in-place hash table to minimize
+the amount of memory that must be sacraficed for very large, seldom used
+structures.
+The basic header:
+00-03: Magic Number, something cute, I dunno yet (encoding independant)
+04:    Byte count/order for standard indexes (8/4 for 64/32bit systems)
+       High order bit masked out determines endianness (1 = big, 0 = small)
+       The program will only accept one word-width for now, we can make it
+       adjustable later, or fix this at 4 bytes.
+05-08: Blocksize in bytes, could be anything, I don't think we need to worry
+       about OS related things for this, but it should be set intelligently by
+       the owner.  This includes the bytes reserved for a block header.
+09-12: Total block count, includes both used and unused blocks.
+13-16: Total used blocks, useful for determining when to expand.
+17-24: Reserved for flags and the like, should be all zeros for now.
+At this point are a number of "blocks" each with their own header and data are
+laid out sequentially, accessing one should be easy, seek to
+  (header size)+(block size)*(block number)
+The block header is as follows:
+00-03: First block in chain.  If this number matches the current block index
+       then this block is the first in a chain.
+04-07: Next block in chain.  If this number matches the current block index
+       then this block is the last in the chain.
+08-11: Prev block in chain.  If this number matches the current block index
+       then this block is the first in the chain.
+12-15: Number of bytes used/allocated remaining in the chain.  If this is the
+       first block, then this is the total size of the object accross all
+       blocks in the chain.  For the last block in the chain this will usually
+       be less than the available space.
+16-19: Reserved flagspace or something, make sure it's all zeros.
+20-xx: Block data, whatever you wanted to store.  At the moment this is going
+       to be (blocksize) - 20 for now, it will change if the block header
+       changes.
+Thus far we have described a generic system for storing dynamic "substreams" of
+data within a larger stream using a block-allocation system.  This is handy on
+it's own, and implemented as a seperate mechanism, but as handy as it is, it's
+not as useful without a table of contents, described here.
+Any above composite datastream that uses a TOC will have the TOC be the first
+block chain.  The TOC will initially be a basic in-place hash table, but may
+be changed to a b-tree depending on what kind of data is being used.  This basic
+table of contents simply links a generated UID from a program to the appropriate
+block chain.
+Systems like the above could be augmented with additional meta-data in order to
+create flexible, small, in-file file systems and the like.  For example,
+providing simple fixed-width data structures to tie to "inodes" (the program
+generated UIDs) you could have a mini posix filesystem in no time.
author	Mike Buland <eichlan@xagasoft.com>	2008-07-02 03:12:36 +0000
committer	Mike Buland <eichlan@xagasoft.com>	2008-07-02 03:12:36 +0000
commit	aa6471979556621151592e147be81ce940558e55 (patch)
tree	487b0fade53903d32a6780fe285caa5de463a9eb /misc/raa_format.txt
parent	a153962ffe93e70f2419efeab904b515c99c2eda (diff)
download	libbu++-aa6471979556621151592e147be81ce940558e55.tar.gz libbu++-aa6471979556621151592e147be81ce940558e55.tar.bz2 libbu++-aa6471979556621151592e147be81ce940558e55.tar.xz libbu++-aa6471979556621151592e147be81ce940558e55.zip

diff --git a/misc/raa_format.txt b/misc/raa_format.txt new file mode 100644 index 0000000..1b2c9f8 --- /dev/null +++ b/misc/raa_format.txt
@@ -0,0 +1,71 @@
	1	Random Access Archive Format
	2	----------------------------
	3
	4	This is the basic archive format for libbu++'s random access archive system.
	5	Unlike the traditional archive, given a unique key any object may be accessed
	6	at any time, and hopefully very quickly.
	7
	8	To make this as portable as possible the basic interface to the RAA is very
	9	simple and seperated from any IO or formatting directly, any number af backends
	10	could be constructed quite simply, this is a description of the first of these
	11	formats.
	12
	13	In order to make the system handle objects that are any size, and grow quickly
	14	and easily, I believe we should resort to a simple block allocation mechanism
	15	with uniform block sizes linked in chains, effectively accessed via "inodes."
	16
	17	Therefore given a blocksize and a Table of Contents (TOC) any object should be
	18	easy to find. The TOC can be implemented as an in-place hash table to minimize
	19	the amount of memory that must be sacraficed for very large, seldom used
	20	structures.
	21
	22	The basic header:
	23
	24	00-03: Magic Number, something cute, I dunno yet (encoding independant)
	25	04: Byte count/order for standard indexes (8/4 for 64/32bit systems)
	26	High order bit masked out determines endianness (1 = big, 0 = small)
	27	The program will only accept one word-width for now, we can make it
	28	adjustable later, or fix this at 4 bytes.
	29	05-08: Blocksize in bytes, could be anything, I don't think we need to worry
	30	about OS related things for this, but it should be set intelligently by
	31	the owner. This includes the bytes reserved for a block header.
	32	09-12: Total block count, includes both used and unused blocks.
	33	13-16: Total used blocks, useful for determining when to expand.
	34	17-24: Reserved for flags and the like, should be all zeros for now.
	35
	36	At this point are a number of "blocks" each with their own header and data are
	37	laid out sequentially, accessing one should be easy, seek to
	38	(header size)+(block size)*(block number)
	39
	40	The block header is as follows:
	41
	42	00-03: First block in chain. If this number matches the current block index
	43	then this block is the first in a chain.
	44	04-07: Next block in chain. If this number matches the current block index
	45	then this block is the last in the chain.
	46	08-11: Prev block in chain. If this number matches the current block index
	47	then this block is the first in the chain.
	48	12-15: Number of bytes used/allocated remaining in the chain. If this is the
	49	first block, then this is the total size of the object accross all
	50	blocks in the chain. For the last block in the chain this will usually
	51	be less than the available space.
	52	16-19: Reserved flagspace or something, make sure it's all zeros.
	53	20-xx: Block data, whatever you wanted to store. At the moment this is going
	54	to be (blocksize) - 20 for now, it will change if the block header
	55	changes.
	56
	57	Thus far we have described a generic system for storing dynamic "substreams" of
	58	data within a larger stream using a block-allocation system. This is handy on
	59	it's own, and implemented as a seperate mechanism, but as handy as it is, it's
	60	not as useful without a table of contents, described here.
	61
	62	Any above composite datastream that uses a TOC will have the TOC be the first
	63	block chain. The TOC will initially be a basic in-place hash table, but may
	64	be changed to a b-tree depending on what kind of data is being used. This basic
	65	table of contents simply links a generated UID from a program to the appropriate
	66	block chain.
	67
	68	Systems like the above could be augmented with additional meta-data in order to
	69	create flexible, small, in-file file systems and the like. For example,
	70	providing simple fixed-width data structures to tie to "inodes" (the program
	71	generated UIDs) you could have a mini posix filesystem in no time.