summaryrefslogtreecommitdiff
path: root/misc/raa_format.txt
diff options
context:
space:
mode:
authorMike Buland <eichlan@xagasoft.com>2008-07-02 03:12:36 +0000
committerMike Buland <eichlan@xagasoft.com>2008-07-02 03:12:36 +0000
commitaa6471979556621151592e147be81ce940558e55 (patch)
tree487b0fade53903d32a6780fe285caa5de463a9eb /misc/raa_format.txt
parenta153962ffe93e70f2419efeab904b515c99c2eda (diff)
downloadlibbu++-aa6471979556621151592e147be81ce940558e55.tar.gz
libbu++-aa6471979556621151592e147be81ce940558e55.tar.bz2
libbu++-aa6471979556621151592e147be81ce940558e55.tar.xz
libbu++-aa6471979556621151592e147be81ce940558e55.zip
Caching is coming together nicely, as well as the new nids system...or
whatever it'll be called later...
Diffstat (limited to '')
-rw-r--r--misc/raa_format.txt71
1 files changed, 71 insertions, 0 deletions
diff --git a/misc/raa_format.txt b/misc/raa_format.txt
new file mode 100644
index 0000000..1b2c9f8
--- /dev/null
+++ b/misc/raa_format.txt
@@ -0,0 +1,71 @@
1Random Access Archive Format
2----------------------------
3
4This is the basic archive format for libbu++'s random access archive system.
5Unlike the traditional archive, given a unique key any object may be accessed
6at any time, and hopefully very quickly.
7
8To make this as portable as possible the basic interface to the RAA is very
9simple and seperated from any IO or formatting directly, any number af backends
10could be constructed quite simply, this is a description of the first of these
11formats.
12
13In order to make the system handle objects that are any size, and grow quickly
14and easily, I believe we should resort to a simple block allocation mechanism
15with uniform block sizes linked in chains, effectively accessed via "inodes."
16
17Therefore given a blocksize and a Table of Contents (TOC) any object should be
18easy to find. The TOC can be implemented as an in-place hash table to minimize
19the amount of memory that must be sacraficed for very large, seldom used
20structures.
21
22The basic header:
23
2400-03: Magic Number, something cute, I dunno yet (encoding independant)
2504: Byte count/order for standard indexes (8/4 for 64/32bit systems)
26 High order bit masked out determines endianness (1 = big, 0 = small)
27 The program will only accept one word-width for now, we can make it
28 adjustable later, or fix this at 4 bytes.
2905-08: Blocksize in bytes, could be anything, I don't think we need to worry
30 about OS related things for this, but it should be set intelligently by
31 the owner. This includes the bytes reserved for a block header.
3209-12: Total block count, includes both used and unused blocks.
3313-16: Total used blocks, useful for determining when to expand.
3417-24: Reserved for flags and the like, should be all zeros for now.
35
36At this point are a number of "blocks" each with their own header and data are
37laid out sequentially, accessing one should be easy, seek to
38 (header size)+(block size)*(block number)
39
40The block header is as follows:
41
4200-03: First block in chain. If this number matches the current block index
43 then this block is the first in a chain.
4404-07: Next block in chain. If this number matches the current block index
45 then this block is the last in the chain.
4608-11: Prev block in chain. If this number matches the current block index
47 then this block is the first in the chain.
4812-15: Number of bytes used/allocated remaining in the chain. If this is the
49 first block, then this is the total size of the object accross all
50 blocks in the chain. For the last block in the chain this will usually
51 be less than the available space.
5216-19: Reserved flagspace or something, make sure it's all zeros.
5320-xx: Block data, whatever you wanted to store. At the moment this is going
54 to be (blocksize) - 20 for now, it will change if the block header
55 changes.
56
57Thus far we have described a generic system for storing dynamic "substreams" of
58data within a larger stream using a block-allocation system. This is handy on
59it's own, and implemented as a seperate mechanism, but as handy as it is, it's
60not as useful without a table of contents, described here.
61
62Any above composite datastream that uses a TOC will have the TOC be the first
63block chain. The TOC will initially be a basic in-place hash table, but may
64be changed to a b-tree depending on what kind of data is being used. This basic
65table of contents simply links a generated UID from a program to the appropriate
66block chain.
67
68Systems like the above could be augmented with additional meta-data in order to
69create flexible, small, in-file file systems and the like. For example,
70providing simple fixed-width data structures to tie to "inodes" (the program
71generated UIDs) you could have a mini posix filesystem in no time.