diff options
Diffstat (limited to 'misc/raa_format.txt')
-rw-r--r-- | misc/raa_format.txt | 71 |
1 files changed, 71 insertions, 0 deletions
diff --git a/misc/raa_format.txt b/misc/raa_format.txt new file mode 100644 index 0000000..1b2c9f8 --- /dev/null +++ b/misc/raa_format.txt | |||
@@ -0,0 +1,71 @@ | |||
1 | Random Access Archive Format | ||
2 | ---------------------------- | ||
3 | |||
4 | This is the basic archive format for libbu++'s random access archive system. | ||
5 | Unlike the traditional archive, given a unique key any object may be accessed | ||
6 | at any time, and hopefully very quickly. | ||
7 | |||
8 | To make this as portable as possible the basic interface to the RAA is very | ||
9 | simple and seperated from any IO or formatting directly, any number af backends | ||
10 | could be constructed quite simply, this is a description of the first of these | ||
11 | formats. | ||
12 | |||
13 | In order to make the system handle objects that are any size, and grow quickly | ||
14 | and easily, I believe we should resort to a simple block allocation mechanism | ||
15 | with uniform block sizes linked in chains, effectively accessed via "inodes." | ||
16 | |||
17 | Therefore given a blocksize and a Table of Contents (TOC) any object should be | ||
18 | easy to find. The TOC can be implemented as an in-place hash table to minimize | ||
19 | the amount of memory that must be sacraficed for very large, seldom used | ||
20 | structures. | ||
21 | |||
22 | The basic header: | ||
23 | |||
24 | 00-03: Magic Number, something cute, I dunno yet (encoding independant) | ||
25 | 04: Byte count/order for standard indexes (8/4 for 64/32bit systems) | ||
26 | High order bit masked out determines endianness (1 = big, 0 = small) | ||
27 | The program will only accept one word-width for now, we can make it | ||
28 | adjustable later, or fix this at 4 bytes. | ||
29 | 05-08: Blocksize in bytes, could be anything, I don't think we need to worry | ||
30 | about OS related things for this, but it should be set intelligently by | ||
31 | the owner. This includes the bytes reserved for a block header. | ||
32 | 09-12: Total block count, includes both used and unused blocks. | ||
33 | 13-16: Total used blocks, useful for determining when to expand. | ||
34 | 17-24: Reserved for flags and the like, should be all zeros for now. | ||
35 | |||
36 | At this point are a number of "blocks" each with their own header and data are | ||
37 | laid out sequentially, accessing one should be easy, seek to | ||
38 | (header size)+(block size)*(block number) | ||
39 | |||
40 | The block header is as follows: | ||
41 | |||
42 | 00-03: First block in chain. If this number matches the current block index | ||
43 | then this block is the first in a chain. | ||
44 | 04-07: Next block in chain. If this number matches the current block index | ||
45 | then this block is the last in the chain. | ||
46 | 08-11: Prev block in chain. If this number matches the current block index | ||
47 | then this block is the first in the chain. | ||
48 | 12-15: Number of bytes used/allocated remaining in the chain. If this is the | ||
49 | first block, then this is the total size of the object accross all | ||
50 | blocks in the chain. For the last block in the chain this will usually | ||
51 | be less than the available space. | ||
52 | 16-19: Reserved flagspace or something, make sure it's all zeros. | ||
53 | 20-xx: Block data, whatever you wanted to store. At the moment this is going | ||
54 | to be (blocksize) - 20 for now, it will change if the block header | ||
55 | changes. | ||
56 | |||
57 | Thus far we have described a generic system for storing dynamic "substreams" of | ||
58 | data within a larger stream using a block-allocation system. This is handy on | ||
59 | it's own, and implemented as a seperate mechanism, but as handy as it is, it's | ||
60 | not as useful without a table of contents, described here. | ||
61 | |||
62 | Any above composite datastream that uses a TOC will have the TOC be the first | ||
63 | block chain. The TOC will initially be a basic in-place hash table, but may | ||
64 | be changed to a b-tree depending on what kind of data is being used. This basic | ||
65 | table of contents simply links a generated UID from a program to the appropriate | ||
66 | block chain. | ||
67 | |||
68 | Systems like the above could be augmented with additional meta-data in order to | ||
69 | create flexible, small, in-file file systems and the like. For example, | ||
70 | providing simple fixed-width data structures to tie to "inodes" (the program | ||
71 | generated UIDs) you could have a mini posix filesystem in no time. | ||