# Generalized Agile Transport System (GATS)

GATS is a data encoding format and a set of libraries for reading and writing
data in that format.  It's a structured format reminiscent of a binary JSON,
but with some extra cool features:

 * All numbers (integers and floating point) are stored in an arbitrary
   precision format that takes up as little space as it can.
 * Unlike textual formats (JSON, XML, etc.), floating point numbers are stored
   exactly: the number you write in is the number you read out.
 * All stored formats are machine independent.  You can read and write them on
   any architecture, with any endianness, and they will always be the same.
   No more encoding worries.
 * There are a number of data types you can use to make up data structures:
   * Dictionaries
   * Lists
   * Integers
   * Floats
   * Strings (binary 8-bit strings, perfect for UTF-8)
   * Booleans
   * Nulls
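
As a rough illustration of the exact-float point above, a float can be split
into two integers (a mantissa and a power-of-two exponent) and rebuilt with no
loss.  This is only a sketch of the idea, not the GATS wire format, and
`split_float` and `join_float` are hypothetical names used for this example.

```python
import math


def split_float(value):
    """Split a finite float into integers so value == mantissa * 2**exponent."""
    if value == 0.0:
        return 0, 0
    # frexp gives value == fraction * 2**exponent with 0.5 <= |fraction| < 1.
    fraction, exponent = math.frexp(value)
    # A double has 53 significant bits, so this scaling is exact.
    mantissa = int(fraction * (1 << 53))
    exponent -= 53
    # Drop trailing zero bits so simple values need a smaller mantissa.
    while mantissa % 2 == 0:
        mantissa //= 2
        exponent += 1
    return mantissa, exponent


def join_float(mantissa, exponent):
    """Rebuild the float from the two integers produced by split_float."""
    return math.ldexp(mantissa, exponent)


for sample in (0.1, 0.5, -2.5, 3.0, 1e300):
    parts = split_float(sample)
    assert join_float(*parts) == sample  # bit-for-bit the same value
```

Because both parts are plain integers, they can be stored in as few bytes as
they need, which is why a value like 0.5 can take far less room than a fixed
8-byte double.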

# License

GATS is distributed under a new-BSD-style license.  This is fairly permissive,
but if you require other licensing options, feel free to contact us.

# Uses

GATS was originally intended for use in internet protocols, and is great for
that purpose.  Used correctly it allows you to create efficient yet highly
malleable protocols with very little extra effort.

However, you can certainly use it for other purposes, such as serializing data
for storage or inter-process communication.  In fact, with the different
language bindings available, GATS may sometimes be one of the easier ways to
allow, say, a PHP web service to communicate with a custom C++ executable or
Python program.

# Languages/Libraries Supported

At the moment we actively maintain libraries for C++, Java, PHP, Python, and C#.
Contributions for other languages and libraries are welcome.  Here's a little
info on each target directory:

 * *c++-libbu++* - The original libgats implementation.  Works using libbu++
   data types and streams.  You need to have libbu++ and Xagasoft Build in
   order to build this version.
 * *c++-qt* - A version written using Qt data types.  This version builds using
   qmake, so if you're using Qt you already have everything you need.  Also
   features handy signals & slots to make event driven networking even easier!
 * *java* - A library using native Java interfaces, so everything looks and
   works exactly how you would expect.  There is a Xagasoft Build script to
   build a jar file, but the library is simple enough that a single javac
   command can build it all, or you can just import the code into your project
   directly.  This Java version has been used on desktops and Android devices.
 * *php* - There are two libraries for working with PHP.  The first defines a
   set of classes for fine control over the format, which is sometimes
   necessary as PHP's types are a little loose.  The second simply uses native
   PHP types like array() as data transport.  The second option is usually
   much easier to use, but doesn't always get the encoding correct for all
   inputs.
 * *python* - This works like other serialization modules in Python, such as
   pickle, json, shelve, and marshal.  It exposes the functions load, dump,
   loads, and dumps, and also the handy helpers recv and send for working with
   sockets.  The Python implementation returns and transmits native Python
   data types, which makes life pretty easy.  To use this version, simply copy
   gats.py to your project.
 * *cs-dotnet* - This implementation is written in C# and compiles against .NET
   version 4.0 or later (possibly earlier).  It takes advantage of standard
   .NET interfaces for container types so they function just like native
   Dictionaries and Lists.  The class layout is similar to other languages,
   specifically Java.  This implementation does slightly more buffering than
   some of the others, but it still wouldn't hurt to buffer your more volatile
   streams, like network streams.

# Basic Operation

The way GATS works is dictated by the format, so it works similarly in every
implementation, although they each have slightly different mechanics.  When
encoding GATS you always encode each object in its own "GATS packet."  A GATS
packet has a very simple header which includes the size of the packet to make
parsing fast and efficient.

Each packet can contain a single root object.  It can be any type, but for most
protocols a dictionary is a great choice for the root object.

The format is designed to make it very easy to work with various encoding,
packing, and encryption systems.  The reader, by default, will skip all leading
zero bytes that come before a valid GATS packet, and will stop processing
precisely at the end of a valid GATS packet.

Skipping leading zeros makes it easy to work in environments where padding may
be required.  You can use the simplest of all padding schemes (pad with zeros)
and it will work seamlessly with GATS.

Since the reader always reads exactly the number of bytes it needs, it's very
easy to embed GATS packets in other streams, or read them sequentially as fast
as you can from a socket.
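
The reading behaviour described above (skip leading zero bytes, then consume
exactly one size-prefixed packet) can be sketched as follows.  The header
layout here is an assumption made purely for illustration: a hypothetical
non-zero marker byte followed by a 4-byte big-endian payload length.  It is not
the real GATS header, and `write_frame`, `read_exact`, and `read_frame` are
made-up names.

```python
import io
import struct

MARKER = 0x01  # hypothetical non-zero start byte, not taken from the GATS format


def write_frame(stream, payload):
    """Write the assumed header (marker + 4-byte length), then the payload."""
    stream.write(struct.pack(">BI", MARKER, len(payload)))
    stream.write(payload)


def read_exact(stream, count):
    """Read exactly count bytes, or raise if the stream ends first."""
    data = b""
    while len(data) < count:
        chunk = stream.read(count - len(data))
        if not chunk:
            raise EOFError("stream ended in the middle of a frame")
        data += chunk
    return data


def read_frame(stream):
    """Skip zero padding, then return one payload; None at end of stream."""
    while True:
        first = stream.read(1)
        if not first:
            return None
        if first != b"\x00":
            break
    if first[0] != MARKER:
        raise ValueError("unexpected start byte: %r" % first)
    (size,) = struct.unpack(">I", read_exact(stream, 4))
    return read_exact(stream, size)


# Two frames with zero padding in front of each, followed by unrelated bytes.
buf = io.BytesIO()
buf.write(b"\x00" * 5)
write_frame(buf, b"first")
buf.write(b"\x00\x00")
write_frame(buf, b"second")
buf.write(b"tail")
buf.seek(0)

frames = [read_frame(buf), read_frame(buf)]
leftover = buf.read()  # the reader never touched the bytes after the frame
```

Since the reader stops at the last byte of each frame, whatever follows in the
stream (another packet, padding, or a different protocol entirely) is left
untouched for the next read.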

## A Note About Strings

All strings in GATS are simply sequences of 8-bit bytes.  There is no
overarching encoding that is dictated by the format.  When using GATS it is
good to specify how you are encoding your text data; we recommend a Unicode
encoding such as UTF-8.  There is a possibility that a future version of GATS
will include a separate Unicode String data type, but for now it's important
to remember that strings carry no encoding information of their own.

For this reason, we also recommend restricting the keys in all dictionaries to
7-bit ASCII, which reads the same in UTF-8 and Latin-1.  This isn't required,
of course, but it makes things a bit easier.
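
A minimal sketch of that convention in Python, assuming the application has
agreed on UTF-8 for values and plain ASCII for keys.  The helper names
(`is_ascii`, `encode_record`, `decode_record`) are hypothetical and are not part
of any GATS library; they only prepare the byte strings you would then hand to
an encoder.

```python
def is_ascii(raw):
    """Return True if every byte is in the 7-bit ASCII range."""
    return all(byte < 0x80 for byte in raw)


def encode_record(record):
    """Turn a dict of str -> str into a dict of bytes -> bytes."""
    encoded = {}
    for key, value in record.items():
        raw_key = key.encode("utf-8")
        if not is_ascii(raw_key):
            raise ValueError("dictionary keys should be plain ASCII: %r" % key)
        # Values may hold any text; the agreed encoding here is UTF-8.
        encoded[raw_key] = value.encode("utf-8")
    return encoded


def decode_record(encoded):
    """Reverse encode_record using the same agreed encodings."""
    return {
        key.decode("ascii"): value.decode("utf-8")
        for key, value in encoded.items()
    }


wire_ready = encode_record({"name": "Zoë"})
round_trip = decode_record(wire_ready)
```

Keeping keys in ASCII means every implementation can compare them
byte-for-byte, whatever encoding it assumes for the values.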

# Speed vs Size

GATS objects are, on average, smaller than the same data stored in other binary
formats, and can be much smaller than textual formats, by virtue of storing
only as many bytes as necessary for integers and floats.  This also means that
GATS requires more processing than fixed-field binary formats, but
interestingly not quite as much as text formats like JSON.  The processing we
do on floats is actually roughly comparable in many ways to text processing,
although with fewer steps.
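
To make the size trade-off concrete, the sketch below compares three ways of
storing the same integer: a fixed 8-byte field, JSON text, and the fewest whole
bytes that can hold the value.  The minimal-byte scheme is an assumption for
illustration and is not the actual GATS integer encoding; `minimal_int_bytes`
and `size_report` are hypothetical names.

```python
import json
import struct


def minimal_int_bytes(value):
    """Encode a signed integer, big-endian, in the fewest whole bytes."""
    # Bits needed for the magnitude, plus one sign bit, rounded up to bytes.
    magnitude_bits = (value if value >= 0 else ~value).bit_length()
    length = (magnitude_bits + 8) // 8
    return value.to_bytes(length, "big", signed=True)


def decode_int(raw):
    """Reverse minimal_int_bytes."""
    return int.from_bytes(raw, "big", signed=True)


def size_report(value):
    """Return (fixed-width size, JSON text size, minimal size) in bytes."""
    fixed = len(struct.pack(">q", value))  # always 8 bytes
    text = len(json.dumps(value))          # one byte per decimal digit
    minimal = len(minimal_int_bytes(value))
    return fixed, text, minimal


for sample in (5, 300, -129, 2 ** 40):
    assert decode_int(minimal_int_bytes(sample)) == sample
    print(sample, size_report(sample))
```

The explicit `"big"` byte order also shows the machine-independence point: the
bytes are the same no matter which architecture wrote them.  The extra cost is
the length calculation, which a fixed-width field never needs.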