Generalized Agile Transport System (GATS)

GATS is a data encoding format, as well as a set of libraries for reading and writing data in that format. It's a structured format reminiscent of a binary JSON, but with some extra cool features.

  • All numbers (integers and floating point) are stored in an arbitrary precision format that takes up as little space as it can.
  • Unlike in textual formats (JSON, XML, etc.), floating point numbers are stored exactly: the number you write in is exactly the number you read out.
  • All stored data is machine independent: you can read and write it on any architecture, with any endianness, and it will always be the same. No more encoding worries.
  • There are a number of data types you can use to make up data structures:
      • Dictionaries
      • Lists
      • Integers
      • Floats
      • Strings (binary 8-bit strings, perfect for UTF-8)
      • Booleans
      • Nulls
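
To give a feel for why a binary format can return floats exactly, here is a minimal sketch that splits a float into an integer mantissa and a power-of-two exponent and puts it back together. The helper names split_float and join_float are made up for this example, the code assumes IEEE 754 doubles (what Python's float normally is), and it is not the actual GATS wire format.

```python
import math

def split_float(value):
    """Split a finite float into integers (mantissa, exponent)
    such that value == mantissa * 2**exponent, with no rounding."""
    if value == 0.0:
        return 0, 0  # note: the sign of -0.0 is not preserved in this sketch
    # frexp gives value == fraction * 2**exponent with 0.5 <= |fraction| < 1
    fraction, exponent = math.frexp(value)
    # a double carries at most 53 significant bits, so scaling the fraction
    # by 2**53 always yields a whole number
    mantissa = int(math.ldexp(fraction, 53))
    return mantissa, exponent - 53

def join_float(mantissa, exponent):
    """Rebuild the float from the two integers produced by split_float."""
    return math.ldexp(float(mantissa), exponent)

x = 1.0 / 3.0
mantissa, exponent = split_float(x)
restored = join_float(mantissa, exponent)  # compares equal to x

# A text format that prints a fixed number of digits does not round-trip:
truncated = float("%.6f" % x)              # 0.333333, which is not x
```

Both integers can then be stored with whatever integer encoding the format uses. Text formats can also round-trip floats if they write enough digits, so the point here is only that the binary route never has to go through decimal at all.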

License

GATS is distributed under a new BSD-style license. This is fairly permissive, but if you require other licensing options, feel free to contact us.

Uses

GATS was originally intended for use in internet protocols, and is great for that purpose. Used correctly, it allows you to create efficient yet highly malleable protocols with very little extra effort.

However, you can certainly use it for other purposes, such as serializing data for storage or inter-process communication. In fact, with the different language bindings available, GATS may sometimes be one of the easier ways to allow, say, a PHP web service to communicate with a custom C++ executable or Python program.

Languages/Libraries Supported

At the moment we actively maintain libraries for C++, Java, PHP, Python, and C#. Contributions for other languages and libraries are welcome. Here's a little info on each target directory:

  • c++-libbu++ - The original libgats implementation. Works using libbu++ data types and streams. You need to have libbu++ and Xagasoft Build in order to build this version.
  • c++-qt - A version written using Qt data types. This version builds using qmake, so if you're using Qt you already have everything you need. Also features handy signals & slots to make event driven networking even easier!
  • java - A library using the Java native interfaces, so everything looks and works exactly how you would expect it to. There is a Xagasoft Build script to build a jar file, but the library is simple enough that a single javac command can build it all, or you can just import the code into your project directly. This Java version has been used on desktops and Android devices.
  • php - There are two libraries for working with PHP. The first defines a set of classes for fine control over the format; sometimes this is necessary, as PHP's types are a little loose. The second simply uses PHP native types like array() as data transport. The second option is usually much easier to use, but doesn't always get the encoding correct for all inputs.
  • python - This works like other serialization mechanisms in Python, such as pickle, json, shelve, and marshal. It exposes the functions load, dump, loads, and dumps, and also the handy helpers recv and send for working with sockets. The Python implementation returns and transmits native Python data types, which makes life pretty easy. To use this version, simply copy gats.py to your project.
  • cs-dotnet - This implementation is written in C# and compiles against .NET version 4.0 or later (possibly earlier). It takes advantage of standard .NET interfaces for container types so they function just like native Dictionaries and Lists. The class layout is similar to other languages, specifically Java. This implementation does slightly more buffering than some of the others, but it still wouldn't hurt to buffer your more volatile streams, like network streams.
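
For the Python binding, the pickle/json-style interface described above suggests usage along the following lines. This is only a sketch: the roundtrip helper and the sample message are made up for illustration, and pickle stands in for gats so the snippet runs without gats.py. Whether gats.dumps and gats.loads take exactly the same arguments as their pickle counterparts is an assumption, so check gats.py before relying on it.

```python
import pickle

def roundtrip(codec, obj):
    """Serialize obj and read it back using any module that
    offers pickle-style dumps() and loads() functions."""
    encoded = codec.dumps(obj)
    return codec.loads(encoded)

# A dictionary built only from the data types GATS lists:
# strings, integers, floats, booleans, nulls, lists, dictionaries.
message = {
    "command": "login",
    "retries": 3,
    "timeout": 2.5,
    "secure": True,
    "token": None,
    "tags": ["alpha", "beta"],
}

# pickle is a stand-in here. With gats.py copied into the project, the idea
# would be `import gats` and `roundtrip(gats, message)` instead (assumed API).
restored = roundtrip(pickle, message)
```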

Basic Operation

The way GATS works is dictated by the format, so it works similarly in every implementation, although each has slightly different mechanics. When encoding GATS, you always encode each object in its own "GATS packet." A GATS packet has a very simple header that includes the size of the packet, which makes parsing fast and efficient.

Each packet can contain a single root object. It can be any type, but for most protocols a dictionary is a great choice for the root object.
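
The idea of a size-carrying header can be sketched as follows. The layout used here (one nonzero version byte followed by a 4-byte big-endian total size) and the names frame and unframe are assumptions made for this example. The real GATS header may well differ, so treat this as an illustration of the technique only.

```python
import struct

VERSION = 1                    # hypothetical marker byte; must be nonzero
HEADER = struct.Struct(">BI")  # 1-byte version + 4-byte big-endian total size

def frame(payload):
    """Wrap an already-encoded root object in a size-prefixed packet."""
    total_size = HEADER.size + len(payload)
    return HEADER.pack(VERSION, total_size) + payload

def unframe(packet):
    """Check the header and return the payload of a single packet."""
    version, total_size = HEADER.unpack_from(packet)
    if version != VERSION:
        raise ValueError("unknown packet version")
    if total_size != len(packet):
        raise ValueError("size in header does not match packet length")
    return packet[HEADER.size:total_size]

packet = frame(b"encoded root object")
payload = unframe(packet)
```

Because the size comes first, a reader knows how many bytes to wait for before it starts decoding the object inside.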

The format is designed to make it very easy to work with various encoding, packing, and encryption systems. The reader, by default, will skip all leading zero bytes that come before a valid GATS packet, and will stop processing precisely at the end of a valid GATS packet.

Skipping leading zeros makes it easy to work in environments where padding may be required. You can use the simplest of all padding schemes (pad with zeros) and it will work seamlessly with GATS.

Since the reader always reads exactly the number of bytes it needs, it's very easy to embed GATS packets in other streams, or read them sequentially as fast as you can from a socket.
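
The reader behavior described above (skip leading zero bytes, then consume exactly one packet) can be sketched like this. The header layout is the same kind of made-up stand-in (a nonzero version byte plus a 4-byte big-endian total size), not the actual GATS header, and read_exact, read_packet, and make_packet are hypothetical names.

```python
import io
import struct

HEADER = struct.Struct(">BI")  # hypothetical: nonzero version byte + total size

def read_exact(stream, count):
    """Read exactly count bytes, looping over short reads."""
    data = b""
    while len(data) < count:
        chunk = stream.read(count - len(data))
        if not chunk:
            raise EOFError("stream ended in the middle of a packet")
        data += chunk
    return data

def read_packet(stream):
    """Return the next payload, or None if the stream ends before one starts."""
    first = stream.read(1)
    while first == b"\x00":        # zero padding before a packet is ignored
        first = stream.read(1)
    if not first:
        return None
    rest = read_exact(stream, HEADER.size - 1)
    version, total_size = HEADER.unpack(first + rest)
    if total_size < HEADER.size:
        raise ValueError("packet size smaller than its own header")
    # Read the payload and nothing more, so the stream is left positioned
    # on the byte right after this packet.
    return read_exact(stream, total_size - HEADER.size)

def make_packet(payload):
    return HEADER.pack(1, HEADER.size + len(payload)) + payload

stream = io.BytesIO(
    b"\x00\x00\x00" + make_packet(b"first") + make_packet(b"second") + b"trailing"
)
first_payload = read_packet(stream)
second_payload = read_packet(stream)
leftover = stream.read()           # bytes that belong to whatever follows
```

The same functions work on a socket wrapped with sock.makefile("rb"), since they only rely on read().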

A Note About Strings

All strings in GATS are simply sequences of 8-bit bytes. There is no overarching encoding that is dictated by the format. When using GATS, it is good to specify how you are encoding your text data; we recommend a Unicode encoding such as UTF-8. There is a possibility that a future version of GATS will include a separate Unicode String data type, but for now it's important to keep this in mind.

For this reason, we also recommend restricting the keys in all dictionaries to 7-bit ASCII, which is byte-for-byte identical in UTF-8 and Latin-1. This isn't required, of course, but it makes things a bit easier.
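
In practice this means picking an encoding at the edges of your program and sticking to it. A minimal sketch, assuming UTF-8 as the agreed encoding (encode_text, decode_text, and is_ascii_key are hypothetical helper names, not part of any GATS library):

```python
def encode_text(text):
    """Turn a Python str into the bytes that would go into a string value."""
    return text.encode("utf-8")

def decode_text(raw):
    """Turn string bytes read from the wire back into a Python str."""
    return raw.decode("utf-8")

def is_ascii_key(key):
    """True if every byte of a dictionary key is 7-bit ASCII."""
    return all(byte < 0x80 for byte in key)

raw = encode_text("naïve")      # 6 bytes: "ï" takes two bytes in UTF-8
text = decode_text(raw)

plain_key = b"user_name"
# 7-bit keys decode to the same text no matter which of these two
# encodings the other side assumes.
same = plain_key.decode("utf-8") == plain_key.decode("latin-1")
```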

Speed vs Size

GATS objects are, on average, smaller than the same data stored in other binary formats, and can be much smaller than in textual formats, by virtue of storing only as many bytes as necessary for integers and floats. This also means that GATS requires more processing than fixed-field binary formats, but interestingly not quite as much as text formats like JSON. The processing we do on floats is actually roughly comparable in many ways to text processing, although with fewer steps.
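
The size side of this trade-off can be explored with a small sketch that compares three ways of storing an integer: the fewest two's-complement bytes that hold it, a fixed 8-byte field, and JSON text. The minimal_int_bytes helper is made up for this example and is not the GATS integer encoding. A real variable-length format also has to record the length or a type tag somewhere, which this sketch leaves out.

```python
import json
import struct

def minimal_int_bytes(value):
    """Encode an int as big-endian two's complement in the fewest whole bytes."""
    # Bits needed for the magnitude; ~value handles negatives so that,
    # for example, -128 still fits in a single byte.
    magnitude_bits = (value if value >= 0 else ~value).bit_length()
    length = (magnitude_bits + 8) // 8   # one extra bit for the sign, rounded up
    return value.to_bytes(length, "big", signed=True)

def sizes(value):
    """Return (minimal bytes, fixed 64-bit bytes, JSON text bytes) for value."""
    return (
        len(minimal_int_bytes(value)),
        len(struct.pack(">q", value)),
        len(json.dumps(value).encode("ascii")),
    )

small = sizes(5)
medium = sizes(1000)
large = sizes(2 ** 40)
```

The variable-length form needs the bit_length computation on the way in and a length-aware read on the way out, which is the extra processing compared to a fixed field.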