Age | Commit message | Author |
|
Utf32 (le,be). The internal storage seems to be working fine. We still have a
problem with random access, but at least we can tell which half of a surrogate
pair we're on, so we can always rapidly determine the entire code point from
any utf16 index.
The only optimization I'm not doing yet is reading in entire 16-bit or 32-bit
words at a time and converting them from their byte order to native. There are
a few potential issues with that, so we'll see.
I added a couple of testing datafiles and a test program; I'll delete them all
as soon as it's verified to write correctly.
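To make the surrogate-pair point above concrete, here is a minimal sketch of
decoding a full code point from an arbitrary utf16 index; the function name is
hypothetical and this is not libbu++ code:

    #include <cstdint>
    #include <cstddef>

    // Hypothetical sketch, not part of libbu++.
    // Decode the full code point covering position i in a utf16 buffer.
    // If buf[i] is a low surrogate we're on the second half of a pair,
    // so step back one unit to the high surrogate before decoding.
    uint32_t codePointAt( const uint16_t *buf, size_t i )
    {
        if( buf[i] >= 0xDC00 && buf[i] <= 0xDFFF )
            --i; // second half of a pair, back up to the first half
        uint16_t hi = buf[i];
        if( hi >= 0xD800 && hi <= 0xDBFF )
        {
            uint16_t lo = buf[i+1];
            return 0x10000 + (((uint32_t)(hi - 0xD800)) << 10)
                           + (uint32_t)(lo - 0xDC00);
        }
        return hi; // Basic Multilingual Plane, a single utf16 unit
    }

The deferred optimization would amount to reading whole words and byte-swapping
them to native order, e.g. (v >> 8) | (v << 8) for a 16-bit unit v.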
|
|
encoding to make things easier (little endian in our case). It can currently
read utf8 and utf16be, but it doesn't handle BOMs yet. It will give you full
unicode code points instead of the raw utf16 values, which is pretty slick.
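As a hedged illustration of what "full code points instead of raw utf16
values" means on the utf8 side, a decoder along these lines folds multi-byte
sequences into one value; this sketch assumes well-formed input and is not the
actual libbu++ reader:

    #include <cstdint>
    #include <cstddef>

    // Hypothetical sketch, not the libbu++ implementation.
    // Decode one code point starting at buf[i]; advances i past the sequence.
    uint32_t nextUtf8( const uint8_t *buf, size_t &i )
    {
        uint8_t b = buf[i++];
        if( b < 0x80 )
            return b;                           // 1-byte ASCII
        int extra = (b >= 0xF0) ? 3 : (b >= 0xE0) ? 2 : 1;
        uint32_t cp = b & (0x3F >> extra);      // strip the length bits
        while( extra-- > 0 )
            cp = (cp << 6) | (buf[i++] & 0x3F); // fold in continuation bytes
        return cp;
    }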
|
|
|
|
use a Bu::String as its backend storage, so we'll get all the great features
out of that...
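A rough sketch of that layering idea, with std::string standing in for
Bu::String since this log doesn't show that class's API:

    #include <string>
    #include <cstdint>
    #include <cstddef>

    // Hypothetical sketch: a utf16 string type that keeps its units
    // in a byte-string backend, as the commit above describes.
    class UtfStringSketch
    {
    public:
        void append( uint16_t unit )
        {
            // store each 16-bit unit in the backend, in native byte order
            sData.append( reinterpret_cast<const char *>(&unit), 2 );
        }
        size_t unitCount() const { return sData.size() / 2; }

    private:
        std::string sData; // the byte-oriented backend storage
    };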
|
|
fstring, and updated the copyright notice to extend to 2011
|
|
that were using fstring, I hope.
|
|
Unicode handling we'll need to implement a series of codecs and converters as
well as tables of codepages and lookups. It'll be interesting, I guess, but
it makes me care a lot less about proper encoding. Anyway, UtfString uses
shorts instead of chars, so it's a step in the right direction, but still not
enough to handle proper UTF-16 encoding. Maybe UCS-2 encoding, but that's
lame. Bu::FBasicString has been generalized a bit with optimizations from
libc for char-based strings. Unfortunately, it also still uses char-only
functions in several places; those all rely on casting strings to char at the
moment, just to get the thing to compile. Basically, it's not a good UTF-16
solution yet, and it may never be while remaining compatible with char-based
strings.
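To illustrate why shorts alone only get you UCS-2: any code point above
U+FFFF needs two 16-bit units (a surrogate pair), so one short per character
can't cover full UTF-16. A small hedged example, not libbu++ code:

    #include <cstdint>
    #include <cstdio>

    int main()
    {
        uint32_t cp = 0x1D11E; // MUSICAL SYMBOL G CLEF, outside the BMP
        uint32_t v = cp - 0x10000;
        uint16_t hi = 0xD800 + (uint16_t)(v >> 10);   // high surrogate
        uint16_t lo = 0xDC00 + (uint16_t)(v & 0x3FF); // low surrogate
        printf( "U+%05X -> %04X %04X\n", cp, hi, lo );
        return 0;
    }

This prints U+1D11E -> D834 DD1E: one character, two shorts, which is exactly
the case a UCS-2 style string of shorts can't represent as a single element.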
|