Trivial UTF-8 is a small library for UTF-8-based input and output on a Lisp implementation that already supports Unicode, meaning that char-code and code-char deal with Unicode character codes.
The rationale for this library is that while Unicode-enabled implementations usually do provide some kind of interface for dealing with character encodings, these interfaces are typically neither very flexible nor uniform across implementations.
The Babel library solves a similar problem while understanding more encodings. Trivial UTF-8 was written before Babel existed, but for new projects you might be better off going with Babel. The one plus that Trivial UTF-8 has is that it doesn't depend on any other libraries.
Trivial-utf-8 is released under a BSD-style license (see source file). The latest release can be downloaded from http://common-lisp.net/project/trivial-utf-8/trivial-utf-8.tgz, or installed with asdf-install.
A darcs repository with the most recent changes can be checked out with:
> darcs get http://common-lisp.net/project/trivial-utf-8/darcs/trivial-utf-8
Or look at it online.
The trivial-utf-8-devel mailing list can be used for questions, discussion, bug reports, patches, or anything else relating to this library. Or mail the author/maintainer directly: Marijn Haverbeke.
function string-to-utf-8-bytes (string) => array of (unsigned-byte 8)
Convert a string into an array of unsigned bytes containing its utf-8 representation.
function utf-8-bytes-to-string (bytes) => string
Convert a byte array containing utf-8 encoded characters into the string it encodes.
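The two conversion functions above are inverses of each other, which a short round trip can illustrate. This is only a sketch: it assumes the system can be found by ASDF under the name trivial-utf-8, that the forms are evaluated one at a time (at the REPL or with load), and the helper name round-trip is made up for the example.

```lisp
(require :asdf)
(asdf:load-system :trivial-utf-8) ; assumes ASDF can locate the system

;; ROUND-TRIP is a hypothetical helper, not part of the library.
(defun round-trip (text)
  "Encode TEXT as utf-8 bytes, then decode those bytes back into a string."
  (trivial-utf-8:utf-8-bytes-to-string
   (trivial-utf-8:string-to-utf-8-bytes text)))

;; Build "caf" followed by U+00E9 with code-char, so the example does not
;; depend on the encoding of the source file itself.
(defparameter *sample*
  (concatenate 'string "caf" (string (code-char #xE9))))

(format t "~D characters, ~D bytes, round trip intact: ~A~%"
        (length *sample*)
        (length (trivial-utf-8:string-to-utf-8-bytes *sample*))
        (string= *sample* (round-trip *sample*)))
```

The character count and the byte count differ here because U+00E9 takes two bytes in utf-8.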
function write-utf-8-bytes (string output &key null-terminate)
Write a string to a byte-stream, encoding it as utf-8.
function read-utf-8-string (input &key null-terminated stop-at-eof char-length byte-length)
Read utf-8 encoded data from a byte stream and construct a string with the characters found. When null-terminated is given, it will stop reading at a null character; stop-at-eof tells it to stop at the end of file without raising an error; and the char-length and byte-length parameters can be used to specify the maximum number of characters or bytes to read.
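As a rough illustration of the two stream functions, the sketch below writes a string to a binary file and reads it back in two steps, first a fixed number of characters and then everything up to the end of file. The path /tmp/trivial-utf-8-demo.bin and the helper names are invented for the example, and it assumes the system is loadable through ASDF and that the forms are evaluated one at a time.

```lisp
(require :asdf)
(asdf:load-system :trivial-utf-8) ; assumes ASDF can locate the system

;; Hypothetical scratch file; any writable path will do.
(defparameter *demo-path* #p"/tmp/trivial-utf-8-demo.bin")

(defun write-demo-file (string)
  "Write STRING to *DEMO-PATH* as utf-8 through a binary stream."
  (with-open-file (out *demo-path* :direction :output
                                   :if-exists :supersede
                                   :element-type '(unsigned-byte 8))
    (trivial-utf-8:write-utf-8-bytes string out)))

(defun read-demo-file (prefix-length)
  "Return a list of two strings: the first PREFIX-LENGTH characters
of the file, and whatever follows up to the end of file."
  (with-open-file (in *demo-path* :element-type '(unsigned-byte 8))
    (let* ((prefix (trivial-utf-8:read-utf-8-string
                    in :char-length prefix-length))
           ;; Without :stop-at-eof, hitting the end of file is an error.
           (remainder (trivial-utf-8:read-utf-8-string in :stop-at-eof t)))
      (list prefix remainder))))

(write-demo-file "hello world")
(print (read-demo-file 5))
```

Both streams are opened with an element type of (unsigned-byte 8), since the functions work on byte streams rather than character streams.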
function utf-8-byte-length (string) => integer
Calculate the number of bytes needed to encode a string as utf-8.
function utf-8-group-size (byte) => integer
Determine the number of bytes that are part of the character starting with a given byte.
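One possible use of utf-8-group-size is to step through a byte vector one encoded character at a time. The sketch below counts characters that way and compares utf-8-byte-length with the length of the encoded vector. The function count-characters is a hypothetical helper written for this example, and the code assumes the system is loadable through ASDF and that the forms are evaluated one at a time.

```lisp
(require :asdf)
(asdf:load-system :trivial-utf-8) ; assumes ASDF can locate the system

;; COUNT-CHARACTERS is a hypothetical helper, not part of the library.
(defun count-characters (bytes)
  "Count the encoded characters in BYTES by hopping from one
lead byte to the next, using the group size of each lead byte."
  (loop with i = 0
        while (< i (length bytes))
        count t
        do (incf i (trivial-utf-8:utf-8-group-size (aref bytes i)))))

;; "a" (1 byte), U+00E9 (2 bytes), U+20AC (3 bytes).
(defparameter *mixed*
  (format nil "a~C~C" (code-char #xE9) (code-char #x20AC)))

(let ((bytes (trivial-utf-8:string-to-utf-8-bytes *mixed*)))
  (format t "byte length ~D, encoded length ~D, characters ~D~%"
          (trivial-utf-8:utf-8-byte-length *mixed*)
          (length bytes)
          (count-characters bytes)))
```

Calling utf-8-byte-length first can be useful when a buffer has to be sized before the string is actually encoded.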
condition utf-8-decoding-error
A condition of this type is raised whenever an incorrectly encoded character is encountered.
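Code that decodes untrusted input may want to catch this condition instead of letting it propagate. The following sketch wraps the decoder in handler-case and returns nil on failure. The name safe-decode is made up for the example, and it assumes that a byte such as #xFF, which can never start a utf-8 character, is reported through utf-8-decoding-error as described above.

```lisp
(require :asdf)
(asdf:load-system :trivial-utf-8) ; assumes ASDF can locate the system

;; SAFE-DECODE is a hypothetical helper, not part of the library.
(defun safe-decode (bytes)
  "Return the string encoded in BYTES, or NIL if BYTES is not valid utf-8."
  (handler-case (trivial-utf-8:utf-8-bytes-to-string bytes)
    (trivial-utf-8:utf-8-decoding-error ()
      nil)))

;; #xFF is not a valid lead byte, so this vector should fail to decode.
(defparameter *bad-bytes*
  (make-array 2 :element-type '(unsigned-byte 8)
                :initial-contents '(#xFF #x41)))

(print (safe-decode *bad-bytes*))
(print (safe-decode (trivial-utf-8:string-to-utf-8-bytes "ok")))
```

Returning nil is only one policy; a caller could just as well log the condition or substitute a replacement string in the handler.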