.. -*-RST-*-
.. |VERSION| replace:: 0.1.6

========================
DATA FORMAT VALIDATION
========================

:Author:       Dr John A.R. Williams
:Contact:      J.A.R.Williams@jarw.org.uk
:date:         2010/04/08
:status:       Initial Public Release
:version:      |VERSION|
:copyright:    © 2010 J.A.R. Williams
:abstract:     DATA-FORMAT-VALIDATION is a library for Common Lisp providing a
     consistent regular interface for converting (and validating) external data
     (in the form of strings usually) into internal data types and
     for formatting internal data back into external presentable
     strings, all according to a conversion or type specification. 

.. meta::
   :keywords: Common Lisp

.. contents:: Table of Contents

.. |DFV| replace:: DATA-FORMAT-VALIDATION
.. |JARW| replace:: John A.R. Williams

Download and Installation
=========================

|DFV| together with this documentation can be downloaded from the git
repository at
`<git://github.com/willijar/cl-data-format-validation.git>`_ or from
<http://www.jarw.org.uk/lisp/cl-data-format-validation.tar.gz>. The
current release version is |VERSION|.

|DFV| comes with a system definition for 
`ASDF <http://www.cliki.net/asdf>`_ and is compiled and loaded in the usual
way. It depends upon `CL-PPCRE <http://weitz.de/cl-ppcre/>`_.

|DFV| is made available under the terms of the GPL v3 license; see
the file ``LICENSE.txt`` for details.

Support
=======

For questions, bug reports, feature requests, improvements, or patches
please email <J.A.R.Williams@jarw.org.uk>.

The API
=======

generic function **parse-input** `specification value &key &allow-other-keys => object`
   Validate and parse user input according to
   specification, returning the validated object. Signals an
   `invalid-format` condition if the input is invalid. If specification is a
   list, the first element specifies the actual validation method and the
   rest of the list is passed as keyword arguments to the specific method::

     (parse-input '(integer :min 0) input)

   will return the integer value parsed from the string `input` if it is
   >= 0, or signal an `invalid-format` error if not, and::

     (parse-input '(member :type integer :set (1 5 7)) input)

   will return the integer only if it is a member of the set.

   The `use-value` restart may be used to provide a substitute value if the
   input is invalid.

generic function **format-output** `specification value &key &allow-other-keys => string`
   Return a string representation of value formatted
   according to a specification. If specification is a list, the first
   element specifies the actual formatting method and the rest of the
   list is passed as keyword arguments to the specific method, e.g.::

    (format-output '(date :fmt :rfc2822) (get-universal-time))
    >"Mon, 10 Jul 2006 15:43:45 +00"

generic function **equivalent** `specification input reference &key &allow-other-keys => boolean`
  Return true if the input and reference values can be considered
  equivalent according to the specification. The default is to test
  using **equal**.

generic function **parse-options** `spec options-list &optional allow-other-options => options`
  Parse an option list (alist of names and strings to be parsed)
  against a specification. The specification is a list of entries each
  of which lists the name, and optionally the type specification (to
  be used by **parse-input**) and the default value to be used if there
  is no entry in the options-list. The
  output is an alist of names and the parsed or default values. Options in
  `options-list` not in spec are not returned and will signal a correctable
  `unknown-option` error unless `allow-other-options` is true.

generic function **parse-arguments** `spec argument-string &optional allow-spaces => arguments`
  Parse a string of whitespace delimited arguments according to spec.
  The specification is a list of entries each
  of which lists the name, and optionally the type  specification (to
  be used by **parse-input**) and default values. The
  output is an alist of variable names and parsed values.
  If `allow-spaces` is true, the last element can contain spaces
  (i.e. trailing spaces are not trimmed).

formatter function **eng** `os arg &optional colon-p at-p d padchar exponentchar`
  Formatter which outputs its numerical argument `arg` in engineering format
  to stream `os`.
  It takes the parameters `d,padchar,exponentchar` where

  - `d` is the number of digits to display after the decimal point,
  - `padchar` is the character used to pad the start of the number,
  - `exponentchar` is the character displayed between the mantissa and
    the exponent.

  It also takes the ``:`` modifier, which causes it to output the exponent
  as an SI units prefix rather than a number.

  e.g. ``(format nil "~/eng/" 35000) => "35.00e+3"``
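
  To illustrate what engineering format means (an exponent that is always a
  multiple of three), here is a minimal, library-independent sketch. The
  name `sketch-eng` is hypothetical and this is not the implementation used
  by |DFV|; it ignores `padchar`, `exponentchar` and the SI prefix
  modifier::

      ;; Hypothetical helper for illustration only; not part of the library.
      (defun sketch-eng (value &optional (decimals 2))
        "Return VALUE as a string whose exponent is a multiple of three."
        (let ((mantissa value)
              (exponent 0))
          (unless (zerop mantissa)
            ;; Scale down large values in steps of 1000.
            (loop while (>= (abs mantissa) 1000)
                  do (setf mantissa (/ mantissa 1000))
                     (incf exponent 3))
            ;; Scale up small values in steps of 1000.
            (loop while (< (abs mantissa) 1)
                  do (setf mantissa (* mantissa 1000))
                     (decf exponent 3)))
          ;; ~,vF prints DECIMALS digits after the point, ~@D forces a sign.
          (format nil "~,vFe~@D" decimals mantissa exponent)))

      (sketch-eng 35000) ; => "35.00e+3"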

formatter function **date** `os utime &optional colon-p at-p (precision 6) timezone`
  Formatter which formats a universal time for output as a date and time.

  Arguments:

  - os: an output stream designator
  - utime: a universal time
  - colon-p: a generalised boolean (default false). If true, use month
    and day names in the date.
  - at-p: a generalised boolean (default false). If true, print in
    yyyy-mm-dd (sortable) format rather than dd-mm-yyyy.
  - precision: the precision to print to. 6 is to the second,
    7 includes the timezone; a negative number counts backward.
  - timezone: an integer (default `*timezone*`).
    If nil, no timezone is used and the time is in the current timezone
    adjusted for daylight saving time.

  e.g. ``(format nil "~/date/" (get-universal-time)) => "19-03-2009 08:30"``

function **join-strings** `strings &optional (separator #\space) => string`
  Return a new string by joining together the list of `strings`,
  separating each string with a `separator` character or string.

function **split-string** `string &key count delimiter remove-empty-subseqs => list`
  Split `string` along whitespace, as defined by the sequence `delimiter`.
  Whitespace which causes a split is elided from the result. The whole
  string will be split unless `count` is provided, in which case the
  string will be split into this number of tokens at most, the last one
  containing the whole rest of the given `string`. If
  `remove-empty-subseqs` is true, zero length entries are removed. This
  is similar to `split-sequence`; however, it only takes a string input and
  the delimiter may be a string.

Type Specifications
===================

A type specification is an S-expression composed of a symbol
specifying the particular conversion and a keyword argument list of
qualifiers. Specific methods of **parse-input** and **format-output**
are specialised on the conversion type symbol and take the remainder
of the S-expression as an argument list. Adding your own conversions
is simply a matter of providing appropriately specialised
methods. The intended semantics are that if the output from
**format-output** is read back in using **parse-input** with the same
type specification then an equivalent object should result.
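
The dispatch described above can be sketched without the library as
follows. All names beginning with `sketch-` and the `percentage`
conversion are hypothetical stand-ins chosen for illustration; they are
not the definitions used by |DFV|, whose own generic functions and
condition should be used in real code::

    ;; Hypothetical stand-ins for illustration; not the library's definitions.
    (define-condition sketch-invalid (error)
      ((value  :initarg :value  :reader sketch-invalid-value)
       (reason :initarg :reason :reader sketch-invalid-reason)))

    (defgeneric sketch-parse (specification input &key &allow-other-keys)
      (:documentation "Parse INPUT according to SPECIFICATION."))

    ;; A list specification: the first element selects the method and the
    ;; rest of the list is passed on as keyword arguments.
    (defmethod sketch-parse ((specification cons) input &key &allow-other-keys)
      (apply #'sketch-parse (first specification) input (rest specification)))

    ;; A new conversion is added by specialising on the conversion symbol.
    (defmethod sketch-parse ((specification (eql 'percentage)) (input string)
                             &key (max 100) &allow-other-keys)
      (let ((n (ignore-errors (parse-integer input))))
        (restart-case
            (if (and n (<= 0 n max))
                n
                (error 'sketch-invalid
                       :value input
                       :reason (format nil "not an integer in 0..~D" max)))
          ;; Lets a handler supply a substitute result.
          (use-value (v) v))))

    (sketch-parse '(percentage :max 50) "42")        ; => 42

    ;; Supplying a substitute value through the use-value restart.
    (handler-bind ((sketch-invalid (lambda (c) (use-value 0 c))))
      (sketch-parse 'percentage "150"))              ; => 0

Specialising on the symbol with an `eql` specialiser keeps each conversion
in its own method, and the method for list specifications is the only
place that needs to know how qualifiers are unpacked.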

Many conversions take the `nil-allowed` argument, which
converts an empty or all-whitespace string to nil, corresponding to a
null input; otherwise an empty string is considered invalid input.
Method specialisations are provided for the following types:


**boolean** `&key`
  Converts typical user boolean values (e.g. "TRUE", "Y",  "0") into a
  boolean type. On output "TRUE" and "FALSE" are used.

**bit-vector** `&key`
  Converts between a string of 0 and 1s and a bit vector.

**date** `&key nil-allowed zone fmt`
  Uses the `parse-time` library of
  Jim Healy and Daniel Barlow to convert to internal universal time in the
  specified timezone `zone`, which defaults to the special variable
  `*timezone*` for output but to `nil` for parsing input. If `zone` is
  nil the time will be in the current timezone allowing for local
  daylight saving time; otherwise it is in the specified timezone,
  which will be written out.

  `fmt` is a keyword specifying the output format to be used, as
  follows:

  - ``:RFC2822``: output as per RFC 2822 for internet messages
  - ``:SHORT``: output in a shorter format (same as ``:ISO``)
  - ``:TIME-ONLY``: outputs time as hh:mm:ss
  - ``:DATE-ONLY``: outputs date as dd-mm-yyyy
  - ``:ISO``: output as per ISO 8601 (default)

  A stand-alone formatter of the same name is also provided.

**dimensional-parameter** `&key padchar decimal-places tol`
  Converts between a string which includes units and normal scaling
  suffixes and a cons of the numerical value and the base units
  string. `padchar` and `decimal-places` are as per **eng**.

  Two dimensional parameters are considered equivalent if both their
  numerical values and their units are equivalent.
 
**eng** `&key units padchar decimal-places`
  Parse a number suffixed
  with units. The standard engineering prefixes are assumed for the
  units (but with 'u' instead of 'µ'). The appropriately scaled
  floating point value is returned. If `units` is a
  string then the input units suffix must match it. On output the number
  will be scaled and the appropriate engineering prefix used.
  A general purpose formatter of the same name is also provided.

**filename** `&key if-invalid replacement`
  Return a safe filename from a string path value.
  May signal an error or replace invalid characters with the specified
  replacement character (default '-').

**headers** `&key stream skip-blanks-p field-specifications preserve-newlines-p termination-test if-no-specification`
  Parse or format internet message style headers. `parse-input` takes
  either a string or stream as the input value.

  `field-specifications`
  is either an a-list, keyed by field name, giving the parse type
  specification to be applied recursively for that field, or a function
  which returns the parse type specification and a `present-p` value
  in the usual way. `if-no-specification` specifies either a type
  specification to be used if the field is not found in
  `field-specifications`, `:error` for this case to be flagged as an
  error, or `:ignore` to ignore fields without specifications.
  It defaults to `nil`, i.e. the value is passed through as a string
  without parsing.

  `skip-blanks-p` will allow the parser to skip leading blank lines on
  the input. `termination-test` is a test function of one
  argument (a string holding one line) which should return true if the
  argument terminates the headers; the default tests for a zero length
  line. If `preserve-newlines-p` is true then continuation lines will
  keep their newline characters, otherwise the newlines and first
  continuation character are removed.

  `format-output` will write its output to `stream` if it is given,
  otherwise it will return a string containing the output headers.   

**integer** `&key min max nil-allowed radix format`
  Converts to an integer between `min` and `max` (inclusive, and if
  specified). `radix` specifies the base (in the usual way). `format`
  specifies the format control string to be used for output.

**list** `&key separator type min-length max-length`
  Return a list of objects delimited by the given `separator`
  string. Each member is recursively checked against the nested type
  (another type specification). If specified, `min-length` and
  `max-length` give the required length bounds. The type
  specification may be a list of type specifications applied to each
  element in turn or a single type specification applied to all
  elements (note there is an ambiguity if you specify a list of one
  symbol: in this case it is taken as a conversion for the first element only).

**member** `&key type set test key`
  Recursively uses `type` to convert the string to an internal object which
  is then checked for membership of the list `set` using `key` and
  `test` (default is `equal`, allowing for string tests).

**nil** `&key`
  Return string unchanged.

**number** `&key min max nil-allowed format radix tol`
  Converts to a general number between `min` and `max` (inclusive, and if
  specified). `radix`
  specifies the base (in the usual way). `format` specifies the format
  control string to be used for output. The `parse-number` library of
  Matthew Danish is used to do the conversion.

  `tol` is the tolerance to be used for **equivalent** testing; it
  can either be a multiplier applied to the reference value or a
  function of two arguments, the input and the reference value.

**pathname** `&key must-exist wild-allowed nil-allowed`
  Convert input to a pathname. If `wild-allowed` is true then the
  pathname is allowed to be wild, otherwise if `must-exist` is true
  then the pathname must correspond to an existing file (checked using
  `probe-file`).

**pathnames** `&key must-exist wild-allowed nil-allowed`
  Return a list of pathnames delimited by ':', each checked as for **pathname**.

**read** `&key multiplep type package`
  Uses the Lisp reader with the current package set to
  `package`. `type` is a Common Lisp type against which the read
  object(s) is checked. If `multiplep` is true then read will be
  called repeatedly until all characters are used up and the results
  are returned as a list. On output, if `multiplep` is true the list of
  objects is written readably, with the objects separated by spaces.

**roman**
  Converts between Roman numerals (up to 4000) and an integer.

**string** `&key strip-return nil-allowed min-word-count max-word-count min-length max-length`
  Validates that the string is between `min-length` and `max-length`
  characters long (inclusive, and if specified) and the word count is
  between `min-word-count` and `max-word-count`. 
  Whitespace is trimmed from the returned string, and if
  `strip-return` is specified the RETURN characters are stripped from
  the string (useful when handling input from http forms).

**symbol** `&key nil-allowed package convert`
  Returns a symbol from the string interned into `package` (default
  is the keyword package). `convert` is a function applied to the
  string before it is interned (default identity) which may for
  example be used to change case or map special characters.

**time-period** `&key`
  A time period in hours, minutes and (optionally) seconds is
  converted into an integer number of seconds. ':' is used as the
  delimiter between fields.
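
  The arithmetic of the parsing direction can be sketched independently of
  the library as below. The function names are hypothetical, the sketch
  does no validation beyond what `parse-integer` performs, and it is not
  the implementation used by |DFV|::

      ;; Hypothetical helpers for illustration only; not part of the library.
      (defun sketch-colon-fields (text)
        "Return the ':'-separated integer fields of TEXT as a list."
        (loop with start = 0
              for end = (position #\: text :start start)
              collect (parse-integer text :start start :end end)
              while end
              do (setf start (1+ end))))

      (defun sketch-period-seconds (text)
        "Convert \"hh:mm\" or \"hh:mm:ss\" into a number of seconds."
        (destructuring-bind (hours minutes &optional (seconds 0))
            (sketch-colon-fields text)
          (+ (* 3600 hours) (* 60 minutes) seconds)))

      (sketch-period-seconds "1:30:15") ; => 5415
      (sketch-period-seconds "0:05")    ; => 300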

Conditions and Restarts
=======================

**invalid-format** 
  is signalled if the input doesn't meet the type specification. It has
  readers `invalid-format-value` and `invalid-format-reason`.

**use-value** 
  restart may be invoked to specify a result to be used if
  **invalid-format** is signalled.

**use-default**
  This restart is available for **parse-options** and
  **parse-arguments** and will result in a default specified value
  being used.


Acknowledgements
================

Matthew Danish for the parse-number library, which is used by and included
with this library.

Daniel Barlow and Jim Healy for the parse-time library.