ABNFGEN(1)                                             ABNFGEN(1)

NAME
       abnfgen - ABNF-based test case generator

SYNOPSIS
       abnfgen [ -7hvclux ] [ -d output-directory ]
               [ -y depth ] [ -n ncases ]
               [ -p filename-pattern ] [ -w prefix ] [ -r seed ]
               [ -s start-symbol ] [ -t tentative-file ]
               [ files...  ]

DESCRIPTION
       ABNF ("Augmented Backus-Naur Form") grammars are frequently
       used in RFCs to describe protocols or presentation formats
       for Internet Standards.  The abnfgen program produces text
       composed according to the rules of an ABNF grammar,  given
       the grammar.   Such text can then be used to test a parser
       that claims to implement the grammar.

OPTIONS
       -h     Print a brief usage message.

       -v     Verbose mode.  Print a trace of the expanded  gram-
              mar rules to standard error.

       -w     Write seed.  Begin each generated file with prefix,
              followed by the seed as a decimal number and a new-
              line.

       -c     Try for complete coverage.   Rather  than  randomly
              picking  productions,  try  to  cover each leaf and
              branch of the grammar.

              For repetitions with * where  maximum  and  minimum
              are  less than 100 iterations apart, the repetition
              counts as fully covered when both maximum and mini-
              mum have been produced.  (The stages in between are
              only tried once the extremes have been covered.)

              For repetitions with * where  maximum  and  minimum
              are  more than 100 apart, only the minimum and some
              small number of repetitions are tried; doubling the
              -c removes that limit and causes the extremes to be
              tried even if the maximum is very large.

              For character ranges with up to 256 elements,  each
              element  must  be  produced for full coverage.  For
              (Unicode) character ranges with more than 256  ele-
              ments,  every possible value modulo 256 is produced
              at least once for complete coverage.

              A case-insensitive string counts as  fully  covered
              once  both  an  all-uppercase  and an all-lowercase
              version have been produced.

       -y     Descend freely through at  most  depth  nonterminal
              expansions  or  repetitions.   After recursing that
              deeply, always pick that branch of the grammar that
              terminates most quickly.  This can be used to limit
              expansion in grammars likely to recurse  infinitely
              such as a = a a a a / "x".
              The default depth is 100.

       -n     Rather  than  generating  output on standard output
              (the default), generate  ncases  test  files  named
              ####.tst,  where  ####  is replaced with the test's
              running number from 1 to ncases.

       -p     Name test cases using filename-pattern.  A sequence
              of # characters in the pattern is replaced with the
              running number of the test, padded to the specified
              size.

       -r     Initialize  the  random  generator using seed.  For
              the same grammar and version of the  software,  the
              same seed always produces the same output.

       -s     Start  production with start-symbol.  (Default: the
              first nonterminal defined in the grammar file.)

       -t     Read the nonterminals defined in tentative-file and
              use  them  if they don't get defined in the regular
              input files.

       -u     Reject any grammar that contains <>-enclosed prose.

       -x     Exclude  the  core set of definitions.  The RFC that
              defines the  grammar  also  defines  a  core  set of
              symbols  like  CRLF, DIGIT, SP  and so on.     Since
              version 0.9,  abnfgen predefines these symbols as if
              their  definition   had  been  passed  in  with  -t.
              Specify  -x to suppress those predefinitions.

       -7     Disable RFC 7405: do not interpret '%i"foo"' and
              '%s"foo"' as case-insensitive or case-sensitive lit-
              erals, respectively.

       -l     ("legal")  Disable extensions to RFC 4234:  do  not
              allow  case-sensitive literals in single quotes, do
              not allow branch tagging with {}, and do not convert
              character constants from Unicode to UTF-8.

       -_     Allow "_" in identifier names.
              ABNF doesn't (it allows "-" instead), but many other
              grammar systems do.
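
       As an illustration of the -p option above, the following
       Python sketch shows one plausible reading of the filename
       pattern: a run of # characters is replaced by the running
       number, padded to the length of the run.  It is not taken
       from abnfgen's source.  The function name expand_pattern,
       the use of zero as the padding character, and replacing
       only the first run of # are assumptions made for this
       example.

```python
import re


def expand_pattern(pattern, number):
    """Replace the first run of '#' in pattern with number.

    Assumption: the number is zero-padded to the length of the run,
    and numbers wider than the run are written out in full.
    """
    def pad(match):
        return str(number).zfill(len(match.group(0)))

    return re.sub(r"#+", pad, pattern, count=1)


# The default pattern from the -n option, then a custom (hypothetical) one.
default_name = expand_pattern("####.tst", 7)        # "0007.tst"
custom_names = [expand_pattern("case-###.txt", n) for n in (1, 42, 1000)]
print(default_name, custom_names)
```

       Under these assumptions, a number wider than the run of #
       characters is not truncated, so file names stay unique even
       when ncases exceeds the padded width.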


GRAMMAR
       The input grammar is a slight extension of ABNF with  pro-
       visions  for  literal strings and control of the chance of
       descent into specific branches of the grammar.


       nonterminal = expression
              The nonterminal expands to expression.

       nonterminal =/ expression
              Alternative to whatever else has been defined,  the
              nonterminal can also expand to expression.

       x / y  Either x or y.

       x y    x followed by y.

       "abc"  The  case-insensitive  string abc.  That is, one of
              ABC, ABc, AbC, Abc, aBC, aBc, abC, or  abc.   There
              is no way of quoting " in a string; in a pinch, use
              <"> or %x22.

       'abc'  The case-sensitive string abc.  This is an extension
              to ABNF, and can be disabled using the -l option.
              In plain ABNF, a case-sensitive abc must be specified
              using the ASCII values of its characters, e.g. as
              %x61.62.63.  There is no way of quoting ' in a
              string; use '"'"'.  (That is, leave the single-quoted
              string, enter a double-quoted string containing the
              single quote, leave the double-quoted string, and
              reenter the single-quoted string.)

       %zN    The  character  with  base z value N.  Bases are: x
              (16),  d (10), and b (2).

       %zM-N  Any character with a value between M and N inclusive,
              base z.

       M*N expression
              Between  (inclusive) M and N repetitions of expres-
              sion.  The default for M is 0, for N infinity.

       [expression]
              Same  as  0*1(expression),  that  is,  an  optional
              expression.

       (expression)
              Expands to expression.  (Use parentheses for group-
              ing.)
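
       The terminal forms listed above can be made concrete with
       a short Python sketch.  It enumerates the strings that a
       double-quoted, case-insensitive literal may expand to, and
       decodes a numeric terminal such as %x61.62.63 into its
       characters.  The helper names case_variants and
       decode_numeric are invented for this example and are not
       part of abnfgen; value ranges (%zM-N) are not handled.

```python
from itertools import product


def case_variants(text):
    """Return every upper/lower-case spelling of an ASCII literal."""
    choices = [
        (ch.upper(), ch.lower()) if ch.isalpha() else (ch,)
        for ch in text
    ]
    return ["".join(combo) for combo in product(*choices)]


def decode_numeric(terminal):
    """Decode a terminal like '%x61.62.63', '%d97' or '%b1100001'.

    The letter after '%' selects the base: x (16), d (10), b (2).
    Dots separate the values of consecutive characters.
    """
    bases = {"x": 16, "d": 10, "b": 2}
    base = bases[terminal[1].lower()]
    return "".join(chr(int(part, base)) for part in terminal[2:].split("."))


variants = case_variants("abc")
print(variants)                      # the eight spellings of "abc"
print(decode_numeric("%x61.62.63"))  # abc
```

       The eight strings returned for "abc" are the ones listed
       in the "abc" entry, and %x61.62.63 decodes to the single
       case-sensitive spelling abc.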

BRANCH CONTROL
       Each alternative in the grammar has a weight  assigned  to
       it.   Normally,  these  weights are all 1.  The higher the
       weight, the more likely it is that an alternative is  used
       when  expanding the nonterminal it occurs in.  Assign spe-
       cific weights to an alternative by prefixing it with  {N},
       where N is an integer.  For example
              nt = {30} nt x / x
              x = {2} 'a' / {3} 'b'
       should  tend  to  produce  output that is about 2/5th 'a',
       about 3/5th 'b', and fairly lengthy.
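
       The effect of the weights in the example above can be
       approximated with a small Python simulation.  This is a
       sketch only: it assumes that an alternative is chosen with
       probability proportional to its weight, it unrolls the
       recursion in nt into a loop, and it ignores the -y depth
       limit.  The names pick, expand_x and expand_nt and the
       seed value are hypothetical, not abnfgen internals.

```python
import random


def pick(rng, weighted):
    """Pick one alternative with probability proportional to its weight."""
    weights = [weight for weight, _ in weighted]
    alternatives = [alt for _, alt in weighted]
    return rng.choices(alternatives, weights=weights, k=1)[0]


def expand_x(rng):
    # x = {2} 'a' / {3} 'b'
    return pick(rng, [(2, "a"), (3, "b")])


def expand_nt(rng):
    # nt = {30} nt x / x
    # Every expansion emits one x; with weight 30 against 1 it recurses.
    out = [expand_x(rng)]
    while pick(rng, [(30, "recurse"), (1, "stop")]) == "recurse":
        out.append(expand_x(rng))
    return "".join(out)


rng = random.Random(2002)  # arbitrary seed, chosen for repeatability
samples = [expand_nt(rng) for _ in range(2000)]
text = "".join(samples)
share_a = text.count("a") / len(text)
mean_length = len(text) / len(samples)
print(round(share_a, 2), round(mean_length, 1))
```

       Under these assumptions, the share of 'a' settles near 2/5
       and the mean output length near 31 characters, since the
       recursive branch is taken 30 times out of 31.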

       Repetition can be branch controlled with two chance param-
       eters that govern whether to stop or continue on each step
       of the repetition.  They're placed before the  two  counts
       around the *:
              nt = {1}1*{3}10 'X'
       will  tend  to produce somewhere  around 3 Xs  on average.
       This is an extension to RFC 4234 and can be disabled by
       specifying -l on the  command line.


BUGS
       Please  send problems, bugs, questions, desirable enhance-
       ments, etc. to:

              jutta@pobox.com


                           4 March 2002                ABNFGEN(1)