yacc2html - convert a YACC grammar to HTML

A visitor of my C subtree asked me in email how I'd done the markup for the yacc and lex grammars. Had I used a tool that I could lend him to annotate his own grammars?

No, but it's a neat idea; so I wrote one.

Here's the output of yacc2html invoked on its own grammar. If you want to test it yourself, you're welcome to a gzipped tar-file of the beta release.

The version was updated Jan 5th, 1999. Thanks to everybody who reported bugs with the source and the site.

Synopsis

yacc2html [ -b basename ] [ -c ] [ -D name [=url ] ] [ -h title ] [ -H ]
[ -i ] [ -l lex-file ] [ -L lex-sed-file ] [ -N nonterminal-format ]
[ -p ] [ -r ] [ -t ] [ -T terminal-format ] [ -u ] [ -y yacc-file ]
[ -Y yacc-sed-file ] [ filename ]

Description

yacc2html converts input specifications for yacc(1) (and some others) into HTML format. The visual appearance of the grammar is kept through HTML <pre> </pre> preformatted quoting, but references to nonterminals are linked to their definition, references to tokens are linked (by default) to an outside lex file, references to token types (usually enclosed in angle brackets) can be linked to their %union definition, and the whole grammar acquires an HTML header and footer.

Further command-line options let you strip all C code from the output, link arbitrary tokens and types to arbitrary URLs (uniform resource locators), and write two input files suitable for processing with sed(1): one turns a lex specification into an HTML file, and the other adds links to the yacc hypertext to arbitrary other HTML documents.

The yacc specification is read from standard input, or from a file given as a command line argument; the HTML result is written to standard output.

Input Format

In addition to the input format expected by yacc, yacc2html accepts:

These two extensions make it possible to use yacc2html even on ``informal'' grammars, such as
   list:    foo? bar+ (baz|quux) ';'
even though yacc itself would either not understand them or misinterpret them.

Nonterminals and Terminals

Symbols that appear in front of a colon (:) are considered to be nonterminals. Nonterminals are tagged with an <a name="nonterminal"> anchor. The generated nonterminal names are affected by the -N option; see below.  There can be multiple defining rules for a nonterminal; only the first is anchored.  If a nonterminal occurs outside of a rule that defines it, it is tagged with an <a href="#nonterminal"> reference to the first defining rule. Nonterminals in rules that define them are not tagged by default; this behavior is toggled by the -r (recursive-nonterminals) option.
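
The tagging rules above can be modelled in a few lines. This is only a sketch of the behavior as described, not yacc2html's own code: the function name tag_grammar, the representation of rules as (name, symbols) pairs, and the exact placement of the closing </a> are assumptions made for illustration.

```python
def tag_grammar(rules, recursive=False):
    """Tag a list of (lhs, [rhs symbols]) rules, given in source order.

    recursive=True imitates the -r option described above.
    """
    # Anything that appears in front of a colon counts as a nonterminal.
    nonterminals = {lhs for lhs, _ in rules}
    anchored = set()
    lines = []
    for lhs, rhs in rules:
        if lhs not in anchored:
            # Only the first defining rule receives the anchor.
            head = f'<a name="{lhs}">{lhs}</a>'
            anchored.add(lhs)
        else:
            head = lhs
        body = []
        for sym in rhs:
            in_own_rule = sym == lhs
            if sym in nonterminals and (recursive or not in_own_rule):
                # Reference to the first defining rule.
                body.append(f'<a href="#{sym}">{sym}</a>')
            else:
                # Terminals and (by default) recursive uses stay untagged here.
                body.append(sym)
        lines.append(f"{head}: {' '.join(body)}")
    return lines


rules = [
    ("list", ["item"]),
    ("list", ["list", "item"]),
    ("item", ["WORD"]),
]
for line in tag_grammar(rules):
    print(line)
```

In this model the second rule for list carries no anchor, and its recursive use of list is left unlinked unless recursive=True is passed.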

Symbols that appear after a yacc %token declaration are considered to be terminals. By default, terminals are tagged with a reference to their first defining rule in an external lex-file: <a href="lex-file.html#terminal">. The -t option makes yacc2html link terminals to their %token declaration rather than to an external file. The -i option suppresses linking of terminals altogether.

Sed scripts

Two scripts, suitable for handing to sed with its -f filename option, can be written using the -L output and -Y output options. The first of these turns an input file for lex into HTML with anchors suitable for referencing from the converted yacc input file. The second turns occurrences of nonterminals in arbitrary input text into references to the corresponding anchor in the converted yacc input file.

Thus, given lex and yacc input files grammar.l and grammar.y, the two steps for turning them into HTML files are

   % yacc2html -L script.sed grammar.y  > grammar-y.html
   % sed -f script.sed  grammar.l > grammar-l.html

Options

-H
(help)  Print a short online help message, and exit.

-bbasename
Use basename to derive the default names of the yacc and lex HTML files, rather than the name of the input file (or "stdin", if yacc2html is used as a filter).

-c
(c-code suppression)  Strip %{ ... %} sections, { ... } action statements, and trailing code after the second %% from the HTML file.  (The C-like comments within the yacc code are still written; I consider them part of the specification, not of the C implementation.)

-Dsymbol[=URL]
Without a second argument, pretend that symbol is neither a token nor a nonterminal (but defined elsewhere); create neither an anchor nor links for it. When a second argument is present, it specifies a URL that all occurrences of the symbol should be linked to. Regardless of whether a second argument is present, no defined symbol ever shows up in the sed files.

-htitle
(header title) Let the document title (as for the <title>...</title> HTML element) be title. If this option is not present, yacc2html defaults to the input file name, or *standard input* if input is read from standard input.

-i
(ignore terminals) Create neither anchors nor links to terminal tokens.

-Llex-sed-file
Write to lex-sed-file a sed script (suitable for use with sed's -ffilename option) to turn a lex file into HTML code. The script adds a header and footer to the text that is passed through it, quotes &, <, >, and ", and turns the first appearance of every word that yacc2html recognized as a terminal into an anchor.
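
To make the effect of the generated script concrete, here is a rough Python model of the transformation it is described as performing. It is a sketch under assumptions, not the script yacc2html writes: the function names, the header and footer text, and the word-splitting rule are all hypothetical.

```python
import re


def quote(text):
    """Quote the four characters named above; & must be handled first."""
    return (text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace('"', "&quot;"))


def quote_and_anchor(text, terminals):
    """Quote the text and anchor the first appearance of each terminal."""
    seen = set()
    parts = []
    # Splitting on a captured \w+ yields alternating non-word and word pieces.
    for piece in re.split(r"(\w+)", text):
        if piece in terminals and piece not in seen:
            seen.add(piece)
            parts.append(f'<a name="{piece}">{piece}</a>')
        else:
            parts.append(quote(piece))
    return "".join(parts)


def lex_to_html(text, terminals, title="lex file"):
    """Wrap the converted text in a (hypothetical) header and footer."""
    header = f"<html><head><title>{title}</title></head><body><pre>\n"
    footer = "\n</pre></body></html>\n"
    return header + quote_and_anchor(text, terminals) + footer


sample = '"if" { return IF; } /* IF & <x> */'
print(quote_and_anchor(sample, {"IF"}))
```

Only the first IF becomes an anchor; the second is passed through as plain text, and the quoting is applied to everything that is not turned into an anchor.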

-Nnonterminal-name-format
When generating a reference to, or a definition of, a nonterminal, use nonterminal-name-format to derive the local tag from the name of the nonterminal. In the format string, the following sequences are recognized:
%b
The argument of a -b option if specified, or the input file name without a suffix;
%l
The argument of a -l option if specified, or the default HTML lex file name ("%b-l.html");
%s
The name of the nonterminal;
%y
The argument of a -y option if specified, or the default HTML yacc file name ("%b-y.html").
Thus, to make all references to a nonterminal name use the tag "yacc-name" rather than just "name", use -Nyacc-%s.

-r
(recursive links) Link nonterminal references to nonterminal definitions, even when the nonterminal occurs in the defining rule. Normally, such `recursive' nonterminals are left unlinked.

-t
(token-terminals) Link terminals to their %token declaration, rather than to the lex file.

-Tterminal-url-format
When generating a reference to a terminal symbol, use terminal-url-format to derive the URL from the name of the terminal, rather than the default. In the format string, the following sequences are recognized:
%b
The argument of a -b option if specified, or the input file name without a suffix;
%l
The argument of a -l option if specified, or the default HTML lex file name ("%b-l.html");
%s
The name of the terminal;
%y
The argument of a -y option if specified, or the default HTML yacc file name ("%b-y.html").
Thus, to make all token references refer to a token NAME as "token-NAME" rather than just "NAME", use -T%l#token-%s.
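
The %-sequences shared by -N and -T can be illustrated with a small Python model. This is a sketch of the substitution as documented above, not the program's own code; the helper names default_names and expand are made up for the example.

```python
import os
import re


def default_names(input_file=None, basename=None, lex_file=None, yacc_file=None):
    """Work out the values of %b, %l and %y from the documented defaults."""
    if basename is None:
        # Input file name without its suffix, or "stdin" when used as a filter.
        basename = os.path.splitext(input_file)[0] if input_file else "stdin"
    return {
        "b": basename,
        "l": lex_file or f"{basename}-l.html",
        "y": yacc_file or f"{basename}-y.html",
    }


def expand(fmt, symbol, names):
    """Replace %b, %l, %s and %y in fmt in a single pass."""
    table = dict(names, s=symbol)
    return re.sub(r"%([blsy])", lambda m: table[m.group(1)], fmt)


names = default_names("grammar.y")
print(expand("yacc-%s", "expr", names))        # format from the -N example
print(expand("%l#token-%s", "NUMBER", names))  # format from the -T example
```

The single-pass substitution means that a file name which itself contains a % sequence is not expanded a second time.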

-u
(%union links) Link type references (outside of C code) to the %union definition at the start of the grammar, if such a definition exists. Normally, token type references remain unlinked.

-Yyacc-sed-file
Write to yacc-sed-file a sed script (suitable for use with sed's -ffilename option) to turn all occurrences of words that yacc2html recognized as nonterminals into links to the nonterminals' definitions in the HTML output file. The name of the HTML output file can be specified explicitly using -yyacc-file or defaults implicitly to %b-y.html.
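
As an illustration of what such a script does to a piece of text, the following Python sketch links whole-word occurrences of known nonterminals to a yacc HTML file. It approximates the described effect and is not the generated sed script; the function name, the sample sentence, and the file name grammar-y.html are assumptions.

```python
import re


def link_nonterminals(text, nonterminals, yacc_file="grammar-y.html"):
    """Turn each whole-word nonterminal in text into a link to its anchor."""
    if not nonterminals:
        return text
    # \b keeps "expr" from matching inside "expr_list".
    names = sorted(nonterminals, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b"

    def make_link(match):
        name = match.group(0)
        return f'<a href="{yacc_file}#{name}">{name}</a>'

    return re.sub(pattern, make_link, text)


print(link_nonterminals("An expr_list holds one expr or more.",
                        {"expr", "expr_list"}))
```

Unlike the lex script, this model links every occurrence rather than only the first, matching the wording "all occurrences" above.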

See also

lex(1), sed(1), yacc(1)

Bugs

The anchors in the lex file shouldn't be created at the token name itself, but at the start of the paragraph that contains it.

The parser's error messages are not very helpful.

Please forward other bug reports to the author, Jutta Degener, jutta at pobox.com. Thanks!