A visitor of my C subtree asked me in email how I'd done the markup for the yacc and lex grammars. Had I used a tool that I could lend him to annotate his own grammars?
No, but it's a neat idea; so I wrote one.
Here's the output of yacc2html invoked on its own grammar. If you want to test it yourself, you're welcome to a gzipped tar-file of the beta release.
The version was last updated Dec 16th, 2012. (It should now actually compile with bison/flex.) Thanks to everybody who reported bugs with the source and the site.
yacc2html converts input specifications for yacc(1) (and some others) into HTML format. The visual appearance of the grammar is kept through HTML <pre> ... </pre> preformatted quoting, but references to nonterminals are linked to their definition, references to tokens are linked (by default) to an outside lex file, references to token types (usually enclosed in angle brackets) can be linked to their %union definition, and the whole grammar acquires an HTML header and footer.
Further command-line options make it possible to strip all C code from the output, to link arbitrary tokens and types to arbitrary URLs (uniform resource locators), and to write two script files for sed(1), which can be used to turn a lex specification into an HTML file and to add links to the yacc hypertext to arbitrary other HTML documents.
The yacc specification is read from standard input, or from a file given as a command line argument; the HTML result is written to standard output.
In addition to the input format expected by yacc, yacc2html accepts rules such as
list: foo? bar+ (baz|quux) ';'
even though yacc itself would not understand, or would misinterpret, them.
Symbols that appear in front of a colon (:) are considered to be nonterminals. Nonterminals are tagged with an <a name="nonterminal"> anchor. The generated nonterminal names are affected by the -N option; see below. There can be multiple defining rules for a nonterminal; only the first is anchored.
If a nonterminal occurs outside of a rule that defines it, it is tagged with an <a href="#nonterminal"> reference to the first defining rule. Nonterminals in rules that define them are not tagged by default; this behavior is toggled by the -r (recursive-nonterminals) option.
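The anchoring and linking rules above can be illustrated with a small Python sketch. It is not yacc2html's own code: the function name `link_nonterminals` and its `link_recursive` parameter (standing in for the -r behavior) are hypothetical, it assumes every rule head starts in the first column, and it uses the bare symbol as the anchor name, whereas the real names depend on the -N option.

```python
import html
import re

# A rule head: an identifier in the first column, followed by a colon.
RULE_HEAD = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)\s*:")
# Quoted character literals are matched first so their contents are never linked.
TOKEN = re.compile(r"'(?:\\.|[^'\\])*'|[A-Za-z_][A-Za-z0-9_]*")


def link_nonterminals(rules_text, link_recursive=False):
    """Anchor the first definition of each nonterminal and link references."""
    lines = rules_text.splitlines()
    # Pass 1: every symbol in front of a colon is a nonterminal.
    nonterminals = {m.group(1) for m in map(RULE_HEAD.match, lines) if m}
    anchored = set()   # nonterminals whose first defining rule was already seen
    current = None     # nonterminal defined by the rule being processed
    out = []
    for line in lines:
        head = RULE_HEAD.match(line)
        if head:
            current = head.group(1)
        pieces = []
        pos = 0
        for tok in TOKEN.finditer(line):
            pieces.append(html.escape(line[pos:tok.start()], quote=False))
            word = tok.group(0)
            pos = tok.end()
            is_head = head is not None and tok.start() == head.start(1)
            if is_head and word not in anchored:
                # Only the first defining rule gets an anchor.
                anchored.add(word)
                pieces.append(f'<a name="{word}">{word}</a>')
            elif (not is_head and word in nonterminals
                  and (link_recursive or word != current)):
                # A reference; self-references are linked only on request.
                pieces.append(f'<a href="#{word}">{word}</a>')
            else:
                pieces.append(html.escape(word, quote=False))
        pieces.append(html.escape(line[pos:], quote=False))
        out.append("".join(pieces))
    return "\n".join(out)


grammar = "list: item\n    | list item\n    ;\nitem: WORD ';'\n    ;"
result = link_nonterminals(grammar)
print(result)
```

In this sketch the recursive mention of `list` inside its own rule stays plain text unless `link_recursive=True` is passed, which mirrors the default described above.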
Symbols that appear after a yacc %token declaration are considered to be terminals. By default, terminals are tagged with a reference to their first defining rule in an external lex file: <a href="lex-file.html#terminal">. The -t option makes yacc2html link terminals to their %token declaration rather than to an external file. The -i option suppresses linking of terminals altogether.
The scripts, appropriate for handing to sed with the -f filename option, can be written using the -L output and -Y output options. The first of these turns an input file for lex into HTML with anchors suitable for referencing from the converted yacc input file. The second turns occurrences of nonterminals in arbitrary input text into references to the corresponding anchor in the converted yacc input file.
Thus, given lex and yacc input files grammar.l and grammar.y, the two steps for turning them into HTML files are
% yacc2html -L script.sed grammar.y > grammar-y.html
% sed -f script.sed grammar.l > grammar-l.html
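To make the effect of the lex-side script more concrete, here is a rough Python sketch of the transformation it performs: HTML metacharacters are escaped, and the first appearance of each known terminal becomes an anchor. This is an illustration under assumptions, not the sed script yacc2html actually writes; the function names are hypothetical, and the anchor name is simply the terminal itself, which may differ from the names the real tool generates.

```python
import re


def escape_html(text):
    """Escape &, <, > and " ('&' first, so entities are not escaped twice)."""
    return (text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace('"', "&quot;"))


def anchor_terminals(lex_text, terminals):
    """Escape lex source and anchor the first appearance of each terminal."""
    if not terminals:
        return escape_html(lex_text)
    # Match whole words only, so NUM does not match inside NUMBER.
    alternatives = "|".join(re.escape(t) for t in sorted(terminals))
    pattern = re.compile(r"\b(" + alternatives + r")\b")
    seen = set()
    out = []
    pos = 0
    for m in pattern.finditer(lex_text):
        out.append(escape_html(lex_text[pos:m.start()]))
        word = m.group(1)
        if word in seen:
            out.append(word)          # later appearances stay plain
        else:
            seen.add(word)
            out.append(f'<a name="{word}">{word}</a>')
        pos = m.end()
    out.append(escape_html(lex_text[pos:]))
    return "".join(out)


lex_source = '"<=" { return LE; }\n[0-9]+ { return NUMBER; } /* NUMBER & LE */'
converted = anchor_terminals(lex_source, {"LE", "NUMBER"})
print(converted)
```

The second mentions of `NUMBER` and `LE` in the comment are left unanchored, since only the first appearance of each terminal is meant to serve as the link target.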
Strip %{ ... %} sections, { ... } action statements, and trailing code after the second %% from the HTML file. (The C-like comments within the yacc code are still written; I consider them part of the specification, not of the C implementation.)
Let the title of the document (the <title> ... </title> HTML element) be title. If this option is not present, yacc2html defaults to the input file name, or *standard input* if input is read from standard input.
The script written by -L escapes &, <, >, and ", and turns the first appearance of every word that yacc2html recognized as a terminal into an anchor.
-b option if specified, or the input file name without a suffix;
%b-l.html");
-Nyacc-%s.
Link terminals to their %token declaration, rather than to the lex file.
-b option if specified, or the input file name without a suffix;
-l option if specified, or the defaulted HTML lex file name ("%b-l.html");
-y option if specified, or the defaulted HTML yacc file name ("%b-y.html").
-T%l#token-%s.
Link token type references to the %union definition at the start of the grammar, if such a definition exists. Normally, token type references remain unlinked.
%b-y.html.
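The templates that appear in the option descriptions (such as -T%l#token-%s, -Nyacc-%s, %b-l.html and %b-y.html) suggest a simple placeholder substitution. The following Python sketch shows one way such an expansion could work. It is an assumption-laden illustration, not the tool's implementation: the function `expand` and its parameters are hypothetical, the placeholder %y (for the yacc HTML file) is inferred by analogy with %l and does not appear in the text above, and whether %b is expanded inside file names given with -l or -y is also assumed.

```python
import os
import re


def expand(template, input_name, symbol="",
           base=None, lex_file=None, yacc_file=None):
    """Expand %b, %l, %y and %s in a link or anchor-name template."""
    # %b: the -b value if given, else the input file name without its suffix.
    if base is None:
        base = os.path.splitext(input_name)[0]
    # %l and %y: the -l / -y value if given, else the documented defaults.
    lex = (lex_file or "%b-l.html").replace("%b", base)
    yacc = (yacc_file or "%b-y.html").replace("%b", base)
    values = {"b": base, "l": lex, "y": yacc, "s": symbol}
    # Substitute in a single pass so inserted text is never re-expanded.
    return re.sub(r"%([blys])", lambda m: values[m.group(1)], template)


print(expand("%l#token-%s", "grammar.y", "NUMBER"))
print(expand("yacc-%s", "grammar.y", "list"))
```

With the input file grammar.y, the defaulted names in this sketch come out as grammar-l.html and grammar-y.html, which matches the file names used in the two-step example earlier.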
lex(1), sed(1), yacc(1), flex, GNU bison