This essay is intended to turn a numb and vague pain of ``this looks bad'' into a sharp, stinging pain of ``this should be a hyphen,'' or ``there should be a little more white space here.''
I have no formal typographical education. I have a year's worth of experience with attempting to mark up documents without making them worse than the ASCII copy I had received. I think I'm almost there, now, but it was tricky, and I often find myself forced to specify layout where I would much rather specify structure.
None of the following points applies to or criticizes SGML, the much-maligned general markup language that HTML instantiates.
Much of this spacing can be inferred (and has been inferred by typesetting systems such as troff and TeX for years). But if this inference takes place, there needs to be a way of marking exceptions; e.g., places where a dot followed by white space and an upper-case letter does not constitute a full stop.
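As a rough illustration of inference with marked exceptions, here is a minimal Python sketch. It is not how troff or TeX do it. The function name split_sentences and the ABBREVIATIONS list are hypothetical, and the rule is the naive one described above: a dot (or ! or ?) followed by white space and an upper-case letter ends a sentence, unless the token is a listed exception.

```python
import re

# Hypothetical exception list: tokens whose trailing dot is not a full stop.
ABBREVIATIONS = {"Dr.", "Mr.", "Mrs.", "Prof.", "St."}

# Candidate sentence end: a token ending in ., ! or ?, followed by
# white space and then an upper-case letter.
BOUNDARY = re.compile(r"(\S+[.!?])\s+(?=[A-Z])")

def split_sentences(text):
    """Split text at inferred sentence ends, skipping listed exceptions."""
    sentences = []
    start = 0
    for match in BOUNDARY.finditer(text):
        if match.group(1) in ABBREVIATIONS:
            continue  # marked exception: not a sentence end
        sentences.append(text[start:match.end(1)])
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

Without the exception list, ``Dr. Smith'' would be split in two; with it, the author has somewhere to record that this particular dot is not a full stop.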
HTML has no support for sentences; they don't exist as an object or a concept in the document. If, at this late point, WWW client programs were to begin inferring sentence boundaries from white space, existing HTML text would not contain the necessary exceptions and would still look wrong occasionally.
Unless the author explicitly requests French spacing, I
used to emulate proper spacing using sequences of an ISO 8859-1
non-breaking space (&#160;) that I heard about on USENET
a few years ago, and a typewritten (and hence, accidentally,
wider) space (<tt> </tt>).
They happen to print as two spaces in character-based browsers
and when cutting-and-pasting text from my favorite
bitmapped client, and they happen to be almost the right
size when printed on a bit-mapped terminal.
A normal double-space, achievable as ``   '' on my
bit-mapped client, would have been even closer, but
prints as three spaces on the character-based
frontend. Using inlined images with an
alt=" "
text of two spaces is out of the question, since cutting
and pasting the displayed text from a bitmapped browser will
capture neither the alt text nor (of course) the
intended white space.
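The workaround described above could be applied mechanically. The following Python sketch is only an illustration under stated assumptions: the helper name widen_sentence_spaces is hypothetical, the non-breaking space is written as the numeric reference &#160; (character 160 in ISO 8859-1), and the order ``non-breaking space, then typewriter space'' is a guess at the sequence meant.

```python
import re

# Assumed sequence: a numeric reference to the ISO 8859-1 non-breaking
# space, followed by an ordinary space set in typewriter font.
WIDE_SPACE = "&#160;<tt> </tt>"

def widen_sentence_spaces(html_text):
    """Replace white space after ., ! or ? (before an upper-case letter)
    with WIDE_SPACE. Naive: abbreviations are widened as well."""
    return re.sub(r"(?<=[.!?])\s+(?=[A-Z])", WIDE_SPACE, html_text)
```

Because the rule is the same naive inference as before, it would also widen the space after an abbreviation, which is exactly the kind of exception HTML gives the author no way to mark.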
The HTML+ specification included an   entity reference that would have remedied the problem. There is no   in the HTML 2.0 specification that supposedly grew out of the HTML+ specification.
The HTML+ specification included an entity reference to a non-breaking space, &nbsp;. The reference was never implemented in the common `Mosaic' browser, and consequently rarely occurs outside of ``lists of entity references in HTML.'' &nbsp; does occur frequently in the Arena reference documents on HTML 3.0.
The HTML+ specification included &mdash; and &ndash; entity references that would have freed the conventional hyphen to be rendered as a ``real'' hyphen. There is no &mdash; or &ndash; in the HTML 2.0 specification that supposedly grew out of the HTML+ specification.