Hyphenation exceptions

While TeX's hyphenation rules are good, they're not infallible: you will occasionally find words TeX just gets wrong. So for example, TeX's default hyphenation rules (for American English) don't know the word “manuscript”, and since it's a long word you may find you need to hyphenate it. You can “write the hyphenation out” each time you use the word:

... man\-u\-script ...

Here, each of the \- commands is converted to a hyphenated break, if (and only if) necessary.

That technique can rapidly become tedious: you'll probably only accept it if there are no more than one or two wrongly-hyphenated words in your document. The alternative is to set up hyphenations in the document preamble. To do that, for the hyphenation above, you would write:

\hyphenation{man-u-script}

and the hyphenation would be set for the whole document. Barbara Beeton publishes articles containing lists of these “hyphenation exceptions”, in TUGboat; the hyphenation “man-u-script” comes from one of those articles.
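As a minimal sketch of the preamble approach, the complete document below declares the exception once and then uses \showhyphens (a standard LaTeX command that writes the break points TeX would use to the log file) to check the result. The document class and the body text are illustrative assumptions, not part of the original answer.

```latex
\documentclass{article}

% Declare the exception once, in the preamble; it then applies
% to every occurrence of the word in the document.
\hyphenation{man-u-script}

\begin{document}

% Diagnostic only: reports the permitted break points in the log file.
\showhyphens{manuscript}

% Illustrative body text (an assumption for this sketch).
The manuscript was delivered late.

\end{document}
```

Once the log shows the break points you expect, the \showhyphens line can be removed.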

What if you have more than one language in your document? Simple: select the appropriate language, and do the same as above:

\usepackage[french]{babel}
\selectlanguage{french}
\hyphenation{re-cher-cher}

(Nothing clever here: this is the “correct” hyphenation of the word, in the current tables.) However, there's a problem here: just as words with accent macros in them won't break, so an \hyphenation command with accent macros in its argument will produce an error:

\usepackage[french]{babel}
\selectlanguage{french}
\hyphenation{r\'e-f\'e-rence}

tells us that the hyphenation is “improper”, and that it will be “flushed”. But, just as hyphenation of words is enabled by selecting an 8-bit font encoding, so \hyphenation commands are rendered proper again by selecting that same 8-bit font encoding. For the hyphenation patterns provided for “legacy” TeX engines, the encoding is Cork (T1), so the complete sequence is:

\usepackage[T1]{fontenc}
\usepackage[french]{babel}
\selectlanguage{french}
\hyphenation{r\'e-f\'e-rence}

The same sort of performance goes for any language for which 8-bit fonts and corresponding hyphenation patterns are available. Since you have to select both the language and the font encoding to have your document typeset correctly, it should not be a great imposition to do the selections before setting up hyphenation exceptions.
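The pieces above can be combined in a two-language document. The sketch below assumes English as the main language and French as a secondary one. The choice of languages, the document class and the body text are assumptions made for illustration. Each \hyphenation command is recorded for whichever language is selected at the time it is issued.

```latex
\documentclass{article}

% Select the 8-bit (Cork) font encoding first, so that accent
% macros are acceptable inside \hyphenation.
\usepackage[T1]{fontenc}

% The last language listed is the document's main language.
\usepackage[french,english]{babel}

% Exceptions for French words.
\selectlanguage{french}
\hyphenation{re-cher-cher r\'e-f\'e-rence}

% Exceptions for English words; this also restores the main language.
\selectlanguage{english}
\hyphenation{man-u-script}

\begin{document}

% Illustrative body text (an assumption for this sketch).
The manuscript cites the French word
\foreignlanguage{french}{r\'ef\'erence} more than once.

\end{document}
```

Several exceptions for the same language may be given in a single \hyphenation command, separated by spaces, as in the French line above.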

Modern TeX variants (principally XeTeX and LuaTeX) use Unicode internally, and distributions that offer them also offer UTF-8-encoded patterns; since the hyphenation team do all the work “behind the scenes”, the use of Unicode hyphenation is deceptively similar to what we are used to.
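As a hedged illustration of that similarity, the sketch below assumes the file is saved as UTF-8 and compiled with xelatex or lualatex. Under those assumptions no 8-bit font encoding is selected, and the accented letters can be typed directly in the \hyphenation argument. The body text is illustrative only.

```latex
% Assumes: UTF-8 source, compiled with xelatex or lualatex.
\documentclass{article}

\usepackage[french]{babel}
\selectlanguage{french}

% Accented letters are typed directly rather than as accent macros.
\hyphenation{ré-fé-rence}

\begin{document}

% Illustrative body text (an assumption for this sketch).
Voir la référence citée plus haut.

\end{document}
```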


Source: Hyphenation exceptions

composition/langues/cesure/nouveaux_motifs_de_cesure.txt · Last modified: 2018/12/04 00:20 by jejust
CC Attribution-Share Alike 4.0 International