
== Context ==
The mathematical typesetting systems [[TeX]] and [[LaTeX]] are in near-universal use in mathematics as well as in several scientific disciplines.  Sites as diverse as [[Google Scholar]], [[Mathematical Reviews]] and [[Zentralblatt MATH]] allow citations to be exported in [[BibTeX]], a bibliographical system that uses TeX/LaTeX markup.  Unfortunately, this software was written well before [[Internationalization and localization|internationalization and localization]] efforts such as [[Unicode]] and [[UTF-8]] reached critical mass.  As a result, there is a fundamental impedance mismatch between how LaTeX and Unicode handle typography.  

LaTeX (by which I'll refer to the three systems as a whole) is primarily concerned with character composition: if the user wishes to add an [[Diaeresis (diacritic)|umlaut]] to the number 7, LaTeX provides that capability (and indeed, there are multiple ways of coding up such a construct in the language).  Unicode, on the other hand, focuses on mapping [[glyph]]s to unique numerical identifiers.  As such, Unicode tends to capture only extant glyphs.
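
The contrast can be seen in a minimal sketch, written in Python purely for illustration (it is not this module's code, and the helper name <code>compose</code> is hypothetical). It attaches a Unicode combining mark to a base character and applies NFC normalization: an acute accent on "o" collapses to the single precomposed code point U+00F3, whereas an umlaut on "7" has no precomposed form and remains a two-code-point sequence.

```python
import unicodedata

ACUTE = "\u0301"      # COMBINING ACUTE ACCENT
DIAERESIS = "\u0308"  # COMBINING DIAERESIS


def compose(base: str, combining: str) -> str:
    """Attach a combining mark to a base character and normalize to NFC.

    Hypothetical helper for illustration only.
    """
    return unicodedata.normalize("NFC", base + combining)


o_acute = compose("o", ACUTE)           # precomposed form exists: U+00F3
seven_umlaut = compose("7", DIAERESIS)  # no precomposed form exists

print(len(o_acute))       # 1
print(len(seven_umlaut))  # 2
```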

Mapping LaTeX to Unicode (or UTF-8) thus has two difficulties:  LaTeX has an effectively infinite number of ways of representing the same symbol (although only a handful are in common use), and only a subset of possible symbols will be found in the set covered by Unicode.  The second problem tends to be rare and we will leave it for another day.  This module addresses the first problem.

Consider the following (legal) LaTeX code:

<pre>
\'o {\'o} {\' o} {\'{o}}
</pre>

If this were compiled, the result would look like:

 ó ó ó ó

This module handles all four cases as well as a few LaTeX escape codes.  Attempting to capture even a significant subset of the Unicode glyph set would be a herculean task, and nearly all of that effort might never be used.  Instead, I have seeded this module with the glyphs that I expect to use in my own editing and hope that others add to it as its shortcomings become obvious.
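
As a rough sketch of how the four spellings above might be folded into one character, the following Python example uses a single regular expression. This is an illustration under stated assumptions, not the module's actual implementation: the table <code>ACCENTS</code>, the pattern, and the function <code>latex_to_unicode</code> are all hypothetical names, only three accent commands are covered, and only unaccented ASCII letters are accepted as the base character.

```python
import re
import unicodedata

# Hypothetical seed table: LaTeX accent command -> Unicode combining mark.
ACCENTS = {
    "'": "\u0301",  # acute
    "`": "\u0300",  # grave
    '"': "\u0308",  # diaeresis
}

# Group 1: optional outer "{"; group 2: accent command;
# group 3: braced letter; group 4: bare letter.
# The trailing conditional requires "}" only when group 1 matched.
PATTERN = re.compile(r"""(\{)?\\(['`"])\s*(?:\{([A-Za-z])\}|([A-Za-z]))(?(1)\})""")


def _replace(match: re.Match) -> str:
    """Build the accented letter and normalize it to its precomposed form."""
    letter = match.group(3) or match.group(4)
    return unicodedata.normalize("NFC", letter + ACCENTS[match.group(2)])


def latex_to_unicode(text: str) -> str:
    """Replace the accent spellings covered by PATTERN; leave the rest as-is."""
    return PATTERN.sub(_replace, text)


print(latex_to_unicode(r"\'o {\'o} {\' o} {\'{o}}"))  # ó ó ó ó
```

Anything the pattern does not recognize passes through unchanged, which matches the seeded, grow-as-needed approach described above.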

One final note:  a properly cynical reader might have noticed that in the last example above there are two adjacent closing braces.  If a user attempts to pass this string from article space using #invoke, the parser will assume the LaTeX "}}" closes the template, even if it is embedded in a string.  The best solution I have so far is to keep LaTeX-encoded data in Module-space and have article-space calls request the translated strings.  Thus, this module is not designed to work reliably via #invoke.