Module:LaTeX2UTF8/doc
== Context ==

The mathematical typesetting systems [[TeX]] and [[LaTeX]] are in near-universal use in mathematics as well as several scientific disciplines. Sites as diverse as [[Google Scholar]], [[Mathematical Reviews]] and [[Zentralblatt MATH]] allow citations to be exported in [[BibTeX]], a bibliographical system that uses TeX/LaTeX markup.

Unfortunately, this set of software was written well before [[Internationalization and localization|internationalization and localization]] efforts such as [[Unicode]] and [[UTF-8]] reached critical mass. As a result, there is a fundamental impedance mismatch between how the two systems handle typography. LaTeX (by which I'll refer to the three systems as a whole) is primarily concerned with character composition: if the user wishes to add an [[Diaeresis (diacritic)|umlaut]] to the number 7, LaTeX provides that capability (and indeed, there are multiple ways of coding up such a construct in the language). Unicode, on the other hand, focuses on mapping [[glyph]]s to unique numerical identifiers. As such, Unicode tends to capture only extant glyphs.

Mapping LaTeX to Unicode (or UTF-8) thus has two difficulties:
* LaTeX has an effectively infinite number of ways of representing the same symbol (although only a handful are in common use).
* Only a subset of the symbols LaTeX can compose will be found in the set covered by Unicode.

The second problem tends to be rare and we will leave it for another day. This module addresses the first problem. Consider the following (legal) LaTeX code:

<pre>
\'o
{\'o}
{\' o}
{\'{o}}
</pre>

If this were compiled, the result would look like:

ó ó ó ó

This module handles all four cases as well as a few LaTeX escape codes. Attempting to capture even a significant subset of the Unicode glyph set would be a herculean task, and nearly all of that effort might never be used. Instead, I have seeded this module with the glyphs that I expect to use in my own editing and hope that others add to it as its shortcomings become obvious.

One final note: a properly cynical reader might have noticed that the last example above contains two adjacent closing braces. If a user attempts to pass this string from article space using #invoke, the parser will assume the LaTeX "}}" closes the template, even if it is embedded in a string. The best solution I have so far is to keep LaTeX-encoded data in Module-space and have article-space calls request the translated strings. Thus, this module is not designed to work reliably via #invoke.
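
To make the "many spellings, one glyph" problem concrete, here is a minimal sketch in Python of how the four forms above could be collapsed to a single character. It is not the module's actual code: the names <code>ACCENTS</code>, <code>PATTERN</code> and <code>latex_to_unicode</code>, and the four seed entries, are assumptions made for illustration only.

```python
import re

# Hypothetical seed table: (accent command, base letter) -> precomposed character.
# A real table would grow as missing glyphs are noticed.
ACCENTS = {
    ("'", "o"): "\u00f3",  # o with acute
    ("'", "e"): "\u00e9",  # e with acute
    ('"', "u"): "\u00fc",  # u with diaeresis
    ("`", "a"): "\u00e0",  # a with grave
}

# Matches one accent command in any of the four spellings:
#   \'o   {\'o}   {\' o}   {\'{o}}
PATTERN = re.compile(
    r"""
    (?P<open>\{)?                # optional outer brace
    \\(?P<accent>['"`^~])        # backslash plus accent character
    \s*                          # optional space before the letter
    (?:\{(?P<braced>[A-Za-z])\}  # letter wrapped in its own braces
      |(?P<bare>[A-Za-z]))       # or a bare letter
    (?(open)\})                  # closing brace only if one was opened
    """,
    re.VERBOSE,
)


def latex_to_unicode(text):
    """Replace known accent commands; leave unknown ones untouched."""
    def replace(match):
        letter = match.group("braced") or match.group("bare")
        key = (match.group("accent"), letter)
        # Fall back to the original text when the glyph is not in the table.
        return ACCENTS.get(key, match.group(0))

    return PATTERN.sub(replace, text)


print(latex_to_unicode(r"\'o {\'o} {\' o} {\'{o}}"))
```

Combinations missing from the table pass through unchanged, which mirrors the seed-and-extend approach described above: nothing is lost when a glyph is absent, and adding one is a single table entry.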