tex2word: Convert LaTeX to Word XML
tex2word converts LaTeX source documents into the Office Open XML format introduced in Microsoft Word 2007, without any intermediary steps. The emphasis is on equations, which are traditionally typeset in LaTeX but occasionally have to be transferred to Word to meet publishers' requirements. tex2word saves the tedious work of manually converting a TeX document into Word, or of fixing the small (typographic) errors that other conversion methods usually introduce.
tex2word is free software, free as in both free speech and free beer.
For the LaTeX parser and the Microsoft Word Open XML code generator to work, you will need SWI-Prolog, which is available for multiple platforms. A Python (version 3.1) script is included to extract files from the skeleton document, invoke the LaTeX parser and Word code generator, and repackage the document with the updated file included.
tex2word consists of the following components:
- tex2word.bat is a Windows batch file that helps locate the Prolog and Python interpreters and runs the main application. It is the entry point you should normally use.
- tex2word.pl is a Prolog module that parses LaTeX documents (e.g. sample.tex) and generates Word Open XML documents (document.xml).
- tex2word.py is a Python script that extracts the skeleton document (skeleton.docx), runs the Prolog parser and code generator, and repackages the generated files into a Word document (e.g. sample.docx).
- skeleton.docx is a minimal skeleton document that contains the files, application data, style declarations, etc. that make up a Word document. skeleton.docx is actually a zip archive of XML files arranged in a directory structure.
- sample.tex is a sample file to illustrate the capabilities of tex2word.
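The extract-and-repackage step that tex2word.py performs can be pictured with a minimal sketch using Python's standard zipfile module. This is not the code shipped in tex2word.py. The function name repackage and the member path word/document.xml (the conventional location of the main part in a .docx archive) are assumptions made for illustration.

```python
import zipfile


def repackage(skeleton_path, document_xml, output_path,
              member="word/document.xml"):
    """Copy every entry of the skeleton archive except `member`,
    then add `member` with the newly generated content.

    `member` is an assumed path; adjust it if the skeleton differs.
    """
    with zipfile.ZipFile(skeleton_path) as src, \
            zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename == member:
                continue  # skip the old main document part
            dst.writestr(info, src.read(info.filename))
        # Write the generated XML under the same archive path.
        dst.writestr(member, document_xml)


# Hypothetical usage, with generated_xml holding the Prolog output:
# repackage("skeleton.docx", generated_xml, "sample.docx")
```

Copying entries one by one, rather than appending to the skeleton in place, avoids ending up with two archive members of the same name.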
tex2word supports the following LaTeX constructs:
- paragraphs, sections, subsections, subsubsections, tables (tabular), etc.
- text styling: plain, bold (textbf), italics (emph and textit), blackboard (mathbb), calligraphic (mathcal), fraktur (mathfrak), etc.
- cross-references (label and ref), bibliographic entries (cite)
- fractions (frac)
- subscripts and superscripts
- square root and radical with degree (sqrt)
- integrals (int), double integrals (iint), triple integrals (iiint), contour integrals (oint), surface integrals (oiint)
- summation (sum), product (prod), co-product (coprod), union (bigcup), intersection (bigcap)
- grouping: parentheses, brackets, curly braces, absolute value, norm, floor, ceiling
- functions: sin, cos, tan, min, max, arg, etc.
- mathematical accents: dot, double dot, hat, tilde, bar, double overbar, arrow above, harpoon above
- matrices (array)
- mathematical symbols: Greek letters, Hebrew letters, binary relations, set operators, binary operators, etc.
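To give a feel for the constructs listed above, here is a short LaTeX fragment that uses several of them. It is an illustrative input only and is not taken from sample.tex. Whether tex2word accepts every detail shown (for instance the display-math delimiters or the package declarations) is an assumption.

```latex
\documentclass{article}
\usepackage{amsmath}  % provides \iint
\usepackage{amssymb}  % provides \mathbb
\begin{document}

\section{Sample}\label{sec:sample}

Section~\ref{sec:sample} combines \textbf{bold} and \emph{emphasized} text
with inline mathematics such as $x \in \mathbb{R}$.

% fraction, square root, superscript
\[
  x = \frac{-b + \sqrt{b^{2} - 4ac}}{2a}
\]

% summation with limits, radical with degree
\[
  \sum_{k=1}^{n} k = \frac{n(n+1)}{2}, \quad \sqrt[3]{y}
\]

% double integral, functions, matrix in parentheses
\[
  \iint_{D} \sin(x) \cos(y) dx dy, \quad
  A = \left( \begin{array}{cc} a & b \\ c & d \end{array} \right)
\]

\end{document}
```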
You should convert your .tex file to UTF-8 encoding before feeding it to tex2word. Without any arguments, tex2word.py will convert sample.tex into sample.docx.
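If your .tex file is in a legacy encoding, a small Python helper can re-encode it as UTF-8 first. The sketch below is a suggestion and not part of tex2word. The function name to_utf8 and the default source encoding latin-1 are assumptions, so pass the encoding your file actually uses.

```python
def to_utf8(src_path, dst_path, source_encoding="latin-1"):
    """Re-encode a text file as UTF-8.

    `source_encoding` must match the real encoding of the input;
    the default is only an assumed example.
    """
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=source_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


# Hypothetical usage:
# to_utf8("paper-latin1.tex", "paper.tex")
```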