tex2word: Convert LaTeX to Word XML

tex2word converts LaTeX source documents directly into the Office Open XML format introduced with Microsoft Word 2007, without any intermediate steps. The emphasis is on equations, which are traditionally typeset in LaTeX but occasionally have to be transferred to Word to meet publishers' requirements. tex2word spares you the tedious work of manually converting a TeX document into Word, or of fixing the small (typographic) errors that other conversion methods usually introduce.

tex2word is free software, free as in both free speech and free beer.

Requirements

For the LaTeX parser and the Word Open XML code generator to work, you need SWI-Prolog, which is available for multiple platforms. A Python script (written for Python 3.1) is included to extract files from the skeleton document, invoke the LaTeX parser and Word code generator, and repackage the document with the updated file.

Package contents

tex2word consists of the following components:

  • tex2word.bat is a Windows batch file that helps discover Prolog and Python interpreters and execute the main application. tex2word.bat acts as an entry point, which you should normally use.
  • tex2word.pl is a Prolog module that parses LaTeX documents (e.g. sample.tex) and generates Word Open XML documents (document.xml).
  • tex2word.py is a Python script that extracts the skeleton document (skeleton.docx), runs the Prolog parser and code generator, and repackages the generated files into a Word document (e.g. sample.docx).
  • skeleton.docx is a minimalistic skeleton document that contains files, application data, style declarations, etc. that comprise a Word document. skeleton.docx is actually a zip file of a number of XML files arranged in a directory structure.
  • sample.tex is a sample file to illustrate the capabilities of tex2word.
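
Because a .docx file is a zip archive, the repackaging step described above can be illustrated in a few lines of Python. The following is only a sketch of the idea, not the code in tex2word.py: the function name repackage is hypothetical, the member name word/document.xml is assumed to be where the skeleton keeps its main document part (the usual location in Word files), and the code targets a current Python 3 rather than 3.1.

```python
import zipfile

# Assumed location of the main document part inside the archive.
DOCUMENT_PART = "word/document.xml"


def repackage(skeleton_path, document_xml, output_path):
    """Copy the skeleton archive to output_path, substituting the document part.

    document_xml may be str (stored as UTF-8) or bytes.
    """
    with zipfile.ZipFile(skeleton_path) as src, \
            zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            if item.filename == DOCUMENT_PART:
                continue  # skip the old document part; the new one is added below
            # Copy every other member unchanged, keeping its metadata.
            dst.writestr(item, src.read(item.filename))
        dst.writestr(DOCUMENT_PART, document_xml)
```

Copying the archive members one by one, rather than unpacking them to disk first, keeps the sketch short; the bundled script may well work differently.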

Features

  • paragraphs, sections, subsections, subsubsections, tables (tabular), etc.
  • text styling: plain, bold (textbf), italics (emph and textit), blackboard (mathbb), calligraphic (mathcal), fraktur (mathfrak), etc.
  • cross-references (label and ref), bibliographic entries (cite)
  • fractions (frac)
  • subscripts and superscripts
  • square root and radical with degree (sqrt)
  • integrals (int), double integrals (iint), triple integrals (iiint), contour integral (oint), surface integral (oiint)
  • summation (sum), product (prod), co-product (coprod), union (bigcup), intersection (bigcap)
  • grouping: parentheses, brackets, curly braces, absolute value, norm, floor, ceiling
  • functions: sin, cos, tan, min, max, arg, etc.
  • mathematical accents: dot, double dot, hat, tilde, bar, double overbar, arrow above, harpoon above
  • matrices (array)
  • mathematical symbols: Greek letters, Hebrew letters, binary relationships, set operators, binary operators, etc.
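
To give an idea of the kind of input the list above describes, the fragment below combines a few of the listed constructs (section, label and ref, textbf and emph, frac, sqrt with degree, int, sum, hat, mathbb). It is a hand-written illustration, not an excerpt from sample.tex, and it has not been run through tex2word; the section title, label name and formula are made up for the example.

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
\section{Example}\label{sec:example}
Section~\ref{sec:example} contains \textbf{bold} and \emph{emphasized} text.
\[
  \hat{f}(x) = \frac{1}{\sqrt[3]{x}} + \int_{0}^{1} t^{2} dt
             + \sum_{k=1}^{n} a_{k}, \qquad x \in \mathbb{R}
\]
\end{document}
```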

Usage

You should convert your .tex file to UTF-8 encoding before feeding it to tex2word. Without any arguments, tex2word.py will convert sample.tex into sample.docx.
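
If your .tex file is stored in another encoding, a short Python helper can re-encode it first. This is a minimal sketch and not part of tex2word: the function name reencode_to_utf8 is hypothetical, and the default source encoding (Latin-1) is only an assumption that you should replace with the encoding your file actually uses.

```python
def reencode_to_utf8(src_path, dst_path, src_encoding="latin-1"):
    """Read src_path in src_encoding and write the same text as UTF-8."""
    # newline="" disables newline translation, so line endings are preserved.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

Writing to a separate output path leaves the original file untouched in case the guessed source encoding turns out to be wrong.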

Disclaimer

tex2word is at an early stage of development. Development has focused on converting .tex files generated by the LyX document processor into Word Open XML files. tex2word should be able to process general TeX input, but hand-written TeX code is less constrained than the code LyX generates automatically, so tex2word might run into problems with it. It has undergone little testing on general TeX input. If you run into problems converting a document, you might submit it to the project's contact e-mail address.

Download

tex2word is available for download under the GNU GPL from its project archive.