
I have the following segment of Markdown with embedded LaTeX equations:

# Fisher's linear discriminant

\newcommand{\cov}{\mathrm{cov}}
\newcommand{\A}{\mathrm{A}}
\renewcommand{\B}{\mathrm{B}}
\renewcommand{\T}{^\top}

The first method to find an optimal linear discriminant was proposed by Fisher
(1936), using the ratio of the between-class variance to the within-class variance
of the projected data, $d(\vec x)$, as a criterion. Expressed in terms of the
sample properties, the $p$-dimensional centroids $\bar {\vec x}_\A$ and
$\bar {\vec x}_\B$ and the $p \times p$ covariance matrices
$S_A = \cov_i ( \vec x_{\A i} )$ and $S_B = \cov_i ( \vec x_{\B i} )$, the
optimal direction is given by 
$$
\vec w = \left ( \frac{ S_A + S_B }{2} \right ) ^{-1}
~ ( \bar {\vec x}_\B - \bar {\vec x}_\A ).
$$

When I convert it with pandoc to LaTeX and compile it with xelatex, I get the expected text with nicely rendered math. When I convert it with pandoc to MS Word using

pandoc test.text -o test.docx

and open it in MS Office Word 2007, I get the following:

[Screenshot: the document as displayed in Word]

Only those parts of the equations that are symbols or upright text get rendered correctly, while variable names in italics are replaced by a question mark in a box.

How can I make this work?

A. Donda
  • Your input works for me with pandoc 1.12.2 on Mac OS X. Can you post a link to the word file you get? Here's mine: http://www.fileswap.com/dl/wajeArZq4c/ – mb21 Dec 11 '13 at 21:54
  • @mb21 Thanks for replying! Your docx looks identical to mine if I open it in Word. So maybe it's a problem with my copy/installation of Word, and not with the file. By the way, I found a workaround: I can switch equation display in Word to "linear" and then back to "professional", and all the symbols appear. Here's mine: https://dl.dropboxusercontent.com/u/14431931/test.docx – A. Donda Dec 11 '13 at 22:15
  • Oh well, that's what your doc looks like on my copy of Word on Mac: http://share.pho.to/4J6al I guess it might help using the newest version of pandoc... – mb21 Dec 11 '13 at 22:33
  • @mb21 Ah, no, that was just a mistake on my part: I omitted the last "$$". I've updated the file, please try again. – A. Donda Dec 11 '13 at 22:41
  • Ah, it looks just like mine now. Those question marks usually appear when the chosen font doesn't have that character. Do you have the font `Cambria Math` installed? – mb21 Dec 11 '13 at 23:30
  • I checked, yes it is installed. – A. Donda Dec 12 '13 at 00:06
  • Thank you for your help, I think it's quite clear now that it is not a pandoc problem. – A. Donda Dec 12 '13 at 00:07
  • @A. Donda - I was unable to access your .docx at dropbox, but I downloaded mb21's file and looked at the XML. In the settings.xml, the Math font in there is set to Lucida Grande, whereas normally it is Cambria Math (as discussed). That works OK on Mac Word, but when I tried to open it in Windows Word 2010 (which does not have Lucida Grande) I could not even view the text in Print view (it seemed stuck in draft view). I can see that Word is using Cambria Math for the display (nothing is listed in the font substitutions). Perhaps that is a factor. –  Dec 12 '13 at 08:59
  • I added an answer based on the tip by @bibadia – mb21 Dec 12 '13 at 15:42
  • I'm answering here so that I can mention @bibadia. I recreated my docx (it should be accessible via the link again) and looked into it. You are right, the file does reference Lucida Grande instead of Cambria Math. I checked: this setting comes from the "reference.docx" that ships with pandoc. However, changing that setting and generating a new docx does not change anything about my display problem. But still, thanks for your efforts! – A. Donda Dec 12 '13 at 18:43
  • Yes, I discovered my Windows copy of Word was in an unusual state - now that is fixed, both your file and the one I had earlier open fine with all characters displaying in Word 2010. I'll describe what happens in Word 2007 in an Answer - not enough space here. –  Dec 12 '13 at 22:11

3 Answers


In Word 2007, I see a result similar to yours, except that here, I don't see the "question marks in boxes" characters, just space.

If I then take one of the expressions, and use your trick of going to linear display and back, the characters reappear for that expression.

If I save and re-open, the other expressions still do not display correctly, but if I save and look at the XML, I notice that

  1. the Math font has been changed to Cambria Math
  2. additional run-properties XML (w:rPr) specifying the Cambria Math font has been inserted into many of the runs (m:r) inside the oMath elements, even in the oMath expressions that do not display correctly. However, in the oMath expression that now displays correctly, this extra XML has been applied to every run. In the others, it has only been applied to some runs (I think I can see the pattern, but I'm running out of time right now...)
  3. If I manually add the XML to the other runs and re-open the document, the expressions appear correctly. Or at least, they do in the one case I have tried.
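One way to check which equations still have runs without the explicit font setting is to count them per oMath element. The following is only a minimal Python sketch, not something used in the investigation above; the function name `runs_missing_font` is made up for illustration, and it assumes the `m:` and `w:` prefixes map to the standard Office Open XML namespace URIs.

```python
import xml.etree.ElementTree as ET

# Assumed standard Office Open XML namespace URIs for the m: and w: prefixes.
M = "http://schemas.openxmlformats.org/officeDocument/2006/math"
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def runs_missing_font(document_xml):
    """Return one (total runs, runs without w:rPr) pair per m:oMath element."""
    root = ET.fromstring(document_xml)
    report = []
    for omath in root.iter(f"{{{M}}}oMath"):
        runs = list(omath.iter(f"{{{M}}}r"))
        # A run counts as "missing" when it has no w:rPr child at all.
        missing = [r for r in runs if r.find(f"{{{W}}}rPr") is None]
        report.append((len(runs), len(missing)))
    return report
```

To try it on a real file, read `word/document.xml` out of the .docx with Python's `zipfile` module and pass the bytes to the function. An equation whose second number is greater than zero still contains runs without the font setting.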

Since Word 2010 displays the results correctly, I can only assume that it does not rely on these explicit font settings, whereas Word 2007 does. This doesn't really help you yet, because altering all those m:r elements by hand would be even harder than what you are already doing. But it is possible that a default style/font needs to be set, either somewhere higher in the XML hierarchy or elsewhere in the .zip (perhaps in fontTable.xml or styles.xml). I'm not familiar enough with Word's XML structures to guess what, if anything, might be missing, but I may be able to have a look tomorrow.

I suppose another possibility is that you just have to have all these extra rPr elements for this to work in Word 2007, which would suggest that pandoc may have been written for Word 2010, not 2007. (I don't know anything about the tool).

As an example, where you have

<m:r>
  <m:t>(</m:t>
</m:r>

what you need is

<m:r>
  <w:rPr>
    <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math" />
  </w:rPr>
  <m:t>(</m:t>
</m:r>
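If the extra elements really are required, adding them by hand is impractical, but the change can be scripted. The sketch below is an illustration only, under the assumption that inserting a `w:rPr` with `w:rFonts` into every `m:r` is enough; the helper name `add_math_font` is hypothetical, and the namespace URIs are assumed to be the standard Office Open XML ones.

```python
import xml.etree.ElementTree as ET

# Assumed standard Office Open XML namespace URIs for the m: and w: prefixes.
M = "http://schemas.openxmlformats.org/officeDocument/2006/math"
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("m", M)
ET.register_namespace("w", W)


def add_math_font(root, font="Cambria Math"):
    """Add a w:rPr/w:rFonts element to every m:r that has no w:rPr.

    Returns the number of runs that were changed.
    """
    changed = 0
    # Collect the runs first so the tree is not modified while iterating.
    for run in list(root.iter(f"{{{M}}}r")):
        if run.find(f"{{{W}}}rPr") is not None:
            continue
        rpr = ET.Element(f"{{{W}}}rPr")
        ET.SubElement(
            rpr,
            f"{{{W}}}rFonts",
            {f"{{{W}}}ascii": font, f"{{{W}}}hAnsi": font},
        )
        # Keep an existing m:rPr first; the w:rPr goes right after it.
        children = list(run)
        pos = 1 if children and children[0].tag == f"{{{M}}}rPr" else 0
        run.insert(pos, rpr)
        changed += 1
    return changed
```

After the call, `ET.tostring(root)` returns the modified XML. ElementTree may rename namespace prefixes it has not been told about, so the result should be checked in Word before relying on it.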
  • I still don't see everything clearly yet, especially since the XML generated by Word is hard to read, but you are definitely on the right track. I'll try and see whether changes to the reference.docx that pandoc uses make this go away without the trick. Maybe I'll submit a bug report. If you happen to find out more, please update the answer. In any case: thanks a lot! – A. Donda Dec 13 '13 at 13:55
  • I had an extensive look around, but at the moment I cannot see any other way to avoid all these separate w:rPr settings. I hoped that changing the element dispDef under mathPr in settings.xml might have an effect, but it doesn't. None of the other things I tried (just in case!), such as adding Cambria Math to fontTable.xml, had any effect. –  Dec 13 '13 at 16:43

I did the following to get rid of the font issue:

  1. Create a new, empty Word document.
  2. Copy all content to the new document.
  3. Choose Match Source Format.

As discussed above, Windows doesn't have the font Lucida Grande, so substituting the Math Font with Cambria Math should work.

  1. Rename test.docx to test.zip.
  2. Open the archive with vim test.zip and select word/settings.xml.
  3. Find Lucida Grande and change it to Cambria Math.
  4. Save, then rename the .zip back to .docx. This results in something like this docx.
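The same edit can be made without renaming the file or opening an editor. This is a minimal Python sketch of the steps above, with a hypothetical helper name `replace_math_font`; it assumes the font name appears literally in `word/settings.xml` and copies every other archive member unchanged.

```python
import zipfile


def replace_math_font(src, dst, old="Lucida Grande", new="Cambria Math"):
    """Copy the docx at src to dst, replacing old with new in word/settings.xml."""
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "word/settings.xml":
                # Plain text substitution of the font name in the settings part.
                data = data.decode("utf-8").replace(old, new).encode("utf-8")
            zout.writestr(item, data)
```

For example, `replace_math_font("test.docx", "test-fixed.docx")` writes a corrected copy and leaves the original file untouched.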

You can then also supply that file as a sort of docx template to pandoc with the --reference-docx option (renamed --reference-doc in pandoc 2.0 and later).

mb21
  • This does not work. I have the same problem as OP, but the math font defined in the file generated by pandoc is Cambria Math. – January Apr 29 '18 at 10:19