Any idea how to take DVI files and turn them into TeX?
Something similar was asked today: http://stackoverflow.com/questions/1620002/pdf-to-latex-linux – P Shved Oct 25 '09 at 19:53
There is a related question about this on TeX.StackExchange: http://tex.stackexchange.com/q/46779/10944 – madth3 Jan 31 '13 at 18:19
7 Answers
This is similar to the problem of turning PDF into XML which is referred to as "trying to turn a hamburger back into a cow". Both TeX->DVI and XML->PDF lose information, both in the structure of the document and its semantics.
It requires a great deal of heuristics and a large corpus to recreate (some of) the original document, and the result is rarely 100% accurate. The text strings may be recoverable; the vectors are harder. Bitmaps are almost impossible.

@Boldewyn I got it from Mike Kay (Saxon), but he got it from somewhere else, I think. – peter.murray.rust Oct 26 '09 at 13:53
What you are asking is not possible. I think that (as in PostScript) even recognizing words in a DVI file may require heuristics. A DVI file is a description of where to place individual letters on a piece of paper, and nothing more.
You can get partway there with either dvi2tty, or by running dvips followed by ps2ascii, whichever gives the best results.
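As a rough sketch of those two routes, here is a small Python wrapper around the command-line tools. The function names and the route labels are made up for illustration, and it assumes dvi2tty, dvips and ps2ascii are installed and write their text to standard output when called this way; check your local man pages before relying on it.

```python
import shutil
import subprocess
from pathlib import Path


def dvi_text_commands(dvi_path):
    """Return the command lines for each route (hypothetical helper)."""
    dvi = Path(dvi_path)
    ps = dvi.with_suffix(".ps")  # intermediate PostScript file for the dvips route
    return {
        "dvi2tty": [["dvi2tty", str(dvi)]],
        "dvips+ps2ascii": [
            ["dvips", "-o", str(ps), str(dvi)],  # DVI -> PostScript
            ["ps2ascii", str(ps)],               # PostScript -> plain text
        ],
    }


def dvi_to_text(dvi_path, route="dvi2tty"):
    """Run the chosen route and return the stdout of its last command."""
    output = ""
    for argv in dvi_text_commands(dvi_path)[route]:
        if shutil.which(argv[0]) is None:
            raise FileNotFoundError(f"{argv[0]} is not installed")
        result = subprocess.run(argv, capture_output=True, text=True, check=True)
        output = result.stdout
    return output
```

Calling dvi_to_text("paper.dvi") and dvi_to_text("paper.dvi", route="dvips+ps2ascii") on the same file and comparing the two outputs by eye is probably the quickest way to see which one suits a given document. Either way you get plain text, not TeX markup.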

I am pretty sure this is not possible. A DVI file contains information about how to render the page, not about which TeX commands produced it.

I think there should be no doubt that this is possible. The issue is whether it can be done well enough to be worthwhile. – Charles Stewart Dec 07 '09 at 13:49
For whoever finds this question again, and for everyone who answered: I found the best answer for my case. What I was looking for is indeed difficult: working out what original TeX source could compile to a given DVI (or PDF, for that matter, since I can easily turn the DVI into a PDF). InftyReader does it. It works perfectly: I tried it on a bunch of PDFs, then compiled the results back into PDFs, and the output was perfect!

Yes, good call! OCR systems tend not to be smart about line breaks, though: have you looked at how it handles multi-line equations? – Charles Stewart Dec 07 '09 at 14:06
Read the description of the DVI file format and write the program. The result of your program will not be the original text, but it will be usable.
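To give an idea of the starting point, here is a minimal Python sketch that walks the DVI opcodes and collects the character codes being typeset. The opcode numbers and parameter sizes follow the standard DVI format description; the function name is made up, and this is nowhere near a full converter. It ignores fonts, positions and the postamble, and the codes it returns are font-specific slots, not necessarily ASCII.

```python
# Fixed number of parameter bytes after each opcode (from the DVI format description).
FIXED_ARGS = {132: 8, 137: 8,          # set_rule, put_rule
              138: 0, 139: 44, 140: 0,  # nop, bop, eop
              141: 0, 142: 0,           # push, pop
              147: 0, 152: 0, 161: 0, 166: 0}  # w0, x0, y0, z0
for base in (143, 148, 153, 157, 162, 167, 235):  # right, w, x, down, y, z, fnt (1..4 bytes)
    for i in range(4):
        FIXED_ARGS[base + i] = i + 1
for op in range(171, 235):  # fnt_num_0 .. fnt_num_63
    FIXED_ARGS[op] = 0


def extract_char_codes(data):
    """Return the character codes typeset in a DVI byte string, in file order."""
    codes = []
    pos = 0
    while pos < len(data):
        op = data[pos]
        pos += 1
        if op <= 127:                                  # set_char_0 .. set_char_127
            codes.append(op)
        elif 128 <= op <= 131 or 133 <= op <= 136:     # set1..set4, put1..put4
            size = op - 127 if op <= 131 else op - 132
            codes.append(int.from_bytes(data[pos:pos + size], "big"))
            pos += size
        elif op in FIXED_ARGS:                         # movement, rules, font selection
            pos += FIXED_ARGS[op]
        elif 239 <= op <= 242:                         # xxx1..xxx4 (specials)
            size = op - 238
            length = int.from_bytes(data[pos:pos + size], "big")
            pos += size + length
        elif 243 <= op <= 246:                         # fnt_def1..fnt_def4
            pos += (op - 242) + 12                     # font number, checksum, scale, design size
            pos += 2 + data[pos] + data[pos + 1]       # name lengths a and l, then the name
        elif op == 247:                                # pre (preamble)
            pos += 13                                  # format id, num, den, mag
            pos += 1 + data[pos]                       # comment length k, then the comment
        elif op == 248:                                # post: page content is over
            break
        else:
            raise ValueError(f"unexpected opcode {op} at byte {pos - 1}")
    return codes
```

Note that there is no opcode for a space between words: gaps are just horizontal movements (right, w, x), so deciding where one word ends and the next begins already takes guesswork about how big a gap has to be.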

Err, well, sort of.
The path of least resistance will involve, I think, a DVI-to-RTF converter. I've posted a question: Q#1859373 dvi2rtf: who can convert DVI files to RTF. There I posted an untested implementation, a crude solution that throws away all formatting.
With such a tool, you could then use Word 2007/8 and the excellent docx2tex utility to turn the RTF into TeX.
The results would be unpleasant to read, but I can see some use cases for doing so.
