
I wanted to be able to diff docx files and found that the following command extracts the text from a docx file:

unzip -p some.docx word/document.xml | sed -e 's/<[^>]\{1,\}>//g; s/[^[:print:]]\{1,\}//g' | fold -w 80

However, I am struggling to include this in the .gitattributes file. Can someone comment on how this line needs to be modified so that git uses the current file instead of a hard-coded docx file name?

I have tried the following in git config but it causes an error:

[diff "word"]
    textconv = unzip -p $LOCAL | sed -e 's/<[^>]\{1,\}>//g; s/[^[:print:]]\{1,\}//g' |

Michael
Thinker
  • What is the error you receive? – Michael Sep 23 '13 at 11:48
  • sed seems like an awful hack for parsing a docx's XML. I would look into getting a more robust utility for converting docx to plaintext. See http://stackoverflow.com/questions/5671988/how-to-extract-just-plain-text-from-doc-docx-files-unix – Max Sep 23 '13 at 15:54
  • The error is "Incorrect git configuration file". I suspect it's the syntax of the command line code. – Thinker Sep 24 '13 at 12:04

1 Answer

Here's a proper solution for that. Simply stripping the strings with sed might cause you some headaches.
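For illustration only, here is a minimal sketch of how the pipeline from the question could be wired into git through a small wrapper script. It relies on the documented behaviour that a textconv program receives a single argument, the name of the file to convert, and writes text to stdout; as far as I know, $LOCAL belongs to difftool/mergetool and is not set here. The script name docx-textconv.sh, its path, and the function names are assumptions, not something from the original post.

```shell
#!/bin/sh
# Hypothetical wrapper script; the name docx-textconv.sh is an assumption.
# git calls a textconv program with one argument: the path of the file
# to convert. The converted text must be written to stdout.

# Strip XML tags and non-printable characters from stdin
# (this is the sed expression from the question, unchanged).
strip_xml_tags() {
    sed -e 's/<[^>]\{1,\}>//g; s/[^[:print:]]\{1,\}//g'
}

# Print the text of the docx file given as $1, wrapped at 80 columns.
docx_to_text() {
    unzip -p "$1" word/document.xml | strip_xml_tags | fold -w 80
}

# Convert only when a file path was actually passed in.
if [ "$#" -gt 0 ]; then
    docx_to_text "$1"
fi

# Assumed wiring (the path is a placeholder):
#   .gitattributes:   *.docx diff=word
#   git config:       [diff "word"]
#                         textconv = /path/to/docx-textconv.sh
```

Keeping the pipeline in a script avoids quoting the sed expression inside the config file, which is a likely source of the configuration error. The script has to be executable for git to run it.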

rlegendi