Problem
I am trying to apply the solution seen in this SO answer. I have also tried to get this solution to the same question to work, but I've been unsuccessful in both cases. Both of these use clean/smudge attribute filters.
The goal is to improve how Git handles the files I am tracking (LaTeX).
What I've done
The short answer to "what have you done" is "what the solutions said to do"; however, in case I've overlooked something, I'll go into detail.
One potential problem I've looked into is the possibility that the answers are outdated. From looking at the Git Attributes documentation, the only thing I can think of that might be outdated is the location of the configuration file. For this reason I have both a .gitattributes file in the repo's root and a .git/info/attributes file. I have also tried with only one of these.
Word-by-word
While this is not the solution I hope to get working, I figured I should try more than one in hopes of better identifying what is going wrong. I chose to go over this one, since I can at least get the script to work outside of Git.
In my config file I have
[filter "wordbyword"]
clean = /home/nero/myScripts/wordbyword.clean
smudge = /home/nero/myScripts/wordbyword.smudge
I copied and pasted the above locations into a terminal with vim in front (so you know that I didn't make a typo there) and pasted the contents below: first clean, then smudge (though for the latter to be of use, the former must work... which is the problem).
#!/usr/bin/perl
use strict;
use warnings;
while (<>)
{
print "$_\n" foreach (m/(.*?\s+)/go);
print '#@#DELIM#@#' . "\n";
}
and
#!/usr/bin/perl
use strict;
use warnings;
while (<>)
{
chomp; '#@#DELIM#@#' eq $_ and print "\n" or print;
}
The attributes file is simply *.tex filter=wordbyword
However, when I run git show HEAD:file.tex it shows that the file is being stored normally.
I know that the script works. When I run perl wordbyword.clean test.tex the output is as expected.
After poking around, I saw that I in fact DID have two files telling Git what needs to be done with .tex files. I had put one in a global location. Oops. This one works now. . .and the next one as well. At least for the cleaning. I'm going to check out smudging now before I answer my own question.
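A duplicate definition like this can be tracked down by asking Git which filter it resolves for a path and where each setting comes from. The following is a minimal sketch in a throwaway repository; the `upper` filter, the temporary directory, and the file contents are made up for illustration and are not part of the setup described above.

```shell
#!/bin/sh
# Scratch-repo sketch: the "upper" filter and all paths here are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

# A deliberately trivial clean filter so its effect is easy to spot.
git config filter.upper.clean "tr a-z A-Z"
git config filter.upper.smudge cat
echo '*.tex filter=upper' > .gitattributes
echo 'hello world' > file.tex

# 1. Which filter does Git resolve for this path, after combining
#    .gitattributes, .git/info/attributes and any user-wide attributes file?
attr=$(git check-attr filter -- file.tex)
echo "$attr"

# 2. Which config file defines each filter setting (repo, global, system)?
git config --show-origin --get-regexp '^filter\.'

# 3. Is a user-wide attributes file configured? If unset, Git falls back to
#    $XDG_CONFIG_HOME/git/attributes or ~/.config/git/attributes.
git config --get core.attributesFile || echo "core.attributesFile not set"

# 4. The clean filter runs when a file is staged; inspect the staged blob.
git add file.tex
staged=$(git show :file.tex)
echo "$staged"
```

Note that `git show HEAD:file.tex` only reflects what was committed, so a file committed before the filter took effect keeps its old content until it is staged and committed again. Recent Git versions offer `git add --renormalize` to force files through the clean filter again.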
Sentence by sentence
This is the one I prefer. To me it seems most reasonable to store a file in logical units. A paragraph is a logical unit, but it's too large to be handled effectively. A sentence is the next size of logical unit, and is about right.
This is actually the simpler one, since instead of a script, it is a simple one-line perl substitution.
[filter "sentencebreak"]
clean = perl -pe \"s/[.]*?(\\?|\\!|\\.|'') /$&%NL%\\n/g unless m/%/||m/^[\\ *\\\\\\]/\"
smudge = perl -pe \"s/%NL%\\n//gm\"
with the attributes *.tex filter=sentencebreak
However, when I run the substitution with perl -pe "that long line" < test.tex, instead of printing newlines at every period, it prints a literal \n at every whitespace (and leaves periods alone). I identified the \\n as one culprit: the doubled backslash appears to escape the backslash itself, so \n is printed literally instead of as a newline. Changing that to \n causes it to produce newlines; however, it is still breaking at whitespace, which is not what I desire.
Looking closer at that Perl substitution (I'm bad at Perl), I see that it's escaping the punctuation. Removing the extra \ "fixed" that part of it.