
I am working on LaTeX files. I need to delete everything between two $ signs, including newlines, and keep only the English text.

I am using a command like this to process the files:

find "." -name "*.tex" | xargs perl -pi -e 's/\$[^\$].*?\$/ /g' *

Example:

Then use the naturality formula 

    $t_{G^{n-1}M} G^{i+1} (\epsilon_{G^{n-i}M}) 
    = G^{i+1} (\epsilon_{G^{n-i}M}) t_{G^n M}$ on the left-hand side.

Output:

Then use the naturality formula 
 on the left-hand side.

Another example from a file:

EXAMPLE:

\begin{itemize}
\item $M$ is atomic and finitely generated;
\item $M$ is cancellative;
\item $(M, \le_L)$ and $(M, \le_R)$ are lattices;
\item there exists an element $\Delta \in M$, called {\it Garside element}, such that the set 
$L(\Delta)= \{ x \in M; x\le_L \Delta\}$ generates $M$ and is equal to $R(\Delta)= \{ x\in M; 
x\le_R \Delta\}$.
\end{itemize}

OUTPUT:

\begin{itemize}
\item   is atomic and finitely generated;
\item   is cancellative;
\item   and   are lattices;
\item there exists an element  , called {\it Garside element}, such that the set 
  generates   and is equal to $R(\Delta)= \{ x\in M; 
x\le_R \Delta\}$.
\end{itemize} 

Notice that the last formula, $R(\Delta)= \{ x\in M; x\le_R \Delta\}$, which spans two lines, was not removed.

Example 2, from a different file. Here the output is the same as the input; nothing has changed:

    Using the fact that   is atomic and that $L(\Delta)= 
\{x \in M; x \le_L \Delta\} M \pi_L(a) \neq 1 a \neq 
1 k \partial_L^k(a)=1 k$ be the
Will Barnwell
Zain
  • *"it works but not for all cases that I have in the files"* Please give some examples of the cases that aren't working, and a brief example document that you want to process. It's not fair to ask us to write software without seeing any input. – Borodin Oct 06 '17 at 17:54
  • I edited your regex/command back in, as that is relevant information to the question. I have also updated my answer – Will Barnwell Oct 06 '17 at 22:33

1 Answer


I'm guessing this is not matching when the text it should match spans multiple lines.

You have `[^\$].*?`. The `[^\$]` matches one character that is not `$`, and the `.*?` then lazily matches zero or more characters that are not a newline. This works on your single-line cases because the lazy quantifier tries to match the closing `$` before consuming another character, but it fails on the multi-line cases because `.` will not match a newline.

A correct and more efficient alternative is `[^\$]*`, which matches as many non-`$` characters as possible, including newlines.

So your substitution would be

s/\$[^\$]*\$/ /g

or, cleaner looking in my opinion, with non-standard delimiters to avoid the 'fencepost' look of `/\`:

s~\$[^\$]*\$~ ~g


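Perl is processing your file line by line, which is another reason the matches fail across newlines: with `-p`, each line is handed to the substitution as a separate string, so a pattern never sees both the opening and the closing `$` of a formula that is split over two lines. There are a number of documented answers to this problem already on SO, written by people who know Perl better than I do: How to match multiline data in perl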

Will Barnwell
  • Unfortunately it did not work! For example, this case has not been solved: ..projection $I_{d,i} \rightarrow M_{d,i} M_{d,0}$, where all the .. – Zain Oct 05 '17 at 23:58
  • Could you update your question with multiple examples of cases that are not working? – Will Barnwell Oct 06 '17 at 00:40
  • Why do you think it's "cleaner" to use non-standard delimiters for your substitution, especially the pipe character `|` which is a regex metacharacter? And why are you using the non-greedy quantifier when you have said that the greedy version is "more efficient"? – Borodin Oct 06 '17 at 18:00
  • It looks nicer; notice I said 'imo' and gave it after the answer. Thanks for the catch on the `?`: it is more efficient to use the greedy version, which nearly cuts the match steps in half – Will Barnwell Oct 06 '17 at 21:27