-4

I have some corrupt RTF files with lines like this:

{\s39\li0\fi0\ri0\sb0\sa0\ql\vertalt\fs22 Fußzeile Zchn;}
                                          ^----------------------------^

I want to replace all characters matching [^a-zA-Z0-9_\{}; ], but only in lines beginning with "{\s" and ending with ";}", and only in the span from the first space to the closing ";}".

The first space and the ";}" themselves should not be replaced.

gsxr1300

3 Answers

1

You didn't specify a language, so here is a Regex101 example:

({\\s.+?\s)(.*)(})
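
As a rough illustration of how those three groups could be used, here is a minimal Python sketch (Python is only an assumption, since no language was specified). It leaves groups 1 and 3 untouched and strips the disallowed characters from group 2 only. The names `SPAN`, `DISALLOWED` and `clean_line` are made up for this example, and the character class is the one from the question, read as a regex in which `\{` is an escaped brace.

```python
import re

# The three-group pattern from above, with the braces escaped for clarity.
SPAN = re.compile(r'(\{\\s.+?\s)(.*)(\})')
# Character class from the question; everything it matches gets removed.
DISALLOWED = re.compile(r'[^a-zA-Z0-9_\{}; ]')


def clean_line(line: str) -> str:
    """Strip disallowed characters between the first space and the last '}'."""
    def fix(match):
        prefix, span, closing = match.groups()
        return prefix + DISALLOWED.sub('', span) + closing
    return SPAN.sub(fix, line)


sample = r'{\s39\li0\fi0\ri0\sb0\sa0\ql\vertalt\fs22 Fußzeile Zchn;}'
print(clean_line(sample))
# prints {\s39\li0\fi0\ri0\sb0\sa0\ql\vertalt\fs22 Fuzeile Zchn;}
```

Note that the pattern is not anchored, so it would also match a "{\s" group that starts in the middle of a line.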
zipa
1

I'm unsure what language or technology you'd like to use here, but if C# is an option, you can check out this previous question. The answer gets you almost all the way there.

For your example:

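var text = @"{\s39\li0\fi0\ri0\sb0\sa0\ql\vertalt\fs22 Fußzeile Zchn;}";
// $1 = prefix up to the first bad character, $2 = one run of bad characters,
// $3 = the rest up to the closing brace; the replacement keeps only $1 and $3.
var pattern = @"^({\\s\S*\s[a-zA-Z0-9_\{}; ]*)([^a-zA-Z0-9_\{}; ]*)([^}]*})";
var replaced = System.Text.RegularExpressions.Regex.Replace(text, pattern, "$1$3");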

This removes only one contiguous run of bad characters, which handles your example but, unfortunately, not the general case in your question. There is probably a more elegant solution, but I think you'll have to run that expression repeatedly until the input and output of Regex.Replace() are equal.
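
The "repeat until nothing changes" idea can be sketched as follows. This is a Python transliteration of the expression above, not C#, and it is only a sketch: `PATTERN` and `strip_bad_runs` are hypothetical names, and the sample line with two separate runs of bad characters is invented for illustration.

```python
import re

# Python transliteration of the C# pattern above (braces escaped for clarity).
PATTERN = re.compile(
    r'^(\{\\s\S*\s[a-zA-Z0-9_\{}; ]*)'   # prefix up to the first bad character
    r'([^a-zA-Z0-9_\{}; ]*)'             # one contiguous run of bad characters
    r'([^}]*\})'                         # remainder up to the next closing brace
)


def strip_bad_runs(text: str) -> str:
    """Re-apply the substitution until the text stops changing."""
    while True:
        replaced = PATTERN.sub(r'\1\3', text)
        if replaced == text:
            return text
        text = replaced


# Invented sample with two separate runs of bad characters.
print(strip_bad_runs(r'{\s1 Fuß Zähler;}'))
# prints {\s1 Fu Zhler;}
```

Each pass either removes at least one character or leaves the text unchanged, so the loop always terminates.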

0

If you can use sed in a terminal, you could do something like this.

sed -i 's/^\({\\s[^ ]*\s\).*\(\;}\)\(}\)\?$/\1\2/' filename

This turned my file containing:

{\s39\li0\fi0\ri0\sb0\sa0\ql\vertalt\fs22 Fußzeile Zchn;}

To:

{\s39\li0\fi0\ri0\sb0\sa0\ql\vertalt\fs22 ;}
Thomas
  • I tried it with GnuWin sed and it works fine, except there is one line which ends with ;}} instead of ;}. Should I use a second script with \;}}\ or is it possible to do something like (\;}}\ or \;}) in one script? – gsxr1300 Jun 08 '18 at 17:43
  • Thanks, this works too: sed -i "s/^\({\\s[^ ]*\s\).*\(\;}\|\;}}\)$/\1\2/" filename – gsxr1300 Jun 08 '18 at 18:13
  • If you want to run it twice, yes. Mine was set up to handle both cases so you only have to run it once. Both ways should work fine though. – Thomas Jun 08 '18 at 18:14
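
For comparison, the single-pass idea from these comments (accept either ;} or ;}} at the end of the line) could be sketched in Python roughly as follows. Like the sed command from the second comment, it drops everything between the first space and the terminator and keeps whichever terminator it finds. `STYLE_LINE` and `blank_style_name` are hypothetical names, and the second sample line is invented.

```python
import re

# Anchored to the whole line; the terminator may be ';}' or ';}}'.
STYLE_LINE = re.compile(r'^(\{\\s[^ ]*\s).*(;\}\}?)$')


def blank_style_name(line: str) -> str:
    """Drop the text between the first space and the closing ';}' or ';}}'."""
    return STYLE_LINE.sub(r'\1\2', line)


for line in (
    r'{\s39\li0\fi0\ri0\sb0\sa0\ql\vertalt\fs22 Fußzeile Zchn;}',
    r'{\s40\fs22 Kopfzeile Zchn;}}',  # invented sample ending in ';}}'
):
    print(blank_style_name(line))
# prints {\s39\li0\fi0\ri0\sb0\sa0\ql\vertalt\fs22 ;}
# prints {\s40\fs22 ;}}
```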