
I have a bit of a strange problem: I have some source text (it's LaTeX, but that does not matter here) that contains long lines made up of several sentences, each ending in a period. For better version control, I want to put each sentence on its own line. This can be achieved with sed 's/\. /.\n/g'.

The problem arises when a line also contains a comment with periods in it. Comments must not be split, because everything after the first inserted line break would no longer be commented out and would be parsed as LaTeX code, which can cause errors.

As a minimal example, take

Foo. Bar. Baz. % A. comment. with periods.

The result should be

Foo.
Bar.
Baz. % ...

Alternatively, moving the whole comment onto the following line would be fine too.
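
To make the failure mode concrete, here is a small sketch of what the naive substitution does to the example line. The function name `naive_split` is only an illustrative wrapper, and GNU sed is assumed for `\n` in the replacement.

```shell
# Hypothetical helper name; GNU sed is assumed for "\n" in the replacement.
naive_split() {
  sed 's/\. /.\n/g'
}

printf '%s\n' 'Foo. Bar. Baz. % A. comment. with periods.' | naive_split
# Foo.
# Bar.
# Baz.
# % A.
# comment.
# with periods.
```

The last two output lines are no longer behind a `%`, so LaTeX would treat them as ordinary text.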

Using Perl would be fine if that works out better. I tried a few ideas with both sed and Perl, but none did what I expected: either the comment was altered as well, or only the first period was replaced (perl -pe 's/^([^%]*?)\. /\1.\n/g').

Can you point me in the right direction?

Christian Wolf
  • Can regular text continue after a comment on the same line, like `Foo. Bar. Baz. % A. comment. with periods. Syn. Ack. % Another. comment`, or does a comment always run to the end of the line? – stevieb Nov 20 '15 at 14:26
  • Unfortunately the comments end at the end of the line (without any termination symbol). – Christian Wolf Nov 20 '15 at 14:38

2 Answers


This is tricky, as you're essentially trying to match all occurrences of ". " that are not preceded by a "%" anywhere earlier on the line. A negative look-behind would be useful here, but Perl doesn't support unbounded variable-width look-behind. (Though there are hideous ways of faking it in certain situations.) We can get by without it here using backtracking control verbs:

s/(?:%(*COMMIT)(*FAIL))|\.\K (?!%)/\n/g;

The (?:%(*COMMIT)(*FAIL)) branch makes the replacement stop the first time it sees a "%": once the "%" has matched, (*COMMIT) forbids the engine from retrying at any later position, and (*FAIL) then fails unconditionally, so the whole match attempt ends there. The "real" match is the other branch of the alternation: \.\K (?!%) looks for a space that follows a period and isn't followed by a "%". The \K keeps the period out of the matched text, so we don't have to repeat it in the replacement. We only match and replace the space.

Michael Carman

Putting the comment by itself on a following line can be done with sed pretty easily, using the hold space:

sed '/^[^.]*%/b;/%/!{s/\. /.\n/g;b};h;s/[^%]*%/%/;x;s/ *%.*//;s/\. /.\n/g;G'

Or if you want the comment by itself before the rest:

sed '/^[^.]*%/b;/%/!{s/\. /.\n/g;b};h;s/ *%.*//;s/\. /.\n/g;x;s/[^%]*%/%/;G'

Finally, the comment can also be kept on the same line as the last sentence:

sed '/^[^.]*%/b;/%/!{s/\. /.\n/g;b};h;s/[^%]*%/%/;x;s/ *%.*//;s/\. /.\n/g;G;s/\n\([^\n]*\)$/ \1/'
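
For readers less familiar with the hold space, the last variant can be spread over several lines with one comment per step. This is a sketch, not a different method: the wrapper name `split_keep_comment` is invented for illustration, and GNU sed is assumed (for `\n` in the replacement and inside the bracket expression).

```shell
# Hypothetical wrapper name; the script is the last variant above, one step per line.
split_keep_comment() {
  sed '
    # comment starts before any period: leave the line untouched
    /^[^.]*%/b
    # no comment at all: plain split, then skip the rest
    /%/!{
      s/\. /.\n/g
      b
    }
    # copy the line to the hold space, then reduce the pattern space to the comment
    h
    s/[^%]*%/%/
    # swap: hold space = comment, pattern space = original line
    x
    # drop the comment and the spaces before it, then split the sentences
    s/ *%.*//
    s/\. /.\n/g
    # append the comment on its own line, then join it onto the last sentence
    G
    s/\n\([^\n]*\)$/ \1/
  '
}

printf '%s\n' \
  'Foo. Bar. Baz. % A. comment. with periods.' \
  'Syn. Ack.' \
  '% only. a comment.' | split_keep_comment
# Foo.
# Bar.
# Baz. % A. comment. with periods.
# Syn.
# Ack.
# % only. a comment.
```

Dropping the final `s` command gives the first variant, with the comment on its own line after the sentences.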
Jeff Y
  • This only works if every line has a comment, which is not true in general. On a line without one, the first `s///` does not match, and the line ends up in the output twice. – Christian Wolf Nov 23 '15 at 12:15
  • You're right. Also, comments with no preceding text were getting extra newlines inserted before them. Two extra conditions need to be prefixed to handle those cases. Answer updated accordingly. – Jeff Y Nov 23 '15 at 15:51