After the question update, the requirements for the solution changed:
cat test.txt | tr '\n' ' ' | perl -ne 's/(?<!\|) ([A-Z])/\n\1/g; print' | sed 's/ ,/,/g' | sed 's/ \([0-9]\+\)/\n\1/g'; echo
Output:
1. good movie (2006)
This is a world class movie for music.
Dir: abc
With: lan, cer, cro
Comedy | Drama | Family | Musical | Romance
120 mins.
Explanation:
- First, I replace all newline characters with spaces using tr.
- Second, I replace every space that is followed by a capital letter with a newline and that letter, unless the space is preceded by a pipe symbol "|".
- Third, I remove the space in front of each comma.
- Last, I move the duration (a number preceded by a space) onto a new line.
The echo at the very end appends a newline to the output.
Deprecated:
Building on kpie's comment, I suggest the following solution:
cat test.txt | sed ':a;N;$!ba;s/\n//g' | sed 's/\([A-Z]\)/\n\1/g'
I pasted your input into test.txt.
The first sed replacement is explained here: https://stackoverflow.com/a/1252191/1863086
The second one inserts a newline before every capital letter.
EDIT:
Another possibility using tr:
cat test.txt | tr -d '\n' | sed 's/\([A-Z]\)/\n\1/g'; echo
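As a quick check that the two deprecated variants behave alike, here is a minimal sketch that runs both on the same made-up input. The function names split_with_sed and split_with_tr are hypothetical, the four input words are invented for illustration, and GNU sed is assumed (other sed implementations may not accept \n in the replacement or the semicolon-separated label syntax).

```shell
#!/usr/bin/env bash
# Hypothetical wrappers around the two variants, both reading from stdin.
split_with_sed() {
  # slurp all lines into the pattern space, delete the newlines,
  # then insert a newline before every capital letter
  sed ':a;N;$!ba;s/\n//g' | sed 's/\([A-Z]\)/\n\1/g'
}

split_with_tr() {
  # tr removes every newline, including the last one, so echo restores it
  tr -d '\n' | sed 's/\([A-Z]\)/\n\1/g'
  echo
}

# Made-up input: four short lines, two of them starting with a capital.
printf '%s\n' 'foo' 'Bar' 'baz' 'Qux' | split_with_sed
printf '%s\n' 'foo' 'Bar' 'baz' 'Qux' | split_with_tr
```

Both calls should print the same three lines. Note that, unlike the updated command, these variants break before every capital letter, including those that follow a pipe.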