I'd like to create a script, using any combination of bash, sed, awk, or perl, that removes a line's trailing newline when the next line is shorter than a given length. Say we want to remove the newline whenever the next line is shorter than 5 characters. Given this source text file:
hi hi hi hi hi
bye
fun fun fun fun fun
batman
shirt shirt shirt
pants pants pants
belt
paper paper paper
Here's the desired output:
hi hi hi hi hibye
fun fun fun fun fun
batman
shirt shirt shirt
pants pants pantsbelt
paper paper paper
Here's a script that prints the line numbers of all lines shorter than 5 characters:
awk 'length($0) < 5 { print NR }' source.txt
It prints:
2
7
Here's a script that removes the newlines, with the line numbers hardcoded (each is a line number from the previous script minus one):
perl -pe 'chomp if $.==1||$.==6' source.txt
How do I combine these two scripts? Or is there a better way to solve this?
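For reference, here is a minimal sketch of one way the two steps could be merged into a single awk pass. It is not taken from any of the answers. The idea is to print each line without its newline, then decide when the next line arrives whether the pending newline should be emitted. The function name `join_short_lines` and the variable `n` are hypothetical names chosen for illustration.

```shell
# Hypothetical helper: join_short_lines FILE [N]
# Appends every line shorter than N characters (default 5) to the previous line.
join_short_lines() {
  awk -v n="${2:-5}" '
    # From the second line on, emit the pending newline of the previous line
    # only if the current line is at least n characters long.
    NR > 1 { printf "%s", (length($0) < n ? "" : "\n") }
    # Print the current line without its newline.
    { printf "%s", $0 }
    # Terminate the last line, unless the input was empty.
    END { if (NR) print "" }
  ' "$1"
}
```

It would be called as `join_short_lines source.txt 5`. Note that under this rule empty lines (length 0) are also joined to the previous line, and whether `length` counts characters or bytes depends on the awk implementation and locale.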
Update
There were multiple correct answers (some didn't work on my Mac, but I think they'd work on other machines). Here's how long the correct answers took on my machine with a 769,811-line CSV file (40,000 lines had their newline character removed):
- Ed Morton's awk solution: 23.7 seconds
- wolfrevokcats's perl solution with slurp: 4.5 seconds
- John1024's solution: didn't work on my Mac (but I think it works on other OSs)
- ikegami's perl solution without slurp: I killed the task after 7 minutes