Remove line breaks using linux command

Question

My database log file looks like this...

vi test.txt

'2020-03-27T08:00:24Z UTC [ db=xdb user=root pid=9037 userid=100 xid=36076898 ]' LOG: SELECT nspname FROM pg_namespace ORDER BY nspname
'2020-03-27T08:00:24Z UTC [ db=xdb user=root pid=9037 userid=100 xid=36076899 ]' LOG: SET search_path TO "public"
'2020-03-27T08:00:24Z UTC [ db=xdb user=root pid=9037 userid=100 xid=36076900 ]' LOG: SELECT typname
FROM pg_type
WHERE typnamespace = (SELECT oid FROM pg_namespace WHERE nspname = current_schema())
'2020-03-27T08:00:24Z UTC [ db=xdb user=root pid=9037 userid=100 xid=36076897 ]' LOG: SELECT datname FROM pg_database ORDER BY datname

Because of line breaks ('\n' and possibly '\r') inside the queries, I am not able to see the complete query. For example:

# grep '2020' test.txt
'2020-03-27T08:00:24Z UTC [ db=xdb user=root pid=9037 userid=100 xid=36076898 ]' LOG: SELECT nspname FROM pg_namespace ORDER BY nspname
'2020-03-27T08:00:24Z UTC [ db=xdb user=root pid=9037 userid=100 xid=36076899 ]' LOG: SET search_path TO "public"
'2020-03-27T08:00:24Z UTC [ db=xdb user=root pid=9037 userid=100 xid=36076900 ]' LOG: SELECT typname
'2020-03-27T08:00:24Z UTC [ db=xdb user=root pid=9037 userid=100 xid=36076897 ]' LOG: SELECT datname FROM pg_database ORDER BY datname

As you can see, continuation lines such as "FROM pg_type" are missing from the above output. How do I remove the line breaks in this text file? I need to keep the line break before '2020, since that starts another query.

How do I write a regular expression that will remove all line breaks between "LOG:" and "'2020-"?

Do you want to remove all carriage returns in the file? See [Remove carriage return in Unix](https://stackoverflow.com/questions/800030/remove-carriage-return-in-unix) (e.g. `sed -i 's/\r//g' file`) — Wiktor Stribiżew, Mar 27 '20 at 11:25
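
If the log does contain carriage returns, they can be stripped before any of the joining approaches below are applied. The following is a minimal sketch using `tr -d`, which avoids relying on the `\r` escape in sed. The file names `crlf_sample.txt` and `lf_sample.txt` and the shortened log line are made up for illustration.

```shell
# Hypothetical sample with CRLF line endings.
printf "'2020-03-27 ]' LOG: SELECT typname\r\nFROM pg_type\r\n" > crlf_sample.txt

# Delete every carriage return; the newlines stay in place.
tr -d '\r' < crlf_sample.txt > lf_sample.txt

# Dump the bytes to confirm that no \r remains.
od -c lf_sample.txt
```

Writing to a second file rather than editing in place keeps the original log available for comparison.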

Zorzi · Answer 1 · 2020-03-30T15:39:03.277

2

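A bit of a dirty solution, but you could do something like:

cat my_log_file.log | tr '\n' ' ' | sed "s/ \('[0-9]\{4\}-\)/\n\1/g"

# OR, simpler version (no cat needed):

tr '\n' ' ' < my_log_file.log | sed "s/ \('[0-9]\{4\}-\)/\n\1/g"

Basically, you replace every '\n' with a space, and then you put the newlines back where they belong: in front of each quoted four-digit year. Use \n rather than \r\n in the replacement, otherwise every line after the first starts with a stray carriage return. Note that \n in the replacement is a GNU sed feature, and that the output ends in a space instead of a final newline.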

edited Mar 30 '20 at 15:39

answered Mar 27 '20 at 11:39


1

`cat my_log_file.log | tr '\n' ' '` = `tr '\n' ' ' < my_log_file.log`. See http://porkmail.org/era/unix/award.html. – Ed Morton Mar 27 '20 at 13:28

William Pursell · Answer 2 · 2020-03-27T14:05:54.130

1

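awk 'NR>1 {printf "%s", ($0 ~ r ? "\n" : " ")}
    {printf "%s", $0} END {print ""}
    ' r="^'2020" test.txt

A line matching r starts a new record and is preceded by a newline; any other line is a continuation and is joined to the previous one with a space, so "typname" and "FROM" do not run together.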

edited Mar 27 '20 at 14:05

answered Mar 27 '20 at 12:15


potong · Answer 3 · 2020-03-28T10:58:07.403

1

This might work for you (GNU sed):

sed '/^'\''2020/{:a;N;/^\('\''2020\).*\n\1/!s/\n/ /;ta;P;D}' file

If a line begins with '2020, append the next line. If the appended line does not also begin with '2020, replace the newline between the two with a space, append the next line, and repeat. Otherwise, print and delete the first line of the pattern space and repeat.

The OP asked: How do I write a regular expression that will remove all breaks between "LOG:" and "'2020-"? To handle any year, use:

sed '/^'\''[1-9][0-9][0-9][0-9]/{:a;N;/^'\''[1-9][0-9][0-9][0-9].*\n'\''[1-9][0-9][0-9][0-9]/!s/\n/ /;ta;P;D}' file
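
The one-liner is dense, so here is a sketch of the same loop written out with one command per line and a comment on each. It assumes GNU sed, as the answer does. The function name `join_records`, the file `sed_sample.log`, and the shortened log lines are hypothetical. The test for "next line starts a record" is simplified to `/\n'YYYY/`, which is equivalent here because the pattern space holds at most one newline at that point.

```shell
# Hypothetical, shortened sample log.
cat > sed_sample.log <<'EOF'
'2020-03-27 ]' LOG: SELECT typname
FROM pg_type
WHERE x = 1
'2019-03-27 ]' LOG: SELECT datname
EOF

join_records() {
  sed '
    # Only act on lines that start a record (quote plus four-digit year).
    /^'\''[1-9][0-9][0-9][0-9]/{
      :a
      # Append the next input line to the pattern space.
      N
      # If the appended line is not a record start, turn the newline into a space.
      /\n'\''[1-9][0-9][0-9][0-9]/!s/\n/ /
      # If that substitution happened, loop and append another line.
      ta
      # Otherwise print the finished record and restart with the remainder.
      P
      D
    }
  ' "$1"
}

join_records sed_sample.log
```

On the sample this prints two lines: the first record with its two continuation lines joined by spaces, and the 2019 record unchanged.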

edited Mar 28 '20 at 10:58

answered Mar 27 '20 at 13:21


how do you handle 2019? – kvantour Mar 27 '20 at 14:27

Ed Morton · Answer 4 · 2020-03-27T13:36:02.007

1

$ awk '{printf "%s%s", (/^\047/ ? ors : ofs), $0; ors=ORS; ofs=OFS} END{printf "%s", ors}' file
'2020-03-27T08:00:24Z UTC [ db=xdb user=root pid=9037 userid=100 xid=36076898 ]' LOG: SELECT nspname FROM pg_namespace ORDER BY nspname
'2020-03-27T08:00:24Z UTC [ db=xdb user=root pid=9037 userid=100 xid=36076899 ]' LOG: SET search_path TO "public"
'2020-03-27T08:00:24Z UTC [ db=xdb user=root pid=9037 userid=100 xid=36076900 ]' LOG: SELECT typname FROM pg_type WHERE typnamespace = (SELECT oid FROM pg_namespace WHERE nspname = current_schema())
'2020-03-27T08:00:24Z UTC [ db=xdb user=root pid=9037 userid=100 xid=36076897 ]' LOG: SELECT datname FROM pg_database ORDER BY datname
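
The command above comes without explanation, so here is a spelled-out sketch of the same idea: a line that starts with a single quote (`\047` in octal) opens a new record, and every other line is glued to the current record with a space. It hard-codes newline and space instead of using `ORS` and `OFS`, so it matches the one-liner only with awk's default settings. The function name `join_awk` and the sample file `awk_sample.log` are hypothetical.

```shell
# Hypothetical, shortened sample log.
cat > awk_sample.log <<'EOF'
'2020-03-27 ]' LOG: SET search_path TO "public"
'2020-03-27 ]' LOG: SELECT typname
FROM pg_type
EOF

join_awk() {
  awk '
    # A line starting with a single quote (octal 047) begins a new record.
    /^\047/ {
      if (NR > 1) printf "\n"   # terminate the previous record
      printf "%s", $0
      next
    }
    # Any other line continues the current record.
    { printf "%s%s", (NR > 1 ? " " : ""), $0 }
    # Terminate the last record, if any input was read.
    END { if (NR > 0) printf "\n" }
  ' "$1"
}

join_awk awk_sample.log
```

Deferring the newline until the next record start is what lets continuation lines be appended without buffering the whole file.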

edited Mar 27 '20 at 13:36

answered Mar 27 '20 at 13:30


While this is the correct method, it might be advisable to make the check (`/^\047/`) more extensive by also verifying that the quote is followed by a date format. However, it is up to the OP to decide whether this is needed. – kvantour Mar 27 '20 at 14:15
@kvantour true. Not sure if there is a definitive way to tell the start of a record from text that might be at the start of a line due to a string being split mid-record but I'd guess `/^\047[0-9]{4}(-[0-9]{2}){2}T[0-9]{2}(:[0-9]{2}){2}Z UTC \[[^]]+]\047/` would probably be robust enough. – Ed Morton Mar 27 '20 at 14:26
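
To illustrate why the stricter check can matter, the sketch below uses a continuation line that itself begins with a single quote. It applies the pattern from the comment above, but with the `{n}` intervals written out, because some awk implementations (older mawk releases, for example) do not support interval expressions. The function name `join_strict`, the file `strict_sample.log`, and the sample queries are hypothetical.

```shell
# Hypothetical sample: the second line is a continuation that starts with a quote.
cat > strict_sample.log <<'EOF'
'2020-03-27T08:00:24Z UTC [ db=xdb user=root pid=9037 userid=100 xid=1 ]' LOG: SELECT typname FROM pg_type WHERE typname IN (
'int4', 'text')
'2020-03-27T08:00:25Z UTC [ db=xdb user=root pid=9037 userid=100 xid=2 ]' LOG: SELECT 1
EOF

join_strict() {
  awk '
    # Record start: quoted ISO timestamp, "Z UTC", then a bracketed field list.
    /^\047[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]:[0-9][0-9]Z UTC \[[^]]+]\047/ {
      if (NR > 1) printf "\n"
      printf "%s", $0
      next
    }
    # Everything else, including lines that merely start with a quote, is a continuation.
    { printf "%s%s", (NR > 1 ? " " : ""), $0 }
    END { if (NR > 0) printf "\n" }
  ' "$1"
}

join_strict strict_sample.log
```

With the plain `/^\047/` test the `'int4', 'text')` line would be treated as a record of its own; with the stricter pattern it is joined to the query it belongs to, giving two output lines.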
