
My database log file looks like this...

vi test.txt

'2020-03-27T08:00:24Z UTC [ db=xdb user=root pid=9037 userid=100 xid=36076898 ]' LOG: SELECT nspname FROM pg_namespace ORDER BY nspname
'2020-03-27T08:00:24Z UTC [ db=xdb user=root pid=9037 userid=100 xid=36076899 ]' LOG: SET search_path TO "public"
'2020-03-27T08:00:24Z UTC [ db=xdb user=root pid=9037 userid=100 xid=36076900 ]' LOG: SELECT typname
FROM pg_type
WHERE typnamespace = (SELECT oid FROM pg_namespace WHERE nspname = current_schema())
'2020-03-27T08:00:24Z UTC [ db=xdb user=root pid=9037 userid=100 xid=36076897 ]' LOG: SELECT datname FROM pg_database ORDER BY datname

Because of line breaks ('\n' and '\r') inside the queries, I cannot match a complete query. For example:

# grep '2020' test.txt
'2020-03-27T08:00:24Z UTC [ db=xdb user=root pid=9037 userid=100 xid=36076898 ]' LOG: SELECT nspname FROM pg_namespace ORDER BY nspname
'2020-03-27T08:00:24Z UTC [ db=xdb user=root pid=9037 userid=100 xid=36076899 ]' LOG: SET search_path TO "public"
'2020-03-27T08:00:24Z UTC [ db=xdb user=root pid=9037 userid=100 xid=36076900 ]' LOG: SELECT typname
'2020-03-27T08:00:24Z UTC [ db=xdb user=root pid=9037 userid=100 xid=36076897 ]' LOG: SELECT datname FROM pg_database ORDER BY datname

As you can see, the line "FROM pg_type" (and the WHERE line after it) is missing from the above output. How do I remove the line breaks in this text file? I need to keep the line break before '2020', since that starts another query.

How do I write a regular expression that will remove all breaks between "LOG:" and "'2020-"?

shantanuo
  • Do you want to remove all carriage returns in the file? See [Remove carriage return in Unix](https://stackoverflow.com/questions/800030/remove-carriage-return-in-unix) (e.g. `sed -i 's/\r//g' file`) – Wiktor Stribiżew Mar 27 '20 at 11:25

4 Answers

A bit of a dirty solution, but you could do something like:

cat my_log_file.log | tr '\n' ' ' | sed "s/\('[0-9]\{4\}\)/\n\1/g"

# OR, simpler version:

tr '\n' ' ' < my_log_file.log | sed "s/\('[0-9]\{4\}\)/\n\1/g"

Basically, you replace every '\n' with a space, and then you add the line breaks back where they belong (before each quoted four-digit year).

Zorzi
  • `cat my_log_file.log | tr '\n' ' '` = `tr '\n' ' ' < my_log_file.log`. See http://porkmail.org/era/unix/award.html. – Ed Morton Mar 27 '20 at 13:28
awk 'match($0, r) && NR>1 {print ""; sep=""}
    {printf "%s%s", sep, $0; sep=" "} END {print ""}
    ' r="^'2020" test.txt
William Pursell

This might work for you (GNU sed):

sed '/^'\''2020/{:a;N;/^\('\''2020\).*\n\1/!s/\n/ /;ta;P;D}' file

If a line begins with '2020, append the next line. If the appended line does not begin with '2020, replace the newline between the two with a space, append the next line, and repeat. Otherwise, print and delete the first line and repeat.

The OP asked: "How do I write a regular expression that will remove all breaks between "LOG:" and "'2020-"?" To handle any year, use:

sed '/^'\''[1-9][0-9][0-9][0-9]/{:a;N;/^'\''[1-9][0-9][0-9][0-9].*\n'\''[1-9][0-9][0-9][0-9]/!s/\n/ /;ta;P;D}' file
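
For anyone who finds the one-liner hard to read, here is the 2020-only script again, split over several lines with a comment per step. This is only a sketch: `join_with_sed` and `sample.log` are made-up names, the sample lines are shortened stand-ins for the log in the question, and it assumes GNU sed.

```shell
# Made-up sample: one query split over three lines, then a one-line query.
cat > sample.log <<'EOF'
'2020-03-27T08:00:24Z UTC [ db=xdb ]' LOG: SELECT typname
FROM pg_type
ORDER BY typname
'2020-03-27T08:00:24Z UTC [ db=xdb ]' LOG: SELECT 1
EOF

# join_with_sed is a hypothetical wrapper name. Assumes GNU sed.
join_with_sed() {
  sed '
# Act only on a line that starts a record.
/^'\''2020/{
:a
# Append the next input line to the pattern space.
N
# If the pattern space is not "record start, newline, record start",
# the appended line is a continuation: replace the newline with a space.
/^\('\''2020\).*\n\1/!s/\n/ /
# A successful substitution means more continuation lines may follow.
ta
# Otherwise print and delete the first line, then restart on the rest.
P
D
}' "$1"
}

join_with_sed sample.log
```

On the sample this prints two lines, the first being the whole `SELECT typname FROM pg_type ORDER BY typname` query.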
potong
$ awk '{printf "%s%s", (/^\047/ ? ors : ofs), $0; ors=ORS; ofs=OFS} END{printf "%s", ors}' file
'2020-03-27T08:00:24Z UTC [ db=xdb user=root pid=9037 userid=100 xid=36076898 ]' LOG: SELECT nspname FROM pg_namespace ORDER BY nspname
'2020-03-27T08:00:24Z UTC [ db=xdb user=root pid=9037 userid=100 xid=36076899 ]' LOG: SET search_path TO "public"
'2020-03-27T08:00:24Z UTC [ db=xdb user=root pid=9037 userid=100 xid=36076900 ]' LOG: SELECT typname FROM pg_type WHERE typnamespace = (SELECT oid FROM pg_namespace WHERE nspname = current_schema())
'2020-03-27T08:00:24Z UTC [ db=xdb user=root pid=9037 userid=100 xid=36076897 ]' LOG: SELECT datname FROM pg_database ORDER BY datname
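
To check this against the original goal, the joined output can be piped into grep. A rough sketch, assuming a shortened copy of the question's log saved as test.txt (recreated with a heredoc so the snippet runs on its own); `join_log` is just a made-up name wrapping the awk command above.

```shell
# Shortened copy of the sample log from the question.
cat > test.txt <<'EOF'
'2020-03-27T08:00:24Z UTC [ db=xdb user=root pid=9037 userid=100 xid=36076899 ]' LOG: SET search_path TO "public"
'2020-03-27T08:00:24Z UTC [ db=xdb user=root pid=9037 userid=100 xid=36076900 ]' LOG: SELECT typname
FROM pg_type
WHERE typnamespace = (SELECT oid FROM pg_namespace WHERE nspname = current_schema())
'2020-03-27T08:00:24Z UTC [ db=xdb user=root pid=9037 userid=100 xid=36076897 ]' LOG: SELECT datname FROM pg_database ORDER BY datname
EOF

# join_log is a hypothetical helper name; the awk program is unchanged.
join_log() {
  awk '{printf "%s%s", (/^\047/ ? ors : ofs), $0; ors=ORS; ofs=OFS} END{printf "%s", ors}' "$1"
}

# grep now sees one line per record, so the match includes the whole query.
join_log test.txt | grep 'pg_type'
```

The grep should return a single line holding the complete `SELECT typname FROM pg_type WHERE ...` statement instead of only the `FROM pg_type` fragment.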
Ed Morton
  • While this is the correct method, it might be advisable to make the check (`/^\047/`) stricter by also testing that the quote is followed by a date. However, it is up to the OP to decide whether this is needed. – kvantour Mar 27 '20 at 14:15
  • @kvantour true. Not sure if there is a definitive way to tell the start of a record from text that might be at the start of a line due to a string being split mid-record but I'd guess `/^\047[0-9]{4}(-[0-9]{2}){2}T[0-9]{2}(:[0-9]{2}){2}Z UTC \[[^]]+]\047/` would probably be robust enough. – Ed Morton Mar 27 '20 at 14:26
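
A sketch of what a stricter record-start test could look like. This is not the exact regex from the comment above: it is a shortened variant that only checks for a quote followed by something shaped like `YYYY-MM-DDT`, and it spells out the digits instead of using `{n}` intervals because some older awk builds do not support them. `join_log_strict`, `quoted.txt` and the xid values are made up for illustration.

```shell
# Made-up sample: the continuation line starts with a quote, which would
# fool a check that only looks for a leading single quote.
cat > quoted.txt <<'EOF'
'2020-03-27T08:00:24Z UTC [ db=xdb user=root pid=9037 userid=100 xid=1 ]' LOG: SELECT 'a' ||
'b' AS s
'2020-03-27T08:00:24Z UTC [ db=xdb user=root pid=9037 userid=100 xid=2 ]' LOG: SELECT 1
EOF

# join_log_strict is a hypothetical name. Same joining logic as the awk
# answer, but a line only starts a record if it matches the date pattern.
join_log_strict() {
  awk -v start="^'[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T" '
    { printf "%s%s", ($0 ~ start ? ors : ofs), $0; ors = ORS; ofs = OFS }
    END { printf "%s", ors }
  ' "$1"
}

join_log_strict quoted.txt
```

With this sample, the `'b' AS s` line is appended to the first record rather than being treated as a new one, so the output has two lines.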