
I have a very specific problem that I've been trying to solve, without success.

I have a log created by dumping a TCP/IP socket. The dump converts the hex to ASCII, but naturally some special characters end up in it.

I've managed to remove them, but I'm stuck on one problem: sometimes a 0x0A is sent in the middle of a message, which breaks my applications. When I try to remove it, the valid 0x0A at the end of each line is removed too.

Basically, I have, in the log file:

08-14-2017 10:00:00 String={Teste String}
08-14-2017 10:00:00 String={
Teste String2}
08-14-2017 10:00:00 String={
Teste String3}
08-14-2017 10:00:00 String={Teste String4}

I want the final result as

08-14-2017 10:00:00 String={Teste String}
08-14-2017 10:00:00 String={Teste String2}
08-14-2017 10:00:00 String={Teste String3}
08-14-2017 10:00:00 String={Teste String4}

The characters are always between {}, so every 0x0A after the } is valid, but inside is not.

Every command I've tried either removes all the 0x0A characters or doesn't work at all.

I've tried things like

sed 's/^[^}]*}//'
sed 's/\x0A$//'

Any thoughts?

anubhava

5 Answers


Another simpler awk:

awk '{printf "%s%s", $0, (/}/ ? ORS : "")}' file

08-14-2017 10:00:00 String={Teste String}
08-14-2017 10:00:00 String={Teste String2}
08-14-2017 10:00:00 String={Teste String3}
08-14-2017 10:00:00 String={Teste String4}

This awk command prints each record without a line break, then adds the line break only if the line contains }.
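To try this without touching a real log, the sample from the question can be fed in through a pipe. This is only a sketch: the variable names `sample` and `fixed` are made up for illustration, and it assumes (as in the question) that every message closes its brace on the line where it ends.

```shell
# Sample input from the question, held in a variable (name is illustrative).
sample=$(printf '%s\n' \
  '08-14-2017 10:00:00 String={Teste String}' \
  '08-14-2017 10:00:00 String={' \
  'Teste String2}' \
  '08-14-2017 10:00:00 String={' \
  'Teste String3}' \
  '08-14-2017 10:00:00 String={Teste String4}')

# Print every line; append ORS (a newline) only when the line contains "}".
fixed=$(printf '%s\n' "$sample" |
  awk '{printf "%s%s", $0, (/}/ ? ORS : "")}')

printf '%s\n' "$fixed"
```

Lines without a } are glued to whatever follows, so a message broken across three or more lines is also rejoined.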

anubhava

This is certainly possible with sed, but it's easier to read and understand in awk:

awk 'BEGIN{ OFS=FS="{"; ORS=RS="}" } { gsub(/[^[:print:]]/,"",$2) } 1' input.txt

What does this do?

  • First, we set our input and output field separators to {, and our input and output record separators to }. This lets us predictably grab the bracketed text as a specific field (at least based on your sample data).
  • Next, we replace any non-printable characters in field #2 with a null string, eliminating newlines, backspaces, etc.
  • Finally, we print the line using awk shorthand.
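To see how the separators described above carve up the input, the following sketch prints each record's first two fields with newlines made visible. The three-line input, the `<NL>` marker, and the `out` variable are all invented for the demonstration; only the FS and RS settings come from the answer.

```shell
# Tiny stand-in for the log: one clean line and one broken across two lines.
out=$(printf '%s\n' 'A={x}' 'B={' 'y}' |
  awk 'BEGIN { FS = "{"; RS = "}" }
       NF > 1 {
           gsub(/\n/, "<NL>")   # make newlines visible; fields are re-split
           printf "record %d: f1=[%s] f2=[%s]\n", NR, $1, $2
       }')

printf '%s\n' "$out"
# record 1: f1=[A=] f2=[x]
# record 2: f1=[<NL>B=] f2=[<NL>y]
```

The newline that legitimately ends a log line lands at the start of the next record's first field, while the stray newline sits in the second field, which is why cleaning only $2 leaves the line structure alone.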
ghoti

With GNU awk for multi-char RS we can just isolate each {...} string and remove newlines within it:

$ awk -v RS='{[^}]+}' '{ORS=gensub(/\n/,"","g",RT)}1' file
08-14-2017 10:00:00 String={Teste String}
08-14-2017 10:00:00 String={Teste String2}
08-14-2017 10:00:00 String={Teste String3}
08-14-2017 10:00:00 String={Teste String4}

For this specific case the other awk answers will work just fine. The above is a more general solution to the problem of isolating a delimited string and then operating on it, such as removing characters as in this case.
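Where GNU awk is not available, the same isolate-then-edit idea can be approximated in POSIX awk with match(). This is a rough sketch rather than an equivalent of the command above: it reads the whole input into memory, and the variable names (`buf`, `inner`, `out`) and the three-line test input are invented for illustration.

```shell
out=$(printf '%s\n' 'a={x}' 'b={' 'y}' |
  awk '
    { buf = buf $0 "\n" }                 # collect the input, keeping newlines
    END {
        # Repeatedly locate the next {...} span in what is left of the buffer.
        while (match(buf, /\{[^}]*\}/)) {
            inner = substr(buf, RSTART, RLENGTH)
            gsub(/\n/, "", inner)         # edit only the delimited span
            printf "%s%s", substr(buf, 1, RSTART - 1), inner
            buf = substr(buf, RSTART + RLENGTH)
        }
        printf "%s", buf                  # text after the last span
    }')

printf '%s\n' "$out"
```

The gsub() call is the only part specific to this question; any other edit of the delimited text could go in its place.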

Ed Morton

With sed:

Linux:

$ sed -r ':a;N;$!ba;s/(\{[^}]*)\n([^{]*\})/\1\2/g' file
08-14-2017 10:00:00 String={Teste String}
08-14-2017 10:00:00 String={Teste String2}
08-14-2017 10:00:00 String={Teste String3}
08-14-2017 10:00:00 String={Teste String4}

FreeBSD and macOS:

sed -E -e ':a' -e 'N;$!ba' -e 's/(\{[^}]*)\n([^{]*\})/\1\2/g' file

Explanations

-e ':a' -e 'N;$!ba' appends every line to the pattern space, so sed sees the whole file at once and the substitution can match across line breaks. See this SO answer for details.

(\{[^}]*) ensures there's an opening brace not followed by a closing one.

([^{]*\}) does the opposite.
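Given the portability concerns raised in the comments about -E and -r, here is a sketch that sticks to basic regular expressions and joins lines one at a time instead of loading the whole file. It is a different technique from the commands above, not a rewrite of them. It relies on `\n` matching an embedded newline on the pattern side of s, which POSIX specifies as far as I know; the test input and the `out` variable are made up.

```shell
out=$(printf '%s\n' 'a={x}' 'b={' 'y}' 'c={z}' |
  sed -e ':a' \
      -e '/{[^}]*$/{' \
      -e 'N' \
      -e 's/\n//' \
      -e 'ba' \
      -e '}')
# While the pattern space has a "{" with no "}" after it, append the next
# line (N), delete the embedded newline, and branch back to label a.

printf '%s\n' "$out"
```

Because only lines with an unclosed brace pull in the next line, memory use stays proportional to the longest message rather than to the file.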

pchaigno
  • Doesn't work for me in FreeBSD or macOS. Is this GNU-sed specific? – ghoti Aug 14 '17 at 14:09
  • Works when you split it up: `sed -E -e ':a' -e 'N;$!ba' -e 's/(\{[^}]*)\n([^{]*\})/\1\2/g'` .. non-GNU sed appears to want labels not to be followed by semicolons. – ghoti Aug 14 '17 at 14:14
  • @ghoti Thanks. I updated. This should work with both GNU-sed and non-GNU-sed (?). – pchaigno Aug 14 '17 at 14:17
  • `\n` is not portable across sed versions (you need backslash followed by a literal newline for portability) and `-E` will only work in GNU and OSX sed while `-r` will only work in GNU sed. – Ed Morton Aug 14 '17 at 15:03
  • Also, sed in Solaris 10 does not support `-E` or `-r`, so a BRE-based solution would be preferred for maximum portability. In bash, you may be able to get the embedded literal newline using format substitution, i.e. `$'foo\nbar'`. – ghoti Aug 14 '17 at 15:06
  • And at the end of the day this simply isn't a job for sed at all since an awk solution will be clearer, simpler, more efficient, more portable, easier to enhance/maintain, etc. so why bother polishing it? – Ed Morton Aug 14 '17 at 15:08

Perl:

$ perl -0777 -pe 's/(\{[^}]*)\x0A([^}]*\})/$1$2/g' file
08-14-2017 10:00:00 String={Teste String}
08-14-2017 10:00:00 String={Teste String2}
08-14-2017 10:00:00 String={Teste String3}
08-14-2017 10:00:00 String={Teste String4}

Pure Bash (based on anubhava's awk):

while IFS= read -r line; do
    le=""
    # Emit a newline only when the line closes a {...} block
    [[ $line =~ \} ]] && le=$'\n'
    printf "%s%s" "$line" "$le"
done <file
dawg