Cleaning up IP output on command line

Question

I have a problem with the output of the L option ("grep-able" output); for instance, it produces this:

| 14.138.12.21:123   | unknown                   | disabled    |
| 14.138.184.122:123 | unknown                   | disabled    |
| 14.138.179.27:123  | unknown                   | disabled    |
| 14.138.20.65:123   | unknown                   | disabled    |
| 14.138.12.235:123  | unknown                   | disabled    |
| 14.138.178.97:123  | unknown                   | disabled    |
| 14.138.182.153:123 | unknown                   | disabled    |
| 14.138.178.124:123 | unknown                   | disabled    |
| 14.138.201.191:123 | unknown                   | disabled    |
| 14.138.180.26:123  | unknown                   | disabled    |
| 14.138.13.129:123  | unknown                   | disabled    |

The above is neither very readable nor easy to understand.

How can I use Linux command-line utilities such as sed, awk, or grep to turn the file above into output like this?

Desired output:

14.138.12.21
14.138.184.122
14.138.179.27
14.138.20.65
14.138.12.235

This is almost the exact same question you asked 3 hours ago: http://stackoverflow.com/questions/40325397/how-to-clean-up-masscan-output-ol. — I0_ol, Oct 30 '16 at 05:55

score 3 · Answer 1 · answered Oct 30 '16 at 04:28

Use awk with a field separator that matches either a space or a colon, and print the second field (the leading | is the first field):

awk -F '[ :]' '{print $2}' file.txt

Example:

% cat file.txt
| 14.138.12.21:123   | unknown                   | disabled    |
| 14.138.184.122:123 | unknown                   | disabled    |
| 14.138.179.27:123  | unknown                   | disabled    |
| 14.138.20.65:123   | unknown                   | disabled    |
| 14.138.12.235:123  | unknown                   | disabled    |
| 14.138.178.97:123  | unknown                   | disabled    |
| 14.138.182.153:123 | unknown                   | disabled    |
| 14.138.178.124:123 | unknown                   | disabled    |
| 14.138.201.191:123 | unknown                   | disabled    |
| 14.138.180.26:123  | unknown                   | disabled    |
| 14.138.13.129:123  | unknown                   | disabled    |

% awk -F '[ :]' '{print $2}' file.txt
14.138.12.21
14.138.184.122
14.138.179.27
14.138.20.65
14.138.12.235
14.138.178.97
14.138.182.153
14.138.178.124
14.138.201.191
14.138.180.26
14.138.13.129
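To see why $2 holds the address, the hypothetical show_fields function below prints the first six fields awk produces for one line from the question. Because the separator is a single space or colon, each run of several spaces yields empty fields, so field positions after the port depend on the padding.

```shell
# Print the first six fields awk sees when splitting on a single space or colon.
show_fields() {
  awk -F '[ :]' '{ for (i = 1; i <= 6; i++) printf "$%d=[%s]\n", i, $i }'
}

# Sample line copied from the question.
printf '%s\n' '| 14.138.12.21:123   | unknown                   | disabled    |' | show_fields
```

This prints $1=[|], $2=[14.138.12.21], $3=[123], then two empty fields ($4=[] and $5=[]) for the extra padding spaces, and $6=[|].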

score 2 · Answer 2 · edited May 23 '17 at 12:33

AWK is a good fit when you want to split a file into "columns" and you know the order of the values/columns is constant. AWK splits each line on a field separator (which can be a regular expression such as '[: ]'), and the columns are accessed by their position from the left: $1, $2, $3, etc.:

awk -F '[ :]' '{print $2}' src.log
awk -F '[ :|]' '{print $3}' src.log
awk 'BEGIN {FS="[ :|]"} {print $3}' src.log
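A variation on the same column idea, not taken from the answer: split on the literal | so that $2 is the whole address column regardless of padding, then strip the spaces and the :port suffix with gsub() and sub(). The function name ip_column is hypothetical.

```shell
# Split on "|" (a single character, so awk treats it literally), then clean up column 2.
ip_column() {
  awk -F '|' '{
    addr = $2
    gsub(/ /, "", addr)    # remove the padding spaces around the address
    sub(/:.*/, "", addr)   # drop ":123" and anything after it
    print addr
  }'
}

printf '%s\n' \
  '| 14.138.12.21:123   | unknown                   | disabled    |' \
  '| 14.138.184.122:123 | unknown                   | disabled    |' | ip_column
```

This prints 14.138.12.21 and 14.138.184.122, one per line.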

You can also filter the lines with a regular expression:

awk -F '[ :]' '/138\.179\./ {print $2}' src.log
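The pattern above is tested against the whole line. A sketch that tests only the address field instead uses the ~ match operator on $2; the prefix 14.138.179. is simply taken from the sample data, and ips_in_prefix is a hypothetical name.

```shell
# Print $2 only when the address field itself starts with 14.138.179.
ips_in_prefix() {
  awk -F '[ :]' '$2 ~ /^14\.138\.179\./ { print $2 }'
}

printf '%s\n' \
  '| 14.138.184.122:123 | unknown                   | disabled    |' \
  '| 14.138.179.27:123  | unknown                   | disabled    |' \
  '| 14.138.20.65:123   | unknown                   | disabled    |' | ips_in_prefix
```

Only 14.138.179.27 is printed; anchoring with ^ keeps text elsewhere on the line from producing a match.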

However, standard awk cannot capture substrings with regular expression groups (GNU awk's match() function with a third, array argument is an extension that can).
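Although standard awk has no capture groups, a common POSIX workaround, when the wanted text can be described by one regular expression, is to let match() set RSTART and RLENGTH and then cut the matched text out with substr(). This extracts the whole match rather than a group. The function name first_ip is hypothetical, and the four-octet pattern avoids {n} intervals because some older awk implementations do not support them.

```shell
# match() sets RSTART (start position) and RLENGTH (length) of the leftmost match.
first_ip() {
  awk 'match($0, /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/) { print substr($0, RSTART, RLENGTH) }'
}

printf '%s\n' '| 14.138.12.21:123   | unknown                   | disabled    |' | first_ip
```

This prints 14.138.12.21.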

sed is more flexible with regard to regular expressions, since it supports capture groups and back-references such as \1:

sed -r 's/^[^0-9]*([0-9.]+):.*/\1/' src.log

However, it lacks many useful features of the Perl-like regular expressions we use in everyday programming. For example, even the extended syntax (-r) does not interpret \d as a digit.
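sed does understand POSIX character classes, so [[:digit:]] can stand in where \d would be used in Perl. A sketch of the same substitution written that way (strip_to_ip is a hypothetical name; -E selects extended syntax in both GNU and BSD sed, and GNU sed also accepts -r):

```shell
# [[:digit:]] is a POSIX bracket class; [[:digit:].] matches a digit or a dot.
strip_to_ip() {
  sed -E 's/^[^[:digit:]]*([[:digit:].]+):.*/\1/'
}

printf '%s\n' '| 14.138.12.21:123   | unknown                   | disabled    |' | strip_to_ip
```

This prints 14.138.12.21.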

Perl is perhaps the most flexible tool for parsing files. You can opt for simple expressions:

perl -n -e '/^\D*([^:]+):/ and print "$1\n"' src.log

or make the matching as strict as you like:

perl -n -e '/^\D*((?:\d{1,3}\.){3}\d{1,3}):/ and print "$1\n"' src.log

score 1 · Answer 3 · answered Oct 30 '16 at 04:53


Using sed:

sed -r 's/^ *[|] *([0-9]+[.][0-9]+[.][0-9]+[.][0-9]+):[0-9]{3}.*/\1/' file.txt
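Note that :[0-9]{3} only matches ports with at least three digits, so a line with a port such as :80 would be printed unchanged. A sketch of the same command with [0-9]+ instead (sed_ips is a hypothetical name, and the 10.0.0.1:80 line is made-up input):

```shell
# Same expression as the answer, but [0-9]+ accepts ports of any length.
sed_ips() {
  sed -E 's/^ *[|] *([0-9]+[.][0-9]+[.][0-9]+[.][0-9]+):[0-9]+.*/\1/'
}

printf '%s\n' \
  '| 14.138.12.21:123   | unknown                   | disabled    |' \
  '| 10.0.0.1:80        | unknown                   | disabled    |' | sed_ips
```

This prints 14.138.12.21 and 10.0.0.1, one per line.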

answered by repzero
