
I have a problem with the output L option ("grep-able" output). For instance, it produces this:

| 14.138.12.21:123   | unknown                   | disabled    |
| 14.138.184.122:123 | unknown                   | disabled    |
| 14.138.179.27:123  | unknown                   | disabled    |
| 14.138.20.65:123   | unknown                   | disabled    |
| 14.138.12.235:123  | unknown                   | disabled    |
| 14.138.178.97:123  | unknown                   | disabled    |
| 14.138.182.153:123 | unknown                   | disabled    |
| 14.138.178.124:123 | unknown                   | disabled    |
| 14.138.201.191:123 | unknown                   | disabled    |
| 14.138.180.26:123  | unknown                   | disabled    |
| 14.138.13.129:123  | unknown                   | disabled    |

The above is neither very readable nor easy to understand.

How can I use Linux command-line utilities such as sed, awk, or grep to turn the file above into the following?

Desired output:

14.138.12.21
14.138.184.122
14.138.179.27
14.138.20.65
14.138.12.235
Scott Weldon
    This is almost the exact same question you asked 3 hours ago: http://stackoverflow.com/questions/40325397/how-to-clean-up-masscan-output-ol. – I0_ol Oct 30 '16 at 05:55

3 Answers


Use awk with a field separator of either a space or a colon, and print the second field:

awk -F '[ :]' '{print $2}' file.txt

Example:

% cat file.txt
| 14.138.12.21:123   | unknown                   | disabled    |
| 14.138.184.122:123 | unknown                   | disabled    |
| 14.138.179.27:123  | unknown                   | disabled    |
| 14.138.20.65:123   | unknown                   | disabled    |
| 14.138.12.235:123  | unknown                   | disabled    |
| 14.138.178.97:123  | unknown                   | disabled    |
| 14.138.182.153:123 | unknown                   | disabled    |
| 14.138.178.124:123 | unknown                   | disabled    |
| 14.138.201.191:123 | unknown                   | disabled    |
| 14.138.180.26:123  | unknown                   | disabled    |
| 14.138.13.129:123  | unknown                   | disabled    |

% awk -F '[ :]' '{print $2}' file.txt
14.138.12.21
14.138.184.122
14.138.179.27
14.138.20.65
14.138.12.235
14.138.178.97
14.138.182.153
14.138.178.124
14.138.201.191
14.138.180.26
14.138.13.129
heemayl

AWK is well suited to cases where you want to split a file into "columns" and you know that the order of the columns is constant. AWK splits each line on a field separator, which can be a regular expression such as '[: ]'. The fields are then accessible by their position from the left: $1, $2, $3, and so on:

awk -F '[ :]' '{print $2}' src.log
awk -F '[ :|]' '{print $3}' src.log
awk 'BEGIN {FS="[ :|]"} {print $3}' src.log
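The three commands above print the same address even though the field number differs. A small sketch of why, assuming a single sample line in the question's format (the variable names are only for illustration):

```shell
# Sample line in the same format as the question's input.
line='| 14.138.12.21:123   | unknown                   | disabled    |'

# With FS='[ :]' the leading "|" is field 1, so the address is field 2.
ip_a=$(printf '%s\n' "$line" | awk -F '[ :]' '{print $2}')

# With FS='[ :|]' the leading "|" is itself a separator. It and the space
# after it produce two empty fields, so the address moves to field 3.
ip_b=$(printf '%s\n' "$line" | awk -F '[ :|]' '{print $3}')

printf '%s\n%s\n' "$ip_a" "$ip_b"
```

Both variables should hold the same address. Which separator set to use is mostly a matter of whether you want the pipe characters to count as fields.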

You can also filter the lines with a regular expression:

awk -F '[ :]' '/138\.179\./ {print $2}' src.log

However, standard awk cannot capture substrings with regular expression groups.
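Even without capture groups, a matched substring can still be pulled out with the standard match() and substr() functions. This is a minimal sketch on one sample line, not part of the original answer, and the dotted-quad pattern is an assumption about what counts as an address:

```shell
# Sample line in the same format as the question's input.
line='| 14.138.12.21:123   | unknown                   | disabled    |'

# match() sets RSTART and RLENGTH to the position and length of the
# leftmost-longest match; substr() then cuts that part out of the line.
ip=$(printf '%s\n' "$line" |
  awk 'match($0, /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/) { print substr($0, RSTART, RLENGTH) }')

printf '%s\n' "$ip"
```

Lines without a match are skipped, because match() returns 0 and the action is not run.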

SED is more flexible in regard to regular expressions:

sed -r 's/^[^0-9]*([0-9.]+):.*/\1/' src.log

However, it lacks many useful features of the Perl-like regular expressions we are used to in everyday programming. For example, even the extended syntax (-r) does not interpret \d as a digit.
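Where \d is not available, a bracket expression such as [0-9] or the POSIX class [[:digit:]] does the same job. A short sketch on one sample line, using -E, which both GNU and BSD sed accept in place of -r:

```shell
# Sample line in the same format as the question's input.
line='| 14.138.12.21:123   | unknown                   | disabled    |'

# [[:digit:].] matches a digit or a literal dot, standing in for [\d.] in Perl.
ip=$(printf '%s\n' "$line" | sed -E 's/^[^0-9]*([[:digit:].]+):.*/\1/')

printf '%s\n' "$ip"
```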

Perl is perhaps the most flexible tool for parsing files. You can opt for simple expressions:

perl -n -e '/^\D*([^:]+):/ and print "$1\n"' src.log

or make the matching as strict as you like:

perl -n -e '/^\D*((?:\d{1,3}\.){3}\d{1,3}):/ and print "$1\n"' src.log
Ruslan Osmanov

Using sed:

sed -r 's/^ *[|] *([0-9]+[.][0-9]+[.][0-9]+[.][0-9]+):[0-9]{3}.*/\1/' file.txt
repzero