-1

I've been trying to get my head round this for a few hours now and thought I'd come here and ask for help.

I have a CSV file of IP addresses from a log file that I want to run through and get a WHOIS netrange and company name from and then append the result to the end of the CSV.

So far what I have managed to do is get the whois results to a separate csv

echo ip, company, > result.csv
for ip in $(grep -E -o "([0-9]{1,3}[\.]){3}[0-9]{1,3}" source.csv); do 
    whois $ip | grep -i -e 'netrange\|inetnum' -e 'org-name\|orgname' \
    | awk 'BEGIN{FS="NetRange:|inetnum:|OrgName:|org-name:"} {print $2","$3}'
    |xargs; done >> result.csv

my challenge is how to add my 2 new columns back into the source.csv? I have tried using

paste -d, source.csv result.csv

but all that happens is the values in result.csv overwrite the first few columns of source.csv

my source.csv looks something like the below

ip address       requests   number of visits
66.249.90.77      2149      200
66.249.66.1       216       233   

My result.csv

ip range                     company
66.249.64.0 - 66.249.95.255 Google Inc.
66.249.64.0 - 66.249.95.255 Google Inc.

i would like my final csv to look like

ip            requests  number of visits    ip range               company
66.249.90.77    2149    200           66.249.64.0 - 66.249.95.255   Google Inc.
66.249.66.1      2161   233           66.249.64.0 - 66.249.95.255   Google Inc.

If possible I would prefer to accomplish this with BASH rather than installing any 3rd party tools etc. I have tried the python package ipwhois but my python knowledge is far less than my limited BASH knowledge so I abandoned it lest I continue wasting time!

Any help is much appreciated.

bluejay
  • 11
  • 1
  • 4
  • Do you require that to be a one-liner, or would a loop-based solution work for you? – Fred Jan 16 '17 at 23:14
  • `paste` with your example files is working for me (although based on your desired output I don't think you want the `-d,` delimiter). Is it possible your `source.csv` has DOS line endings? In a Unix environment the extra CR can cause the initial characters in the line to be overwritten by what follows it. Maybe try `cat -v source.csv` and see if any `^M`s appear. – jas Jan 16 '17 at 23:20
  • @Fred any would be fine - thank you. – bluejay Jan 16 '17 at 23:35
  • @jas you are exactly right! I found extra carriage returns. Is there anyway to fix it? Thank you both – bluejay Jan 16 '17 at 23:35
  • Yes, lots of ways to fix it. The easiest is if you have a `dos2unix` utility handy. Otherwise, see for example http://stackoverflow.com/a/2613834/2229272 – jas Jan 16 '17 at 23:39
  • I double checked when I edited the format of your `for` loop. I don't see what you finish it with `xargs`. Usually `xargs` expects a script/program as an "option" and the list of files coming thru the pipe (`|`), are file arguments to that script/program. Am I missing something (it wouldn't be the first time ;-) ). Good luck. – shellter Jan 16 '17 at 23:49
  • If your problem is fixed by removing `^M` chars at line-end, this Q really won't help anyone. Would you please delete it. Good luck. – shellter Jan 16 '17 at 23:50
  • @jas thank you so much for helping me. I have fixed the script by using sed $'s/\r$//' to convert to Unix format. I have discovered another formatting issue but this is with the way WHOIS record is stored for some IP addresses. I'll raise a new question if I can't sort it myself. Thank you everyone for your help. I can sleep now!! – bluejay Jan 17 '17 at 00:22
  • You're welcome, glad it helped! – jas Jan 17 '17 at 00:30

1 Answers1

0

Try putting this in a script file and running it while inside the directory containing your data file.

#!/bin/bash -ue

# Pattern for spacing characters
sp="[[:space:]]*"

#Pattern for non-spacing characters
nsp="[^[:space:]]+"

# Iterate on each line in the data file
while IFS= read -r line
do
  [[ "$line" =~ ^$sp($nsp)$sp($nsp)$sp($nsp)$sp$ ]] || continue
  f1="${BASH_REMATCH[1]}"
  f2="${BASH_REMATCH[2]}"
  f3="${BASH_REMATCH[3]}"
  # Extract information from whois
  whois_data="$(whois "$f1")"
  range="$(grep NetRange  <<<"$whois_data" | cut -f2 -d":")"
  company="$(grep OrgName <<<"$whois_data" | cut -f2 -d":")"
  echo $f1,$f2,$f3,$range,$company
done <"source.csv"

The output is formatted as fields separated by commas, and there should be some trimming of the range and company variables before they are used (to remove spaces at the beginning and end), but this should give you the idea.

Basically, the code does not try to merge two files together, but rather extracts the fields from each line of the source file, performs the whois, extracts the two fields needed from it, and then outputs all five fields to the output without an intermediary step.

Fred
  • 6,590
  • 9
  • 20