1

The Problem

I have a CSV file that contains a large number of items.

The first column can contain either an IP address or random garbage. The only other column I care about is the fourth one.

I have written the below snippet of code in an attempt to check if the first column is an IP address and, if so, write that and the contents of the fourth column to another CSV file side by side.

with open('results.csv','r') as csvresults:
    filecontent = csv.reader(csvresults)
    output = open('formatted_results.csv','w')
    processedcontent = csv.writer(output)

    for row in filecontent:
        first = str(row[0])
        fourth = str(row[3])
        if re.match('\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', first) != None:
            processedcontent.writerow(["{},{}".format(first,fourth)])
        else:
            continue
    output.close()

This works to an extent. However, when viewing in Excel, both items are placed in a single cell rather than two adjacent ones. If I open it in notepad I can see that each line is wrapped in quotation marks. If these are removed Excel will display the columns properly.

Example Input

1.2.3.4,rubbish1,rubbish2,reallyimportantdata

Desired Output

1.2.3.4    reallyimportantdata - two separate columns

Actual Output

"1.2.3.4,reallyimportantdata" - single column

The Question

Is there any way to fudge the format part to not write out with quotations? Alternatively, what would be the best way to achieve what I'm trying to do?

I've tried writing out to another file and stripping the lines but, despite not throwing any errors, the result was the same...

2 Answers2

2

writerow() takes a list of elements and writes each of those into a column. Since you are feeding a list with only one element, it is being placed into one column.

Instead, feed writerow() a list:

processedcontent.writerow([first,fourth])
MrAlexBailey
  • 5,219
  • 19
  • 30
  • I'm not going to lie to you, I'm a little annoyed that this is the answer. This was the first thing I'd tried but I was trying to cast the items to string within the call to `writerow` which threw an error. I only cast to string before that part afterwards when I was trying something different and never revisited. – I_GNU_it_all_along Aug 15 '17 at 15:19
  • 1
    But thanks very much all the same! It's always the bloody simple things... – I_GNU_it_all_along Aug 15 '17 at 15:20
1

Have you considered using Pandas?

import pandas as pd

df = pd.read_csv("myFile.csv", header=0, low_memory=False, index_col=None)
fid = open("outputp.csv","w")
for index, row in df.iterrows():
    aa=re.match(r"^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$",row['IP'])
    if aa:
        tline = '{0},{1}'.format(row['IP'], row['fourth column'])
        fid.write(tline)
output.close()

There may be an error or two and I got the regex from here. This assumes the first row of the csv has titles which can be referenced. If it does not then you can use header = None and reference the columns with iloc

Come to think of it you could probably run the regex on the dataFrame, copy the first and fourth column to a new dataFrame and use the to_csv method in pandas.

Eric Jensen
  • 193
  • 11
  • Hey man, appreciate the answer. For portability on this particular script I'd prefer not to include non-standard libraries but I will definitely consider using it in future as it looks pretty handy. Turns out I was over-complicating things after all (see the other user's answer). – I_GNU_it_all_along Aug 15 '17 at 15:31