Python read csv file and filter data

Question

I appologize if this was already answered before, but I checked I bunch of posts and just cannot understand what is wrong with my code. I'm trying to read a csv file in python (see bellow) and filter out rows of data by the value in the second column (angle). Then I want to create a new output file with filtered time and angle values. I only get the output file with headers written in.

csv file:

time,angle
0,56
1,89
2,112
3,189
4,122
5,123

Code:

import csv

#define the min and max value of angle
alpha_min = 110
alpha_max = 125

#read csv file and loop through with a filter
with open('test_csv.csv', 'r') as input_file:
    csv_reader = csv.reader(input_file)#, delimiter=',')
    #header = next(input_file).strip("\n").split(",")
    results = filter(lambda row: alpha_min<row[1]<alpha_max, csv_reader)

#create output file
with open('test_output_csv.csv', "w") as output_file:
    csv_writer = csv.writer(output_file, delimiter=',')
    csv_writer.writerow(header)
    for result in results:
        csv_writer.writerow(result)

@AskoldIlvento Actually, Python does work that way, but `row[1]` is a string so needs `int(row[1])`. — Mark Tolonen, May 18 '20 at 15:37

mschoder · Answer 1 · 2020-05-18T17:55:06.917

1

I would suggest using pandas library for this workflow, which will be faster and more efficient than looping through each line of your csv file. Something like the below:

import pandas as pd

#define the min and max value of angle
alpha_min = 110
alpha_max = 125

# read input and filter angle data
df = pd.read_csv('test_csv.csv')
df = df[(df['angle'] < alpha_max) & (df['angle'] > alpha_min)]

# write output
df.to_csv('output.csv')

edited May 18 '20 at 17:55

answered May 18 '20 at 15:26

mschoder

44
3

thanks for the input. I suspect that the input .csv file could have up to 100k lines, do you think this cold be a problem? – user2882635 May 19 '20 at 05:30
1

No, size wouldn't be a problem here at least until you're more than 10M lines – mschoder May 20 '20 at 04:46
One more question: is it possible to the same as in the code in your answer but change something in the first (time) column, for instance change the time format from 16:55:45 to 165545 in the output file? I'm looking into Pandas but it has such vast funcionality, I'm getting lost. – user2882635 May 21 '20 at 10:53
1

Yes, definitely. Depends on what the datatype of the existing time value is. If it's a string, you could just remove colons with something like `df['time'] = df['time'].str.replace(':', '')` If it's a timestamp, you can cast it to a string first, using the answer [here](https://stackoverflow.com/questions/44741587/pandas-timestamp-series-to-string) – mschoder May 22 '20 at 15:46

Leo Arad · Answer 2 · 2020-05-18T15:40:46.343

You can do

import csv

#define the min and max value of angle
alpha_min = 110
alpha_max = 125

#read csv file and loop through with a filter
with open('test_csv.csv', 'r') as input_file:
    csv_reader = csv.reader(input_file)#, delimiter=',')
    lines = [i for i in csv_reader]
    header = lines[0]
    results = filter(lambda row: alpha_min<int(row[1])<alpha_max, lines[1:])

#create output file
with open('test_output_csv.csv', "w", newline='') as output_file:
    csv_writer = csv.writer(output_file, delimiter=',')
    csv_writer.writerow(header) 
    csv_writer.writerows(results)

That will save to the file

time,angle
2,112
4,122
5,123

score 0 · Accepted Answer · answered May 18 '20 at 15:50

The fields of a csv row are strings so you need int(row[1]) to work correctly. I also recommend a list comprehension for the filtering, or pandas for speed. next(csv_reader) will read one row to capture the headers as well.

Note: use newline='' with the csv module as documented to avoid blank lines between each row.

import csv

alpha_min = 110
alpha_max = 125

with open('test.csv','r',newline='') as input_file:
    csv_reader = csv.reader(input_file)
    header = next(csv_reader)
    results = [row for row in csv_reader if alpha_min < int(row[1]) < alpha_max]

with open('output.csv','w',newline='') as output_file:
    csv_writer = csv.writer(output_file)
    csv_writer.writerow(header)
    csv_writer.writerows(results)

Python read csv file and filter data

3 Answers3