0

I am analyzing a large weather data file, Data.csv. I need to write a program in Python that will filter the Data.csv file and keep the following columns only: STATION, NAME/LOCATION, DATE, AWND, SNOW. Then save the filtered file and name it filteredData.csv.

I am using Python 3.8. I have only been able to somewhat figure out how to filter the columns I need within a print function. How do I filter this file and then save the filtered file?

import csv

filename = 'Data.csv' 
f = open(filename, 'rt')
reader = csv.reader(f,delimiter=',')
for column in reader:
print(column[0] + "," + column[1] + "," + column[2] + "," + column[3] + "," + column[4] + "," + column[13])

A small section of the Data.csv file

Bridget
  • 39
  • 1
  • 7

3 Answers3

1

If you're running this under windows you can simply run the code you already wrote with "> newfile.csv" at the end of the command to plug output into a test file.

If you want to do it within the code though:

import csv

new_filename = 'Reduced_Data.csv'
filename = 'Data.csv' 
f = open(filename, 'rt')
reader = csv.reader(f,delimiter=',')
for row in reader:
    with open(new_filename, 'a') as output:
        output.write('"{}","{}","{}","{}","{}","{}"\n'.format(column[0],column[1],column[2],column[3],column[4],column[13]))
  • 1
    Using csv.writer might avoid some quoting headaches. And you could filter the columns using a list comprehension like [column[k] for k in [2, 5, 17]] . – Ture Pålsson Apr 08 '20 at 07:53
1

It can be quickly done using Pandas

import pandas as pd

weather_data = pd.read_csv('Data.csv')
filtered_weather = weather_data[['Column_1','Column_1']] #Select the column names that you want
filtered_weather.to_csv('new_file',index=False)
mode
  • 69
  • 5
0

check out the CSV reader and this example. you can do something like:

import csv

content = []
with open('Data.csv', 'r') as file:
    reader = csv.reader(file, delimiter = ','))
    for row in reader:
        content.append(row)

print(content)

## now writing them in a file:

with open('filteredData.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(['STATION', 'NAME LOCATION', 'DATE', 'AWND', 'SNOW'])
    for i in range(1, len(content)):
        writer.writerow[content[i][0], content[i][1], content[i][2], content[i][3], content[i][13])  ## i left out some columns, so they will not be in the file later, maybe I did not get that right.

but honestly, I would use this approach but that means only copy & paste.

xtlc
  • 1,070
  • 1
  • 15
  • 41