I have a large text file (about 44MB) and I'm trying to read and filter it without bankrupting my computer's memory. The text file is comma seperated, with each column representing different weather variables and then each row representing their values over time. I want to select only the rows in which the wind direction lies between 10 and 100 degrees and then to write these rows to a new file (txt file or csv would be fine). I am a complete python noob, so any help would be greatly appreciated!
A sample of the data
hd,Station Number,Year Month Day Hours Minutes in YYYY,MM,DD,HH24,MI format in Local time,Year Month Day Hours Minutes in YYYY,MM,DD,HH24,MI format in Local standard time,Year Month Day Hours Minutes in YYYY,MM,DD,HH24,MI format in Universal coordinated time,Precipitation since last (AWS) observation in mm,Quality of precipitation since last (AWS) observation value,Air Temperature in degrees Celsius,Quality of air temperature,Air temperature (1-minute maximum) in degrees Celsius,Quality of air temperature (1-minute maximum),Air temperature (1-minute minimum) in degrees Celsius,Quality of air temperature (1-minute minimum),Wet bulb temperature in degrees Celsius,Quality of Wet bulb temperature,Wet bulb temperature (1 minute maximum) in degrees Celsius,Quality of wet bulb temperature (1 minute maximum),Wet bulb temperature (1 minute minimum) in degrees Celsius,Quality of wet bulb temperature (1 minute minimum),Dew point temperature in degrees Celsius,Quality of dew point temperature,Dew point temperature (1-minute maximum) in degrees Celsius,Quality of Dew point Temperature (1-minute maximum),Dew point temperature (1 minute minimum) in degrees Celsius,Quality of Dew point Temperature (1 minute minimum),Relative humidity in percentage %,Quality of relative humidity,Relative humidity (1 minute maximum) in percentage %,Quality of relative humidity (1 minute maximum),Relative humidity (1 minute minimum) in percentage %,Quality of Relative humidity (1 minute minimum),Wind (1 minute) speed in km/h,Wind (1 minute) speed quality,Minimum wind speed (over 1 minute) in km/h,Minimum wind speed (over 1 minute) quality,Wind (1 minute) direction in degrees true,Wind (1 minute) direction quality,Standard deviation of wind (1 minute),Standard deviation of wind (1 minute) direction quality,Maximum wind gust (over 1 minute) in km/h,Maximum wind gust (over 1 minute) quality,Visibility (automatic - one minute data) in km,Quality of visibility (automatic - one minute data),Mean sea level pressure in hPa,Quality of mean sea level pressure,Station level pressure in hPa,Quality of station level pressure,QNH pressure in hPa,Quality of QNH pressure,#
hd, 40842,2000,03,20,10,50,2000,03,20,10,50,2000,03,20,00,50, ,N, 25.7,N, 25.7,N, 25.6,N, 21.5,N, 21.5,N, 21.4,N, 19.2,N, 19.2,N, 19.0,N, 67,N, 68,N, 66,N, 13,N, 9,N,100,N, 4,N, 15,N, ,N,1018.6,N,1017.5,N,1018.6,N,#
hd, 40842,2000,03,20,10,51,2000,03,20,10,51,2000,03,20,00,51, 0.0,N, 25.6,N, 25.8,N, 25.6,N, 21.5,N, 21.6,N, 21.5,N, 19.2,N, 19.4,N, 19.2,N, 68,N, 68,N, 66,N, 11,N, 9,N,107,N, 11,N, 13,N, ,N,1018.6,N,1017.5,N,1018.6,N,#
hd, 40842,2000,03,20,10,52,2000,03,20,10,52,2000,03,20,00,52, 0.0,N, 25.8,N, 25.8,N, 25.6,N, 21.7,N, 21.7,N, 21.5,N, 19.5,N, 19.5,N, 19.2,N, 68,N, 69,N, 66,N, 11,N, 9,N, 83,N, 13,N, 13,N, ,N,1018.6,N,1017.5,N,1018.6,N,#
this is my best code, which keeps produce a NULL BYTE error message
import csv
with open("X:\hd_040842_2000_03.txt") as input, open("X:\wanted.txt", "w") as wanted, open("X:\unwanted.txt", "w") as unwanted:
reader = csv.DictReader(input, delimiter=",")
fieldnames = reader.fieldnames
writer_wanted = csv.DictWriter(college, fieldnames, delimiter=",")
writer_unwanted = csv.DictWriter(adult, fieldnames, delimiter=",")
writer_wanted.writeheader()
writer_unwanted.writeheader()
for row in reader:
if float(row["Wind (1 minute) direction in degrees true"]) > 10.0 and float(row["Wind (1 minute) direction in degrees true"]) < 110.0:
writer_college.writerow(row)
else:
writer_adult.writerow(row)
So you know, I adapted this code as best I could from this post Use Python to select rows with a particular range of values in one column