I'm dealing with some large CSV files. Basically I have two, one for 2009 and one for 2010. I read them in separately using pandas, and then append the 2010 data to the end of the 2009 dataframe.
To do this I use the function:
import pandas as pd

def import_data():
    # Parse the first column of each file into a single Date_Time column
    with open(file_A, 'r') as f:
        reader = pd.read_csv(f, sep=',', parse_dates={'Date_Time': [0]})
    with open(file_B, 'r') as B:
        reader2 = pd.read_csv(B, sep=',', parse_dates={'Date_Time': [0]})
    # Stack the 2010 data underneath the 2009 data
    reader = pd.concat([reader, reader2])
    return reader
I then do some processing and resample the data, but all of this takes a long time because the files are so large.
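For reference, the resampling step is roughly along these lines (a sketch only; the daily frequency and mean aggregation here are just placeholders for what I actually do):

df = import_data()
# Placeholder resampling: daily means over the parsed Date_Time column
resampled = df.set_index('Date_Time').resample('D').mean()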
Is there a way to select only certain rows based on defined inputs, e.g. just the dates 01/10/2009 - 01/02/2010? The dates are all in the first column of the CSV.
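At the moment the only thing I can think of is filtering after everything has been loaded, something like the sketch below (I'm reading 01/10/2009 as 1 October 2009), but that still means reading both files in full:

df = import_data()
# Boolean mask on the parsed dates; the whole file still has to be read first
mask = (df['Date_Time'] >= '2009-10-01') & (df['Date_Time'] <= '2010-02-01')
subset = df[mask]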
I know that this is possible for columns using usecols within pandas.read_csv.
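That is, something like this (the column indices are just an example):

# usecols limits which columns are read; is there an equivalent for rows/dates?
pd.read_csv(file_A, sep=',', usecols=[0, 1, 2], parse_dates={'Date_Time': [0]})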