I'm writing code that reads a .csv file that looks like this:
724070 93730 19800101 0 330 1.5 22000 -1.7 -5 1013.6 78
724070 93730 19800101 100 230 1.5 22000 -2.7 -5.5 1013.7 81
724070 93730 19800101 200 0 0 22000 -3.8 -4.9 1013.9 92
724070 93730 19800101 300 340 1.5 22000 -5.6 -6.1 1013.6 96
724070 93730 19800101 400 0 0 22000 -6.6 -7.7 1013.6 92
724070 93730 19800101 500 330 1.5 22000 -7.1 -8.8 1013.6 88
The first two columns are identifiers, the third column is the date, the fourth column is the hour, and the last seven columns are the values of interest. My end goal is a daily average of each of the last seven columns for every day of the year.
I first tried manipulating the data using only arrays, but I was convinced to switch to pandas, so my code is still at an early stage. So far I have:
import pandas as pd
csv = raw_input('What is the name of your file? ')
cols = ['USAF','NCDC','DATE','HR','WND DIR','WND SPD', 'SKY CVR','TMPC','TMDC','PRES','RH']
data = pd.read_csv(csv, header = None, parse_dates = [['DATE', 'HR']], names = cols)
I'm having trouble moving on from here since I'm just learning pandas, and I would appreciate some help; the other questions I've looked at haven't helped so far.
1st) There are three unique "USAF" identifiers in this .csv file. Is there a way to split this data frame into three data frames, one per value of the USAF column?
2nd) pandas is having a hard time recognizing my date and time format, which keeps me from moving on to calculating the averages. How do I work around this?
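To make both points concrete, here is a minimal sketch of the kind of result I'm after, not a definitive solution. It assumes the fields are separated by whitespace as in the sample above, and that HR encodes the hour as HHMM (0, 100, ..., 2300). The inline sample, the second identifier 999999, and the names TIME, by_station and daily are made up for illustration.

```python
import io
import pandas as pd

cols = ['USAF', 'NCDC', 'DATE', 'HR', 'WND DIR', 'WND SPD',
        'SKY CVR', 'TMPC', 'TMDC', 'PRES', 'RH']

# Small inline sample; the 999999 row is invented to show the split.
sample = io.StringIO(
    "724070 93730 19800101 0 330 1.5 22000 -1.7 -5 1013.6 78\n"
    "724070 93730 19800101 100 230 1.5 22000 -2.7 -5.5 1013.7 81\n"
    "724070 93730 19800102 0 0 0 22000 -3.8 -4.9 1013.9 92\n"
    "999999 93730 19800101 0 340 1.5 22000 -5.6 -6.1 1013.6 96\n"
)

# Assumption: fields are separated by whitespace, as in the sample.
data = pd.read_csv(sample, sep=r'\s+', header=None, names=cols)

# Build the timestamp by hand instead of relying on parse_dates:
# DATE is YYYYMMDD, HR is assumed to be HHMM, so HR // 100 is the hour.
data['TIME'] = (pd.to_datetime(data['DATE'].astype(str), format='%Y%m%d')
                + pd.to_timedelta(data['HR'] // 100, unit='h'))

# 1st) one DataFrame per USAF identifier, kept in a dict keyed by USAF
by_station = {usaf: frame for usaf, frame in data.groupby('USAF')}

# Daily means of the seven value columns, per station and calendar day
values = cols[4:]
daily = data.groupby(['USAF', data['TIME'].dt.normalize()])[values].mean()
print(daily)
```

With something like this, by_station[724070] would hold only the rows for that station, and daily would have one row per station and day.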
Thanks in advance