I am new to Python, so some ideas on how to move forward would be much appreciated.
Problem: I have 44 locations with production data recorded at 15-minute intervals for the months from December to June. The total number of data points for one day should be 4224 (44 [locations] * 4 [15-minute intervals per hour] * 24 [hours in a day]), but that is not always the case because some data is missing. I need to filter out the dates with missing data.
The sample data I have in a CSV file is shown below; the dates range from December to June:
datetime production
0 07-12-15 0:15 240
1 07-12-15 0:15 328
2 07-12-15 0:15 54
3 07-12-15 0:30 103
4 07-12-15 0:30 10
This is just a sample to show the data format (the actual file goes until June 2016). In 0:15, the 0 is the hour and the 15 is the 15-minute time step.
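For reference, a minimal sketch of how timestamps in this format could be parsed with pandas. It assumes the strings are in day-month-year order (so 07-12-15 means 7 December 2015); the comma-separated inline sample and the names `raw` and `sample` are made up for illustration and are not the real file:

```python
import io
import pandas as pd

# Hypothetical inline copy of the sample rows (comma-separated for illustration).
raw = io.StringIO(
    "datetime,production\n"
    "07-12-15 0:15,240\n"
    "07-12-15 0:15,328\n"
    "07-12-15 0:15,54\n"
    "07-12-15 0:30,103\n"
    "07-12-15 0:30,10\n"
)
sample = pd.read_csv(raw)

# Assumption: day-month-year order, two-digit year, hour without a leading zero.
sample["datetime"] = pd.to_datetime(sample["datetime"], format="%d-%m-%y %H:%M")

# Show which calendar days appear in the sample.
print(sample["datetime"].dt.date.unique())
```

Giving an explicit `format` avoids the month and day being swapped silently if the default parser guesses month-first.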
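My draft code:
import pandas as pd

df = pd.read_csv("file_path")
# Assumes day-month-year strings such as "07-12-15 0:15" (7 Dec 2015).
df['datetime'] = pd.to_datetime(df['datetime'], format='%d-%m-%y %H:%M')
df.set_index('datetime', inplace=True)
startdate = pd.Timestamp('2015-12-01')
enddate = pd.Timestamp('2016-06-30')
daterange = pd.date_range(start=startdate, end=enddate, freq='D')
# Rows per calendar day; a complete day has 44 * 4 * 24 = 4224 rows.
counts = df.groupby(df.index.normalize()).size()
for single_date in daterange:
    if counts.get(single_date, 0) == 4224:
        print("all fine")
    else:
        print(single_date)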
I am still thinking about how to select the dates.
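One possible direction, as a sketch rather than a definitive answer: count the rows per calendar day with `groupby`, compare each count with the expected total, and drop the days that fall short. The data below is synthetic (2 locations over three days instead of 44 locations over seven months), and names such as `n_locations`, `rows_per_day`, `incomplete_days`, and `complete` are made up for illustration:

```python
import pandas as pd

# Hypothetical small data set: 2 locations instead of 44, three days.
n_locations = 2
times = pd.date_range("2015-12-07 00:00", "2015-12-09 23:45", freq="15min")
df = pd.DataFrame({
    "datetime": times.repeat(n_locations),
    "production": 1,
})

# Simulate missing data by dropping the first ten rows of 8 December.
on_dec_8 = df["datetime"].dt.normalize() == pd.Timestamp("2015-12-08")
df = df.drop(df.index[on_dec_8][:10])

expected = n_locations * 4 * 24  # 44 * 4 * 24 = 4224 for the real data

# Number of rows recorded on each calendar day.
day = df["datetime"].dt.normalize()
rows_per_day = df.groupby(day).size()

# Days with no rows at all would not appear in the groupby result,
# so reindex against the full range of days and count them as 0.
all_days = pd.date_range("2015-12-07", "2015-12-09", freq="D")
rows_per_day = rows_per_day.reindex(all_days, fill_value=0)

# Days whose row count differs from the expected total.
incomplete_days = rows_per_day[rows_per_day != expected]
print(incomplete_days)

# Keep only the rows that belong to complete days.
complete = df[~day.isin(incomplete_days.index)]
```

For the real file, `df` would come from `pd.read_csv` with the `datetime` column parsed first, and `n_locations` would be 44. If the 0:00 reading is meant to close the previous day (the sample starts at 0:15), subtracting 15 minutes from the timestamps before calling `normalize()` would be one way to account for that.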