
I am new to Python, so any ideas on how to move forward would be much appreciated.

Problem: I have 44 locations with production data recorded at 15-minute intervals, from December 2015 to June 2016. A complete day should have 4224 data points (44 [locations] * 4 [15-minute intervals per hour] * 24 [hours per day]), but that is not always the case: some data is missing. I need to filter out these incomplete dates.

A sample of the data in my CSV file is shown below (the dates range from December to June):

             datetime  production
     0  07-12-15 0:15         240
     1  07-12-15 0:15         328
     2  07-12-15 0:15          54
     3  07-12-15 0:30         103
     4  07-12-15 0:30          10

This is just a sample to show the data format (the actual file goes up to June 2016). Dates are day-month-year, so "07-12-15" is 7 December 2015. In "0:15", 0 is the hour and 15 the minutes, i.e. one 15-minute time step.
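Because the timestamps are day-first strings, they need to be parsed into real datetimes before any per-day counting. A minimal sketch, assuming the CSV has exactly the two columns shown ("datetime", "production"); the inline SAMPLE_CSV and the helper load_production are hypothetical stand-ins for the real file:

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV, using the sample rows from the question
SAMPLE_CSV = """datetime,production
07-12-15 0:15,240
07-12-15 0:15,328
07-12-15 0:15,54
07-12-15 0:30,103
07-12-15 0:30,10
"""


def load_production(source):
    """Read the CSV and parse 'dd-mm-yy H:MM' strings into timestamps."""
    df = pd.read_csv(source)
    # Explicit format: day first, two-digit year, hour without a leading zero
    df["datetime"] = pd.to_datetime(df["datetime"], format="%d-%m-%y %H:%M")
    return df


df = load_production(io.StringIO(SAMPLE_CSV))
print(df.dtypes)
```

Passing an explicit format avoids pandas guessing month-first for an ambiguous string like "07-12-15".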

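My draft code:

import pandas as pd

# Timestamps look like "07-12-15 0:15" (day-month-year), so parse them explicitly
df = pd.read_csv("file_path")
df['datetime'] = pd.to_datetime(df['datetime'], format='%d-%m-%y %H:%M')
df.set_index('datetime', inplace=True)

startdate = pd.Timestamp('2015-12-01')
enddate = pd.Timestamp('2016-06-30')

# One entry per calendar day (the data has no time zone, so none is set here)
daterange = pd.date_range(start=startdate, end=enddate, freq='D')

# Number of rows recorded on each calendar day
counts = df.groupby(df.index.normalize()).size()

for single_date in daterange:
    # A complete day has 44 locations * 4 intervals * 24 hours = 4224 rows
    if counts.get(single_date, 0) == 4224:
        print("all fine")
    else:
        print(single_date)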

I am still working out how to select the dates.
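One way to use a 15-minute date_range, as the draft does, is to count readings per timestamp and reindex those counts onto the full grid, so slots missing entirely show up with a count of 0. A minimal sketch, assuming each location contributes exactly one row per timestamp (the CSV has no location column, so this can only show how many readings are missing, not which location). The grid bounds, SAMPLE_CSV and the helper incomplete_slots are hypothetical:

```python
import io

import pandas as pd

N_LOCATIONS = 44  # readings expected at every 15-minute timestamp

# Hypothetical sample in the question's format (one day only)
SAMPLE_CSV = """datetime,production
07-12-15 0:15,240
07-12-15 0:15,328
07-12-15 0:15,54
07-12-15 0:30,103
07-12-15 0:30,10
"""


def incomplete_slots(df, start, end, n_locations=N_LOCATIONS):
    """Return the reading count for every 15-minute slot with fewer than n_locations rows."""
    # Number of rows actually recorded at each timestamp
    counts = df.groupby("datetime").size()
    # Full 15-minute grid; slots absent from the data get a count of 0
    grid = pd.date_range(start=start, end=end, freq="15min")
    counts = counts.reindex(grid, fill_value=0)
    return counts[counts < n_locations]


df = pd.read_csv(io.StringIO(SAMPLE_CSV))
df["datetime"] = pd.to_datetime(df["datetime"], format="%d-%m-%y %H:%M")

# Assumed convention: a day runs from 00:15 to 00:00 of the next day (96 slots)
missing = incomplete_slots(df, "2015-12-07 00:15", "2015-12-08 00:00")
print(missing.head())
```

Dropping the days that contain any such slot then removes the incomplete dates. Whether a day should run 00:15 to 24:00 or 00:00 to 23:45 depends on how the readings are labelled, so the bounds may need adjusting.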

  • What did you try? – Pierre.Sassoulas Jul 21 '16 at 09:01
  • Please provide a small reproducible sample data set and a desired output / data set based on the sample - this will help to understand what you want to achieve – MaxU - stand with Ukraine Jul 21 '16 at 09:03
  • Hello, welcome to SO. - What are the several identical "07-12-15 0:15"? For the moment, we don't see missing points, since all the items that seem to be dates are identical. What is the "0:15" in them? - Why did you undo the edit done by MaxU? – eyquem Jul 21 '16 at 09:09
  • You can check [this](http://stackoverflow.com/q/20109391/2901002), then delete this question and create another. – jezrael Jul 21 '16 at 09:19
  • My recommendation: delete this question and open a new one including your edit. With 10 downvotes it's quite unlikely that anyone will take a look now. – IanS Jul 21 '16 at 10:19

1 Answer


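Try this. It assumes "datetime" is a regular column already converted with pd.to_datetime, not the index. Rows per day are counted with 'size', because 'nunique' would count distinct production values rather than data points:

In [16]: df.loc[df.groupby(df['datetime'].dt.date)['production'].transform('size') < 44 * 4 * 24, 'datetime'].dt.date.unique()
Out[16]: array([datetime.date(2015, 12, 7)], dtype=object)

This returns all rows for the "problematic" days:

df[df.groupby(df['datetime'].dt.date)['production'].transform('size') < 44 * 4 * 24]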
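Since the goal in the question is to filter these dates out, the same per-day count can be used the other way round: keep only the days whose row count equals the expected 44 * 4 * 24. A minimal sketch, assuming a "datetime" column parsed as datetimes; the helper complete_days_only, the tiny sample and expected_per_day=3 are hypothetical and only for illustration:

```python
import io

import pandas as pd

# Hypothetical sample: 7 Dec has 3 rows, 8 Dec has 2 rows
SAMPLE_CSV = """datetime,production
07-12-15 0:15,240
07-12-15 0:30,328
07-12-15 0:45,54
08-12-15 0:15,103
08-12-15 0:30,10
"""


def complete_days_only(df, expected_per_day=44 * 4 * 24):
    """Keep only rows whose calendar day has exactly expected_per_day rows."""
    # Broadcast each day's row count back onto every row of that day
    rows_per_day = df.groupby(df["datetime"].dt.date)["production"].transform("size")
    return df[rows_per_day == expected_per_day]


df = pd.read_csv(io.StringIO(SAMPLE_CSV))
df["datetime"] = pd.to_datetime(df["datetime"], format="%d-%m-%y %H:%M")

# With a tiny expected count, only 7 Dec (3 rows) survives
clean = complete_days_only(df, expected_per_day=3)
print(clean)
```

With the real data, the default expected_per_day of 4224 applies. Using == rather than < also drops days with duplicated rows, which may or may not be wanted.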

PS: there is a good reason why people asked you for a reproducible sample data set. With the one you have provided, it's hardly possible to check whether the code works correctly.

MaxU - stand with Ukraine