1

I know this question was asked before but each case is different... My plea is this:

df = pd.read_csv(‘file.csv’)
# convert the string into a datetime object
time = pd.to_datetime(df.dttm_utc)
Month=time.dt.month
Day=time.dt.day
Hour=time.dt.Hour
InDayLightSavings=True
if (Month<3): InDayLightSavings=False
if (Month==3) and (Day<11) and (Hour<2): InDayLightSavings=False
if (Month>11): InDayLightSavings=False
if (Month==11) and (Day>4)and (Hour>=2): InDayLightSavings=False

if (InDayLightSavings):
    time=time-datetime.timedelta(hours=1)

And it returns, as you guessed correctly,Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). I used this with timestamp, changing it to an ISO8601 before and this method works but it doesn't work for series apparently. And I tried adding .any() and it doesn't work. I also change and to & as suggested in other thread. A par of my file.csv looks like this, running til end of 2012:

timestamp   dttm_utc           value
1325376300 2012-01-01 0:05:00 16.9444
1325376600 2012-01-01 0:10:00 16.6837
1325376900 2012-01-01 0:15:00 16.6837
1325377200 2012-01-01 0:20:00 16.9444
1325377500 2012-01-01 0:25:00 16.1623
1325377800 2012-01-01 0:30:00 16.6837
Desired output: Include is an example of data in 15 min interval

3/13/2016 1:00 51
3/13/2016 1:15 48
3/13/2016 1:30 50.4
3/13/2016 1:45 51
3/13/2016 3:00 47.4
3/13/2016 3:15 49.8
3/13/2016 3:30 51
3/13/2016 3:45 51
3/13/2016 4:00 48.6

Any help is appreciated.Thank you!

  • And the desired output would be? – MSeifert Jun 13 '18 at 04:46
  • It would change the datetime InDayLightSaving range to be +1 hour compare to the original data. Just imagine you turn the clock at 2am forward to 3am on 11/3/12 (daylight saving day) – Anh Phuong Bui Jun 13 '18 at 04:50
  • I meant it would be helpful if you included the desired output in the question (like you included the input). Just so answerers know exactly what you want. – MSeifert Jun 13 '18 at 06:00

2 Answers2

1

How about using & instead of and?

The or and and python statements require truth-values. We do this because in pandas these are considered ambiguous so we should use bitwise operators "|" (or) or "&" (and) operations.

if (Month<3): InDayLightSavings=False
if (Month==3) & (Day<11) & (Hour<2): InDayLightSavings=False
if (Month>11): InDayLightSavings=False
if (Month==11) & (Day>4) & (Hour>=2): InDayLightSavings=False

(PS : I could've commented this, but dont have the reputation to do so.)

Kranthi Kiran
  • 121
  • 1
  • 6
1

The exception you are seeing is due to the fact that you try to evaluate a series with many different entries against a set of single conditions. Briefly, let's have a look what you do:

Error analysis (why not to do it like that):

First, you did take a pandas dataframe column and then converted it to datetime, which of course also returns a column (series).

time = pd.to_datetime(df.dttm_utc) # Convert content of dttm_utc COLUMN to datetime
                                   # This returns a dataframe COLUMN / series
Month = time.dt.month              # Convert content of your COLUMN/series to month
Day = time.dt.day                  # Convert content of your COLUMN/series to month
Hour = time.dt.Hour                # Convert content of your COLUMN/series to month

Your mistake: You then try to assess specific conditions along the series:

if (Month == whatever_condition): 
    do_something()

However, you can't compare a single condition to a series, at least not like that. Python doesn't know which entry in the series you mean, as some values in it may be different to others. That means, for some items in the series the condition may be fulfilled, for others not. Hence the ValueError: The truth value of a series is ambiguous.

What you want to do instead:

Evaluate item by item, ideally in a vectorized way. My suggestion: stay in the pandas dataframe all time:

df['Datetime'] = pd.to_datetime(df['dttm_utc']) # Add second column with datetime format
df['Month'] = df.Datetime.dt.month     # Extract month and add to new column
                                                # Same for day
df.loc[(df.Month < 3), 'InDayLightSavings'] = False 
# You can add multiple conditions here
# Finally, your filter:
df.loc[(df.InDayLightSavings == True), 'Time'] = df['Time'] - dt.timedelta(hours=1) 
# dt when import datetime as dt, else just datetime

Further reading here, here, here and here.

sudonym
  • 3,788
  • 4
  • 36
  • 61
  • Thank you! I tried it and now it show the new error of TypeError: Cannot compare type 'Period' with type 'int'? – Anh Phuong Bui Jun 13 '18 at 07:18
  • The problem now is it not running my if statement: 'if ('InDaylightSavings'): df.Datetime= df.Datetime +datetime.timedelta(hours=1) '. It changes all the Datetime hour to +1 but not the specific subset? – Anh Phuong Bui Jun 13 '18 at 07:44
  • that's because your if statement is wrong - see code edit. – sudonym Jun 13 '18 at 07:46
  • looks like it working. now i just need to chain complex criterias... which I am not yet there but I will try. Thank you! – Anh Phuong Bui Jun 13 '18 at 08:05
  • happy coding! glad to help. – sudonym Jun 13 '18 at 08:09
  • Im stuck...How can I only get the data from 3/11 from 2am...I tried & and |. I am reading on bitwise operation and I dont know if this is applicable? – Anh Phuong Bui Jun 13 '18 at 08:30
  • if you have new questions, post them but don't put everything into comments - this question here is answered. https://stackoverflow.com/questions/15315452/selecting-with-complex-criteria-from-pandas-dataframe – sudonym Jun 13 '18 at 08:35