4

I have a sort of FIZZ-BUZZ problem. I have a calendar with weekdays. In the next column I put 'FIZZ' at certain rows. If there is an empty gap between two 'FIZZ' I put 'BUZZ' in between except if the weekday is 'SUN'. See the code below (I use 0.15.2 pandas version):

import datetime
import pandas as pd

dict_weekday = {1: 'MON', 2: 'TUE', 3: 'WED', 4: 'THU', 5: 'FRI', 6: 'SAT', 7: 'SUN'}
df = pd.DataFrame(pd.date_range(datetime.date(2014, 1, 1), datetime.date(2014, 1, 10), freq='D'), columns=['Date'])
df['Weekday'] = df['Date'].apply(lambda x: dict_weekday[x.isoweekday()])
df['A'] = df['Weekday']
idx_lst = [0, 2, 3, 5, 9]
df.loc[idx_lst, 'A'] = 'FIZZ'
previous_idx = idx_lst[0]

for idx in idx_lst:
    print idx
    try:
        print df.loc[idx - 1, 'Weekday'], df.loc[idx, 'Weekday']
        if idx - previous_idx == 2 and df.loc[idx - 1, 'Weekday'] != 'SUN':
            df.loc[idx-1, 'A'] = 'BUZZ'
    except KeyError:
        continue

    previous_idx = idx

print df

The output is:

0
2
2014-12-18 00:00:00 FRI
3
FRI SAT
5
2014-12-21 00:00:00 MON
9
2014-12-18 00:00:00 FRI
        Date Weekday     A
0 2014-01-01     WED  FIZZ
1 2014-01-02     THU  BUZZ
2 2014-01-03     FRI  FIZZ
3 2014-01-04     SAT  FIZZ
4 2014-01-05     SUN  BUZZ
5 2014-01-06     MON  FIZZ
6 2014-01-07     TUE   TUE
7 2014-01-08     WED   WED
8 2014-01-09     THU   THU
9 2014-01-10     FRI  FIZZ

Notice line 4, there should be SUN in column A instead of BUZZ. Notice also that if idx-1 is not in idx_lst the .loc[idx-1] gives a time stamp. if I use .ix instead of .loc I get the correct answer:

0
2
THU FRI
3
FRI SAT
5
SUN MON
9
THU FRI
        Date Weekday     A
0 2014-01-01     WED  FIZZ
1 2014-01-02     THU  BUZZ
2 2014-01-03     FRI  FIZZ
3 2014-01-04     SAT  FIZZ
4 2014-01-05     SUN   SUN
5 2014-01-06     MON  FIZZ
6 2014-01-07     TUE   TUE
7 2014-01-08     WED   WED
8 2014-01-09     THU   THU
9 2014-01-10     FRI  FIZZ

Any explanations ? Thanks in advance.

Jon Clements
  • 138,671
  • 33
  • 247
  • 280
user3176500
  • 389
  • 2
  • 6
  • 15
  • Hello again, this seems related to your other question: http://stackoverflow.com/questions/27501694/pandas-iterating-over-dataframe-index-with-loc/27502078#27502078, this is even more baffling as it doesn't seem to work in a loop but if you did `df.loc[1, 'Weekday']` then it shows the correct output, also if you added a dummy column like `df['dummy'] = 0` after adding the 'Weekday' column then it all works – EdChum Dec 17 '14 at 11:46
  • Yes, but it's a more direct problem of what I am actually facing. Indeed, notice that if idx-1 is in idx_lst then it's ok, see for instance if idx = 3 then idx-1 = 2 and both, 3 and 2, are in idx_lst. I am thinking that it may work on some copy or something with index labels being idx_lst. Or is idx-1 not allowed in the .loc? – user3176500 Dec 17 '14 at 11:56
  • Well obviously so long as idx-1 is valid and in the index then it will work and you are handling the situation where this will raise an exception, personally I think this is some subtle bug – EdChum Dec 17 '14 at 12:16

1 Answers1

2

The surprising behavior is due to pd.Series attempting to coerce datetime-like values to pd.Timestamps.

df.loc[1] returns pd.Series([pd.Timestamp('2014-01-02'), 'THU', 'THU']) which unfortunately gets coerced to Timestamps since all three values are datetime-like:

In [154]: pd.Series([pd.Timestamp('2014-01-02'), 'THU', 'THU'])
Out[154]: 
0   2014-01-02
1   2014-12-18
2   2014-12-18
dtype: datetime64[ns]

In contrast, df.loc[2] does not coerce the values to Timestamps since 'FIZZ' is not datelike:

In [155]: pd.Series([pd.Timestamp('2014-01-03'), 'FRI', 'FIZZ'])
Out[155]: 
0    2014-01-03 00:00:00
1                    FRI
2                   FIZZ
dtype: object

The problem can be avoided by forming the Series, df['Weekday'] first before using .loc:

In [158]: df['Weekday'].loc[1]
Out[158]: 'THU'

This works because df['Weekday'].dtype remains dtype('O'); there is no conversion to Timestamps.


for idx in idx_lst:
    try:
        # print(idx-1, df.ix[idx - 1, 'Weekday'], df.loc[idx - 1, 'Weekday'])
        if idx - previous_idx == 2 and df['Weekday'].loc[idx - 1] != 'SUN':
            df.loc[idx-1, 'A'] = 'BUZZ'
    except KeyError:
        continue

    previous_idx = idx

yields

        Date Weekday     A
0 2014-01-01     WED  FIZZ
1 2014-01-02     THU  BUZZ
2 2014-01-03     FRI  FIZZ
3 2014-01-04     SAT  FIZZ
4 2014-01-05     SUN   SUN
5 2014-01-06     MON  FIZZ
6 2014-01-07     TUE   TUE
7 2014-01-08     WED   WED
8 2014-01-09     THU   THU
9 2014-01-10     FRI  FIZZ
Community
  • 1
  • 1
unutbu
  • 842,883
  • 184
  • 1,785
  • 1,677