5

I have a CSV that looks like this (and when brought into a pandas Dataframe with read_csv(), it looks the same).

enter image description here

I want to update the values in column ad_requests according to the following logic:

For a given row, if ad_requests has a value, leave it alone. Else, give it a value of the previous row's value for ad_requests minus the previous row's value for impressions. So in the first example, we would like to end up with:

enter image description here

I get partially there:

df["ad_requests"] = [i if not pd.isnull(i) else ??? for i in df["ad_requests"]]

And this is where I get stuck. After the else, I want to "go back" and access the previous "row", though I know that this is not how pandas is meant to be used. Another thing to note that is the rows will always be grouped in threes, by column ad_tag_name. If I pd.groupby["ad_tag_name"], I can then turn this into a list and start slicing and indexing, but again, I think there must be a better way to do this in pandas (as there is many things).

Python: 2.7.10

Pandas: 0.18.0

Pyderman
  • 14,809
  • 13
  • 61
  • 106
  • 1
    Hint: `df.ad_requests.ffill() - df.impressions.cumsum().shift()` gets you partway there. – John Zwinck Nov 22 '16 at 04:17
  • Interesting. Using this approach, values are *slightly* off though for the first few blank rows, then ultimately become negative: http://imgur.com/a/k7faf – Pyderman Nov 22 '16 at 04:47
  • That's why it's a hint and not a full solution. The cumsum needs to be reset to 0 each time ad_requests is nonzero, or something like that. – John Zwinck Nov 22 '16 at 05:24

1 Answers1

3

You'll want to do something like this:

pd.options.mode.chained_assignment = None #suppresses "SettingWithCopyWarning"
for index, elem in enumerate(df['ad_requests']):
    if pd.isnull(elem):
        df['ad_requests'][index]=df['ad_requests'][index-1]-df['impressions'][index-1]

The warning comes from the fact that we're changing the values of a view of a dataframe, which affects the original dataframe. That is what we wish to do, however, so it doesn't really concern us.

(Python 2.7.12 and Pandas 0.19.0)

EDIT:

Changing the last line of code from

df['ad_requests'][index]=df['ad_requests'][index-1]-df['impressions'][index-1]

to

df.at[index,'ad_requests']=df.at[index-1,'ad_requests']-df.at[index-1,'impressions']

removes the need to suppress any warnings:

for index, elem in enumerate(df['ad_requests']):
    if pd.isnull(elem):
        df.at[index,'ad_requests']=df.at[index-1,'ad_requests']-df.at[index-1,'impressions']
Rojan
  • 159
  • 5
  • 12
  • 1
    I knew I needed to use the indexes of the previous elements somehow, but had forgotten about `enumerate()`. And I knew the final solution would be short and neat, as most pandas-based solutions are. Thank you for this elegant approach. – Pyderman Nov 23 '16 at 02:28