
Consider the following example code:

import pandas as pd
import numpy as np

pd.set_option('display.expand_frame_repr', False)
foo = pd.read_csv("foo2.csv", skipinitialspace=True, index_col='Index')
foo.loc[:, 'Date'] = pd.to_datetime(foo.Date)

for i in range(0, len(foo)-1):
    if foo.at[i, 'Type'] == 'Reservation':
        for j in range(i+1, len(foo)):
            if foo.at[j, 'Type'] == 'Payout':
                foo.at[j, 'Nights'] = foo.at[i, 'Nights']
                break

mask = (foo['Date'] >= '2018-03-31') & (foo['Date'] <= '2019-03-31')
foo2019 = foo.loc[mask]
foopayouts2019 = foo2019.loc[foo2019['Type'] == 'Payout']
foopayouts2019.loc[:, 'Nights'] = foopayouts2019['Nights'].apply(np.int64)
# foopayouts2019.loc[:, 'Nights'] = foopayouts2019['Nights'].astype(np.int64, copy=False)

where foo2.csv contains:

Index,Date,Type,Nights,Amount,Payout
0,03/07/2018,Reservation,2.0,1000.00,
1,03/07/2018,Payout,,,1000.00
2,09/11/2018,Reservation,3.0,1500.00,
3,09/11/2018,Payout,,,1500.00
4,02/16/2019,Reservation,2.0,2000.00,
5,02/16/2019,Payout,,,2000.00
6,04/25/2019,Reservation,7.0,1200.00,
7,04/25/2019,Payout,,,1200.00

This gives the following warning:

/usr/lib/python2.7/dist-packages/pandas/core/indexing.py:543: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[item] = s

The warning does not mention a line number, but appears to be coming from the line:

foopayouts2019.loc[:, 'Nights'] = foopayouts2019['Nights'].apply(np.int64)

At least, if I comment that line out, the warning goes away. So, I have two questions.

  1. What is causing that warning? I've been trying to use .loc where appropriate, including in the line where the warning is (possibly) coming from. If the problem is actually earlier, where is it?
  2. Which is the better choice, .apply or .astype, as used in the following lines of code?

    foopayouts2019.loc[:, 'Nights'] = foopayouts2019['Nights'].apply(np.int64)
    # foopayouts2019.loc[:, 'Nights'] = foopayouts2019['Nights'].astype(np.int64, copy=False)
    

    It seems that both of them work, except for that warning.

Faheem Mitha

1 Answer


I would change a few things in the code:

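Use shift() to flag every Reservation row that is immediately followed by a Payout row. Then use np.where() to keep Nights unchanged on the flagged rows and take the forward-filled (ffill) value everywhere else, so each Payout row picks up the Nights of the Reservation before it: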

foo.Date = pd.to_datetime(foo.Date)  # convert to datetime
c = foo.Type.eq('Reservation') & foo.Type.shift(-1).eq('Payout')  # Reservation followed by Payout
foo.Nights = np.where(~c, foo.Nights.ffill(), foo.Nights)  # replaces the nested if/else loops

Or:

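c = foo.Type.shift().eq('Reservation') & foo.Type.eq('Payout')  # Payout preceded by Reservation
foo.Nights = np.where(c, foo.Nights.ffill(), foo.Nights)  # assign the result back, as in the first variant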
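To make the two conditions concrete, here is a minimal sketch on a four-row frame. The frame is a made-up stand-in for the question's data, and the names c1, c2, filled1 and filled2 are only for illustration. It shows that both variants flag complementary rows and produce the same filled column:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's data: Payout rows have no Nights yet.
foo = pd.DataFrame({
    "Type": ["Reservation", "Payout", "Reservation", "Payout"],
    "Nights": [2.0, np.nan, 3.0, np.nan],
})

# True on a Reservation row that is immediately followed by a Payout row.
c1 = foo.Type.eq("Reservation") & foo.Type.shift(-1).eq("Payout")
# True on a Payout row that immediately follows a Reservation row.
c2 = foo.Type.shift().eq("Reservation") & foo.Type.eq("Payout")

# ~c1 inverts the mask, so the forward-filled value is used wherever c1 is False.
filled1 = np.where(~c1, foo.Nights.ffill(), foo.Nights)
# With c2 no inversion is needed: forward-fill exactly where c2 is True.
filled2 = np.where(c2, foo.Nights.ffill(), foo.Nights)

print(c1.tolist(), c2.tolist())
print(filled1, filled2)
```

Note that this relies on each Payout sitting directly below its Reservation, as in the sample CSV. The loop in the question searches further ahead, so the two approaches could differ on data where other rows sit in between.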

Then use Series.between() to select the rows whose dates fall between two dates (both endpoints are included by default):

foo2019 = foo[foo.Date.between('2018-03-31', '2019-03-31')].copy()  # changed: .copy() gives an independent DataFrame
foopayouts2019 = foo2019[foo2019['Type'] == 'Payout'].copy()  # changed: .copy() here too
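As a rough illustration of what these two lines rely on, the sketch below uses a small made-up frame (the data and the variable names mask and same are assumptions, not part of the question). It checks that between() matches the explicit >= / <= mask from the question, and that a frame made with .copy() can be modified without touching the original:

```python
import pandas as pd

# Hypothetical sample with the same column names as the question's data.
foo = pd.DataFrame({
    "Date": pd.to_datetime(["2018-03-07", "2018-09-11", "2019-02-16", "2019-04-25"]),
    "Type": ["Payout"] * 4,
    "Nights": [2.0, 3.0, 2.0, 7.0],
})

# The mask from the question and Series.between() select the same rows,
# because between() includes both endpoints by default.
mask = (foo["Date"] >= "2018-03-31") & (foo["Date"] <= "2019-03-31")
same = foo.Date.between("2018-03-31", "2019-03-31").equals(mask)

# .copy() makes foo2019 its own DataFrame, so assigning to it is unambiguous
# and does not raise SettingWithCopyWarning.
foo2019 = foo[mask].copy()
foo2019["Nights"] = foo2019["Nights"] * 10

print(same)
print(foo["Nights"].tolist())      # original is unchanged
print(foo2019["Nights"].tolist())  # only the copy was modified
```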

Or directly:

foopayouts2019 = foo[foo.Date.between('2018-03-31', '2019-03-31') & foo.Type.eq('Payout')].copy()

foopayouts2019.loc[:, 'Nights'] = foopayouts2019['Nights'].apply(np.int64)  # or .astype(int)

   Index       Date    Type  Nights  Amount  Payout
3      3 2018-09-11  Payout       3     NaN  1500.0
5      5 2019-02-16  Payout       2     NaN  2000.0
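On the .apply versus .astype part of the question, here is a minimal sketch of the difference, using a made-up two-row Series (the names via_apply, via_astype and df are only for illustration). Both give the same integers here, but .apply(np.int64) calls the function once per element, while .astype does one vectorized cast, so .astype is usually the more idiomatic choice. This is a suggestion, not something the code above depends on:

```python
import numpy as np
import pandas as pd

# Hypothetical Nights values for the two Payout rows in the output above.
nights = pd.Series([3.0, 2.0], name="Nights")

via_apply = nights.apply(np.int64)    # calls np.int64 once per element
via_astype = nights.astype("int64")   # a single vectorized cast

# Assigning the whole column by name replaces it, so the new dtype is kept.
df = pd.DataFrame({"Nights": nights})
df["Nights"] = df["Nights"].astype("int64")

print(via_apply.tolist(), via_astype.tolist())
print(df["Nights"].dtype)
```

Two caveats, as far as I know: both approaches raise an error if the column still contains NaN, and in recent pandas versions (2.0 and later) assigning through .loc[:, 'Nights'] tries to write into the existing float column in place, so the dtype may stay float64. Assigning with df['Nights'] = ... as in the sketch avoids that.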
anky
  • Thank you for the answer. Do the `foo2019` and `foopayouts2019` assignments not actually create copies then? And is that what is causing the error message that I'm seeing? Also, in the case of my second question, which is preferable, `.apply` or `.astype`? – Faheem Mitha Jul 30 '19 at 12:26
  • @FaheemMitha No, it's creating a slice, so any operation/mutation on it will give you that warning. Read more [here](https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas). Also, you can use `np.where()` or `np.select()` to implement a [conditional replace](https://stackoverflow.com/questions/19913659/pandas-conditional-creation-of-a-series-dataframe-column). – anky Jul 30 '19 at 12:29
  • Thank you for the answer and the help. I'm still a bit unclear on the `c` condition and its usage. Specifically, what does the `~c` in `np.where(~c,...` mean here? I think your answer would benefit generally from some explanation. And not just in side comments. :-) – Faheem Mitha Aug 01 '19 at 13:23
  • @FaheemMitha Updated the answer with explanations. Side note: `~` inverts a boolean, turning `True` into `False` and vice versa. I added another condition where you don't need the `~`. Hope it helps. – anky Aug 01 '19 at 14:00