
Hello Python Community

I am trying to process data from a pandas DataFrame in which some cell values wrap onto the next row, as in the DataFrame below.

Note that the last names wrap into the row below.

I tried iterating through the dataframe using:

for row in df.itertuples(index=True):

and updating the cell using:

df.Last[ii-1] = updateCell

and deleting the old row using:

df.drop([df.index[ii]],inplace=True)

But I encountered warnings such as "A value is trying to be set on a copy of a slice from a DataFrame", and further problems with the index after the drop.

What is the best approach for this problem?

Barry

```python
import numpy as np
import pandas as pd

# initialize list of lists
data = [['Barney', 'Rubble', 25],
        ['Fred', 'Flintstone', 25],
        ['Wilma', 'Slaghoople ', 22],
        [np.nan, 'Flintstone', np.nan],
        ['Betty', 'McBricker', 21],
        [np.nan, 'Rubble', np.nan]]

# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['First', 'Last', 'Age'])
```


Band
  • Please share all the warnings. – AMC Apr 05 '20 at 21:31
  • Does this answer your question? [How to deal with SettingWithCopyWarning in Pandas?](https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas) – AMC Apr 05 '20 at 21:31

1 Answer


df.Last[ii-1] = updateCell uses chained indexing: df.Last first returns a Series, and the assignment is then applied to that intermediate object, which may be a copy rather than a view of the DataFrame (see https://pandas.pydata.org/pandas-docs/version/0.25.0/user_guide/indexing.html#indexing-view-versus-copy for more details). To set the value directly, use df.loc[ii-1, 'Last'] = updateCell.
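As a minimal sketch of the single-step `.loc` assignment, here is a two-row example. The value of `ii`, the name `update_cell`, and the way the merged string is built are assumptions for illustration, and the labels passed to `.loc` match row positions only because the frame has the default integer index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["Wilma", "Slaghoople ", 22], [np.nan, "Flintstone", np.nan]],
    columns=["First", "Last", "Age"],
)

ii = 1  # hypothetical label of the wrapped (continuation) row

# Build the merged last name from the row above and the wrapped row.
update_cell = df.loc[ii - 1, "Last"] + df.loc[ii, "Last"]

# One .loc call resolves the row label and the column name together,
# so the assignment is applied to df itself, not to an intermediate object.
df.loc[ii - 1, "Last"] = update_cell
```

After this runs, the first row's `Last` holds the combined name and no chained indexing has taken place.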

As an aside, looping through a DataFrame and deleting rows in place is probably not your best option. A starting point using vectorized pandas operations, which fills each missing value from the row above, is something like this:

cols = ['First', 'Last', 'Age']
df = pd.DataFrame(data, columns=cols)
for col in cols:
    prevcol = f'{col}_prev'
    # value of the same column in the previous row
    df[prevcol] = df[col].shift(1)
    # fill missing cells from the previous row
    df[col] = df[col].fillna(df[prevcol])
>>> df
    First         Last   Age First_prev    Last_prev  Age_prev
0  Barney       Rubble  25.0        NaN          NaN       NaN
1    Fred   Flintstone  25.0     Barney       Rubble      25.0
2   Wilma  Slaghoople   22.0       Fred   Flintstone      25.0
3   Wilma   Flintstone  22.0      Wilma  Slaghoople       22.0
4   Betty    McBricker  21.0        NaN   Flintstone       NaN
5   Betty       Rubble  21.0      Betty    McBricker      21.0
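The forward-fill above copies values down but leaves the wrapped rows in place. If the goal is to fold each wrapped last name into the row above it, one possible vectorized sketch groups each record with its continuation rows. It assumes that a missing `First` always marks a continuation row and that the fragments should be joined with a single space. The names `record_id` and `merged` are made up for this example.

```python
import numpy as np
import pandas as pd

data = [['Barney', 'Rubble', 25],
        ['Fred', 'Flintstone', 25],
        ['Wilma', 'Slaghoople ', 22],
        [np.nan, 'Flintstone', np.nan],
        ['Betty', 'McBricker', 21],
        [np.nan, 'Rubble', np.nan]]
df = pd.DataFrame(data, columns=['First', 'Last', 'Age'])

# Every non-missing First starts a new record; the running count gives
# continuation rows the same id as the row they wrap from.
record_id = df['First'].notna().cumsum().rename('record')

merged = (
    df.groupby(record_id)
      .agg({
          'First': 'first',  # first non-missing value in the record
          'Last': lambda s: ' '.join(part.strip() for part in s),
          'Age': 'first',
      })
      .reset_index(drop=True)
)
```

This yields one row per person without dropping rows inside a loop. `Age` stays floating point because the original column contained NaN.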

If you aren't going to use vectorized operations, then I'd do the manipulation in the list of lists and then create a dataframe from the final product, if it is needed.
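A rough sketch of that list-first approach follows, assuming that a row whose first element is NaN is a continuation of the previous row and that the name fragments are joined with a single space. The helper `is_missing` and the name `merged_rows` are hypothetical.

```python
import math

import numpy as np
import pandas as pd

data = [['Barney', 'Rubble', 25],
        ['Fred', 'Flintstone', 25],
        ['Wilma', 'Slaghoople ', 22],
        [np.nan, 'Flintstone', np.nan],
        ['Betty', 'McBricker', 21],
        [np.nan, 'Rubble', np.nan]]


def is_missing(value):
    """Return True for NaN placeholders (np.nan is a plain Python float)."""
    return isinstance(value, float) and math.isnan(value)


merged_rows = []
for first, last, age in data:
    if is_missing(first) and merged_rows:
        # Continuation row: append its last-name fragment to the previous record.
        previous = merged_rows[-1]
        previous[1] = f"{previous[1].strip()} {last.strip()}"
    else:
        merged_rows.append([first, last, age])

# Build the DataFrame only once the rows are clean.
df = pd.DataFrame(merged_rows, columns=['First', 'Last', 'Age'])
```

Because no NaN ever reaches the DataFrame, `Age` keeps its integer values, and there is no index to repair afterwards.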

Eric Truett