0

I am new to pandas so please bear with me. I have a matrix that looks like this:

df=

     A  B  C  D  E  F  G  H  I  J  K
0    .  .  .  .  .  .  X  L  .  .  .
1    .  .  .  .  .  .  X  A  .  .  .
.
.
.
300  .  .  .  .  .  .  X  R  .  .  .
301  .  .  .  .  .  . nan R  .  .  .
302  .  .  .  .  .  .  X  R  .  .  .
303  .  .  .  .  .  . nan R  .  .  .
.
.
.

I am trying to change R to L if it's preceded by X, and to N if it's preceded by nan (only these two values can appear before R). This is what I expect as output:

     A  B  C  D  E  F  G  H  I  J  K
0    .  .  .  .  .  .  X  L  .  .  .
1    .  .  .  .  .  .  X  A  .  .  .
.
.
.
300  .  .  .  .  .  .  X  L  .  .  .
301  .  .  .  .  .  . nan N  .  .  .
302  .  .  .  .  .  .  X  L  .  .  .
303  .  .  .  .  .  . nan N  .  .  .
.
.
.

This is my code:

slice = df.loc[df['H'] == 'R']
for i in range(len(slice)):
    if isinstance([i,6],str):
        slice.iloc[i,7]= 'L'
    else:
        slice.iloc[i,7]= 'N'

The output of df is exactly the same as before the operation. Printing slice shows that the cells were updated, but the changes didn't affect df. I checked the pandas documentation and searched for many similar situations, and all sources say iloc/loc return a view and not a copy, yet the values are not being changed in the original DataFrame. I spent hours looking for the answer, but in vain. What am I missing?

I know I can loop through the whole set, but I would like to avoid doing that since the dataset is huge and I am sure there is a more efficient way to do it.

Domarius
  • 19
  • 7
  • Please provide a minimal reproducible example (https://stackoverflow.com/help/minimal-reproducible-example) and the desired output as text, then we can solve your problem. – Panda Kim Aug 17 '23 at 06:52

3 Answers

1

If you know the column names, you can do it with a custom function and apply(), like this:

import pandas as pd

def replace_values(row):
    # Return the new value of H for one row, based on the value in G
    if row['G'] == 'X':
        return row['H'].replace('R', 'L')
    elif pd.isnull(row['G']):
        return row['H'].replace('R', 'N')
    else:
        return row['H']

Then apply it row by row and assign the result back to the column:

df['H'] = df.apply(replace_values, axis=1)
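
To try this out, here is a minimal sketch on a made-up six-row frame. The data is an assumption for illustration only (just the G and H columns from the question, with invented values); the function is the one shown above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: only the two columns that matter for the rule
df = pd.DataFrame({
    "G": ["X", "X", "X", np.nan, "X", np.nan],
    "H": ["L", "A", "R", "R", "R", "R"],
})

def replace_values(row):
    # 'R' becomes 'L' after 'X' and 'N' after a missing value;
    # any other value of H contains no 'R' here and passes through unchanged
    if row["G"] == "X":
        return row["H"].replace("R", "L")
    elif pd.isnull(row["G"]):
        return row["H"].replace("R", "N")
    else:
        return row["H"]

# axis=1 passes each row to the function as a Series
df["H"] = df.apply(replace_values, axis=1)
print(df)
```

Note that apply with axis=1 still calls the Python function once per row, so on a very large frame it is likely to be slower than a vectorized assignment.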
vtasca
  • 1,660
  • 11
  • 17
  • Very neat and readable, thank you. I was wondering what was actually wrong with my code, since it suggests that I am missing some core concepts about copy/view in pandas with loc/iloc. – Domarius Aug 17 '23 at 13:40
  • I am curious whether using apply on every row, or using loc(cons) and then looping over the resulting indexes, is more efficient. – Domarius Aug 17 '23 at 14:18
1

Using vectorized code:

# replace if 'H'=='R'        and G value is mapped X -> L, nan -> N
df.loc[df['H'].eq('R'), 'H'] = df['G'].replace({'X': 'L'}).fillna('N')

If you don't know the name of the preceding column:

df.loc[df['H'].eq('R'), 'H'] = df.shift(axis=1)['H'].replace({'X': 'L'}).fillna('N')
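
Broken into steps, the first one-liner can be sketched as follows. The toy frame and the intermediate names is_r and mapped are assumptions added for illustration; they are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: only the two columns that matter for the rule
df = pd.DataFrame({
    "G": ["X", "X", "X", np.nan, "X", np.nan],
    "H": ["L", "A", "R", "R", "R", "R"],
})

# Step 1: boolean mask of the rows whose H is 'R'
is_r = df["H"].eq("R")

# Step 2: translate the whole G column: 'X' -> 'L', missing -> 'N'
mapped = df["G"].replace({"X": "L"}).fillna("N")

# Step 3: assign through a single .loc call; pandas aligns `mapped`
# on the index, so only the masked rows of H receive new values
df.loc[is_r, "H"] = mapped
print(df)
```

Because the selection and the assignment happen in one .loc call on df, the write goes to the original frame and no intermediate copy is involved.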
mozway
  • 194,879
  • 13
  • 39
  • 75
  • Super pythonic, it took me a while to get the idea behind it. Can you please explain what I was doing wrong? I still don't get why looping through the result of loc does not change the original DataFrame. – Domarius Aug 17 '23 at 13:38
  • 1
    @Domarius your issue is the classic mistake of double slicing (chained indexing). With `df.loc[df['H'] == 'R'].iloc[i,7]` you modify a new object (a copy of the selected rows), so the change never reaches `df`. In any case, you should avoid using a loop to modify a DataFrame; pandas was not designed for that. – mozway Aug 17 '23 at 16:41
1

The other answers are probably more practical, but let me show you what was wrong with your code:

slice = df.loc[df['H'] == 'R']
# The slice is only used to get the index labels of the matching rows
slice_index = list(slice.index)

for i in slice_index: # Loop over the labels of the matching rows
    # .loc is label-based, so this also works when the index is not 0..n-1
    if isinstance(df.loc[i, 'G'], str): # I assume [i,6] in the question was a typo
        df.loc[i, 'H'] = 'L' # Modify the original df
    else:
        df.loc[i, 'H'] = 'N' # Modify the original df

In your case, slice is indeed a copy, which can be confirmed by the warning:

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame
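
The difference between writing to a copy and writing to the original can be seen in a small sketch. The toy data and the names mask, subset and unchanged are assumptions for illustration; the explicit .copy() is there only to keep the warning quiet and does not change the outcome.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: only the two columns that matter for the rule
df = pd.DataFrame({
    "G": ["X", np.nan, "X"],
    "H": ["A", "R", "R"],
})

mask = df["H"] == "R"

# Selecting rows with a boolean mask builds a new DataFrame (a copy)
subset = df.loc[mask].copy()
subset.iloc[0, 1] = "N"        # changes `subset` only
unchanged = df["H"].tolist()   # df is still as it was
print(unchanged)

# Selecting and assigning in one .loc call writes into df itself
df.loc[mask & df["G"].isna(), "H"] = "N"
df.loc[mask & df["G"].notna(), "H"] = "L"
print(df["H"].tolist())
```

The first write lands in subset and never reaches df, which matches what the question describes; the last two writes go through df.loc directly and do change df.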
Sam
  • 643
  • 2
  • 13
  • Although the top answers provide clean solutions to the problem, I am still confused, as you said, about why iloc/loc sometimes works like a copy and sometimes like a view. Does loc always return a copy? For example, df.loc[df['H'] == 'R', 'G'] = 'V' would have changed the initial DataFrame, but using slice = df.loc[df['H'] == 'R'] and then looping through slice with iloc[i,7] doesn't. I feel like I am missing something fundamental. – Domarius Aug 17 '23 at 14:04
  • 1
    https://stackoverflow.com/questions/23296282/what-rules-does-pandas-use-to-generate-a-view-vs-a-copy – Sam Aug 18 '23 at 13:47
  • 1
    This might help! – Sam Aug 18 '23 at 13:48