
The following code transforms the table like this:

     col1  col2
0       1   3.0
1       2   4.0
2    C345   NaN
3  A56665   4.0
4   34553   NaN
5  353535   4.0

     col1   col2
0       1      3
1       2      4
2    C345   C345
3  A56665      4
4   34553  34553
5  353535      4

import pandas as pd

d = {'col1': [1, 2, "C345", "A56665", 34553, 353535], 'col2': [3, 4, None, 4, None, 4]}
df = pd.DataFrame(data=d)
# astype returns a new Series, so its result has to be assigned back
df['col1'] = df['col1'].astype(str)

print(df)

# fill the missing col2 values with the matching col1 values
df.col2.fillna(df.col1, inplace=True)
print(df)

However, when I apply this approach to large data sets, I get a SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. What am I doing wrong, or what am I not doing the intended way?

d4rty

1 Answer

Try using loc, like

df.loc[:, 'col2'].fillna(df.col1, inplace=True)
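Calling `fillna(..., inplace=True)` on `df.loc[:, 'col2']` is still chained: `.loc` returns the column first, and the in-place fill then runs on that intermediate object, which pandas cannot guarantee is part of `df`. A minimal sketch of the common alternative, assuming the question's sample data, builds the filled column as a new Series and writes it back with a single `df[...] = ...` assignment:

```python
import pandas as pd

# Sample data from the question.
d = {'col1': [1, 2, "C345", "A56665", 34553, 353535],
     'col2': [3, 4, None, 4, None, 4]}
df = pd.DataFrame(data=d)

# Build the filled column as a new Series, then write it back with one
# df[...] = ... assignment instead of filling an indexed column in place.
df['col2'] = df['col2'].fillna(df['col1'])
print(df)
```

Because the whole column is replaced, this form does not depend on whether `df['col2']` happened to be a view or a copy.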

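To turn off SettingWithCopyWarning for a single DataFrame in older pandas versions (is_copy has since been deprecated and is not available in recent releases), use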

df.is_copy = False

Or,

df = df.copy()
meW
  • Still the same warning in both cases: `df.loc[:, 'col2'].fillna(df.col1, inplace=True)` and `df.loc[:, 'col2'].fillna(df.loc[:,'col1'], inplace=True)` – d4rty Jan 17 '19 at 17:07
  • Does suppressing the warning help in your case? – meW Jan 17 '19 at 17:11
  • 2
    `df = df.copy()` works – d4rty Jan 17 '19 at 17:12
  • But why do I need to make a copy first? – d4rty Jan 17 '19 at 17:15
  • In pandas, indexing a DataFrame can return a view that still refers to the original DataFrame rather than an independent copy, so changing the subset may change the original DataFrame. Use `copy()` if you want to make sure the original DataFrame does not change. – meW Jan 17 '19 at 17:17
  • But in my case, I want to alter the original DataFrame – d4rty Jan 17 '19 at 17:18
  • Yes, you do alter it. But you're also indexing specific columns to choose the values from. See here for a more detailed description: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.copy.html – meW Jan 17 '19 at 17:21
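The thread never shows how `df` is built in the large-data case. A frequent source of this warning is that `df` is itself a filtered subset of a bigger DataFrame, which pandas tracks as a possible copy. The sketch below assumes that scenario; the `big` frame and its `keep` filter column are hypothetical and not from the question. Calling `.copy()` on the subset makes it an explicitly independent DataFrame, so the later assignment is not flagged:

```python
import warnings

import pandas as pd

# Hypothetical larger frame; the 'keep' column is an illustrative filter
# and is not part of the original question.
big = pd.DataFrame({
    'col1': [1, 2, "C345", "A56665", 34553, 353535, 7],
    'col2': [3, 4, None, 4, None, 4, 5],
    'keep': [True, True, True, True, True, True, False],
})

# Boolean filtering builds a new frame, but (without Copy-on-Write) pandas
# remembers it was derived from `big` and warns on later writes into it.
# .copy() turns the subset into an explicitly independent DataFrame.
df = big[big['keep']].copy()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    df['col2'] = df['col2'].fillna(df['col1'])

# Collect any SettingWithCopyWarning triggered by the assignment (matched by
# name, because the class lives in different modules across pandas versions).
copy_warnings = [w for w in caught
                 if w.category.__name__ == "SettingWithCopyWarning"]

print(df)
print("SettingWithCopyWarning raised:", bool(copy_warnings))
```

Removing `.copy()` should make the same assignment emit SettingWithCopyWarning in pandas versions that still issue it; with Copy-on-Write enabled, pandas no longer raises this warning. Either way, `big` itself is left unchanged.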