How to replace selected rows of pandas dataframe with a np array, sequentially?

Question

I have a pandas dataframe

     A  B  C
0  NaN  2  6
1  3.0  4  0
2  NaN  0  4
3  NaN  1  2

where I have a column A that has NaN values in some rows (not necessarily consecutive).

I want to replace these values not with a constant (which DataFrame.fillna does), but with the values from a numpy array.

So the desired outcome is:

     A  B  C
0  1.0  2  6
1  3.0  4  0
2  5.0  0  4
3  7.0  1  2

I don't think the .replace method will help here either, since it seems to map value to value via a dictionary, whereas I want to replace each NaN, in order, with the corresponding element of the np array.

I tried:

MWE:

import numpy as np
import pandas as pd

huh = pd.DataFrame([[np.nan, 2, 6],
                    [3, 4, 0],
                    [np.nan, 0, 4],
                    [np.nan, 1, 2]],
                   columns=list('ABC'))

# what I want to do, but this chained assignment triggers a warning
huh.A[huh.A.isnull()] = np.array([1, 5, 7])

This gives the warning

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

I read the docs but I can't understand how to do this with .loc. How do I do this properly, preferably without a for loop?

Other info:

The number of elements in the np array will always match the number of NaN in the dataframe, so your answer does not need to check for this.

score 3 · Accepted Answer · answered Sep 10 '18 at 05:15


You are really close; you need DataFrame.loc to avoid chained assignment:

huh.loc[huh.A.isnull(), 'A'] = np.array([1,5,7])
print(huh)
     A  B  C
0  1.0  2  6
1  3.0  4  0
2  5.0  0  4
3  7.0  1  2
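
As a minimal sketch of the same .loc assignment with an explicit length check added: the guard and the names `values`, `mask`, and `filled` are illustrative assumptions, not part of the original answer. The boolean mask selects the NaN rows of A, and the array is written into them in row order.

```python
import numpy as np
import pandas as pd

huh = pd.DataFrame([[np.nan, 2, 6],
                    [3, 4, 0],
                    [np.nan, 0, 4],
                    [np.nan, 1, 2]],
                   columns=list('ABC'))

values = np.array([1, 5, 7])   # replacement values, in row order
mask = huh['A'].isna()         # True where column A is NaN

# Hypothetical guard: fail early with a clear message if the lengths differ.
if mask.sum() != len(values):
    raise ValueError(f"{mask.sum()} NaN rows but {len(values)} replacement values")

# Single .loc call: row selector (mask) and column label together,
# so there is no chained assignment.
huh.loc[mask, 'A'] = values

filled = huh['A'].tolist()
```

Without the guard, a length mismatch surfaces as an error raised by the assignment itself.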

answered Sep 10 '18 at 05:15

jezrael


what happens when there is a mismatch in length, i.e. `when the array has more than 3 values` or `huh['A'] has more NaN than the assigned array`? – Pyd Sep 10 '18 at 05:24

Here that is not a problem; the OP notes `The number of elements in the np array will always match the number of NaN in the dataframe, so your answer does not need to check for this.` – jezrael Sep 10 '18 at 05:25

yes, I get `ValueError: shape mismatch: value array of shape (4,) could not be broadcast to indexing result of shape (3,)` when the lengths mismatch – Pyd Sep 10 '18 at 05:27

@pyd - I think the lengths need to be checked; I need some time for a solution. – jezrael Sep 10 '18 at 05:28

It's so weird: it works with my minimal example, but when I implement it in my actual code, I still get the SettingWithCopyWarning (using `.loc`). Yet the `nan` in the main 'huh' dataframe gets replaced regardless. So I get my desired result, but the warning still shows. I'll need some time to debug this, but thanks for the quick reply nonetheless. – Sep 10 '18 at 05:50

@QuestionAsker - I think that is a different problem; check whether a [copy](https://stackoverflow.com/a/44966873/2901002) is necessary. – jezrael Sep 10 '18 at 05:52

@jezrael You were right, I needed `.copy()`. Thanks! – Sep 10 '18 at 13:19
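
The situation in the last few comments can be sketched as follows. This is an assumption about what the asker's real code looked like: `big` and the filter on B are hypothetical, chosen only so that `huh` is derived from another DataFrame. When a frame is built by slicing another one, pandas may not know whether it is a view or a copy, and a later .loc assignment can still emit SettingWithCopyWarning. Calling .copy() when creating the subset makes it an independent DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical parent frame; `huh` is taken from it by filtering rows.
big = pd.DataFrame({'A': [np.nan, 3, np.nan, np.nan, 9],
                    'B': [2, 4, 0, 1, 8],
                    'C': [6, 0, 4, 2, 5]})

# Explicit copy: later assignments to `huh` do not refer back to `big`.
huh = big[big['B'] < 5].copy()

huh.loc[huh['A'].isna(), 'A'] = np.array([1, 5, 7])

print(huh)
```

The parent `big` keeps its NaN values; only the copy is modified.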

score 1 · Answer 2 · answered Sep 10 '18 at 05:33


`zip`

This should account for uneven lengths, since zip stops at the shorter of its two inputs:

m = huh.A.isna()
a = np.array([1, 5, 7])
s = pd.Series(dict(zip(huh.index[m], a)))

huh.fillna({'A': s})

     A  B  C
0  1.0  2  6
1  3.0  4  0
2  5.0  0  4
3  7.0  1  2
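
To illustrate the uneven-length behaviour, here is a small sketch built on the same zip idea. The helper `fill_in_order` is a hypothetical name, and it calls fillna on column A directly with the aligned Series rather than going through the dict form used above. Extra values are silently dropped, and if there are too few values the remaining NaN stay in place.

```python
import numpy as np
import pandas as pd

huh = pd.DataFrame([[np.nan, 2, 6],
                    [3, 4, 0],
                    [np.nan, 0, 4],
                    [np.nan, 1, 2]],
                   columns=list('ABC'))

m = huh['A'].isna()

def fill_in_order(values):
    """Hypothetical helper: pair NaN row labels with values, then fill by index."""
    # zip stops at the shorter input, so no length error is raised.
    s = pd.Series(dict(zip(huh.index[m], values)))
    return huh['A'].fillna(s)

too_many = fill_in_order(np.array([1, 5, 7, 9]))  # the 9 is ignored
too_few = fill_in_order(np.array([1, 5]))         # row 3 stays NaN

print(too_many.tolist())
print(too_few.tolist())
```

Whether silently truncating is preferable to raising an error depends on the use case; for the question as asked, the lengths always match.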

answered Sep 10 '18 at 05:33

piRSquared

