2

I have a pandas dataframe

     A  B  C
0  NaN  2  6
1  3.0  4  0
2  NaN  0  4
3  NaN  1  2

where I have a column A that has NaN values in some rows (not necessarily consecutive).

I want to replace these values not with a constant (which is what .fillna does), but with the values from a numpy array.

So the desired outcome is:

     A  B  C
0  1.0  2  6
1  3.0  4  0
2  5.0  0  4
3  7.0  1  2

I'm not sure the .replace method will help here either, since it seems to map value -> value via a dictionary, whereas I want to replace each NaN, in order, with the corresponding element of the np array.

I tried:

MWE:

import numpy as np
import pandas as pd

huh = pd.DataFrame([[np.nan, 2, 6],
                    [3, 4, 0],
                    [np.nan, 0, 4],
                    [np.nan, 1, 2]],
                   columns=list('ABC'))

huh.A[huh.A.isnull()] = np.array([1,5,7])  # what I want to do, but this chained assignment raises a warning

gives the warning

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

I read the docs but I can't understand how to do this with .loc. How do I do this properly, preferably without a for loop?

Other info:

  • The number of elements in the np array will always match the number of NaN in the dataframe, so your answer does not need to check for this.

2 Answers

3

You are really close; you need DataFrame.loc to avoid chained assignment:

huh.loc[huh.A.isnull(), 'A'] = np.array([1,5,7])
print (huh)
     A  B  C
0  1.0  2  6
1  3.0  4  0
2  5.0  0  4
3  7.0  1  2
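
If the frame being modified was itself produced by filtering another DataFrame, the warning can still appear even with .loc. Here is a minimal sketch of taking an explicit copy first, assuming a hypothetical larger frame `raw` and a filter on column B (neither is in the question):

```python
import numpy as np
import pandas as pd

# Hypothetical parent frame; the last row exists only to be filtered out.
raw = pd.DataFrame([[np.nan, 2, 6],
                    [3, 4, 0],
                    [np.nan, 0, 4],
                    [np.nan, 1, 2],
                    [9, 9, 9]],
                   columns=list('ABC'))

# Assumed subset step. .copy() makes `huh` an independent frame,
# so the assignment below does not target a slice of `raw`.
huh = raw[raw.B < 9].copy()

# Boolean mask of the rows to fill, then a single .loc assignment.
mask = huh['A'].isnull()
huh.loc[mask, 'A'] = np.array([1, 5, 7])
```

The array is assigned positionally to the masked rows, in index order; `raw` keeps its NaN values because `huh` is a separate copy.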
jezrael
  • What happens when there is a mismatch in length, i.e. when the `array has more than 3 values` or `huh['A'] has more NaN than the assigning array`? – Pyd Sep 10 '18 at 05:24
  • That is not a problem here; the OP notes: `The number of elements in the np array will always match the number of NaN in the dataframe, so your answer does not need to check for this.` – jezrael Sep 10 '18 at 05:25
  • yes, I get `ValueError: shape mismatch: value array of shape (4,) could not be broadcast to indexing result of shape (3,) ` when mismatch in lengths – Pyd Sep 10 '18 at 05:27
  • @pyd - I think the lengths need to be checked; I need some time for a solution. – jezrael Sep 10 '18 at 05:28
  • It's so weird: it works with my minimal example, but when I implement it in my actual code (using `.loc`), I still get the SettingWithCopyWarning. Yet the `nan` values in the main 'huh' dataframe get replaced regardless. So I get my desired result, but the warning still shows. I'll need some time to debug this, but thanks for the quick reply nonetheless. –  Sep 10 '18 at 05:50
  • @QuestionAsker - I think that is a separate problem; check whether a [copy](https://stackoverflow.com/a/44966873/2901002) is necessary. – jezrael Sep 10 '18 at 05:52
  • @jezrael You were right, I needed `.copy()`. Thanks! –  Sep 10 '18 at 13:19
1

zip

This should account for uneven lengths

m = huh.A.isna()
a = np.array([1, 5, 7])
s = pd.Series(dict(zip(huh.index[m], a)))

huh.fillna({'A': s})

     A  B  C
0  1.0  2  6
1  3.0  4  0
2  5.0  0  4
3  7.0  1  2
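
The same idea can also be written with .loc and an explicit truncation to the shorter length. This is only a sketch: the helper name `fill_nan_from_array` is hypothetical, and it assumes the frame's index labels are unique.

```python
import numpy as np
import pandas as pd

def fill_nan_from_array(df, col, values):
    """Return a copy of df with the NaNs in `col` filled, in order, from `values`.

    Hypothetical helper. Extra values are ignored; if there are too few
    values, the remaining NaNs are left in place. Assumes a unique index.
    """
    idx = df.index[df[col].isna()]           # labels of the rows to fill
    n = min(len(idx), len(values))           # truncate to the shorter side
    out = df.copy()
    out.loc[idx[:n], col] = np.asarray(values)[:n]
    return out

huh = pd.DataFrame([[np.nan, 2, 6],
                    [3, 4, 0],
                    [np.nan, 0, 4],
                    [np.nan, 1, 2]],
                   columns=list('ABC'))

too_many = fill_nan_from_array(huh, 'A', [1, 5, 7, 9])  # the 9 is dropped
too_few = fill_nan_from_array(huh, 'A', [1, 5])         # the last NaN stays
```

Unlike a bare .loc assignment, this does not raise `ValueError: shape mismatch` when the lengths differ. Whether silently truncating is desirable depends on the use case.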
piRSquared