
I've got a large dataframe where each column has a different start date, so I've used fillna to plug the holes in the dataframe with zeros instead of NaN values. But when I try to normalise the data, both the mean and std dev functions give me NaN. I ran isna().values.any() to see if anything had been missed, but it came back False. I also tried setting ddof=0, as I read in a few places that it might help.

I also tried adding skipna=True to make sure NaN values are skipped if there are any, but once again no luck.

Lastly, I've also checked that all the values are floats, and there are no issues there. Any idea what it could be? I can't figure it out.

data.astype(float)
print(data.isna().values.any())
#df2.stack().dropna().mean()
std = data.stack().std(skipna=True, ddof=0)
print(std)
mean = data.stack().mean(skipna=True)
print(mean)
data = (data-mean)/std
print(data)

In the screenshot below, the first two False values are from the isna() queries. The two NaN values are from mean and std, and the dataframe below them is all NaN because it has been normalised against those values.

ZedIsDead
  • `data.astype(float)` doesn't assign that change to anything. So if they're numeric-like strings you're going to get NaN for all means and devs. You may just need `data = data.astype(float)` – ALollz Mar 29 '20 at 17:40
  • Thanks for the help, obvious now. Running it now I get the error `'NoneType' object has no attribute 'astype'`. The line before it is `data = data.fillna(0, inplace=True)`. – ZedIsDead Mar 30 '20 at 16:10
  • Have a look at https://stackoverflow.com/questions/43893457/understanding-inplace-true. When you do an `inplace` operation the return value is `None`, which you then assign to `data`. You can do one or the other: `data = data.fillna(0)` or `data.fillna(0, inplace=True)`. Personally, I think `inplace` is going to be deprecated at some point in the future, so I'd stick with just assigning back without specifying `inplace=True` – ALollz Mar 30 '20 at 16:17
  • Thanks ALollz. I've done that now and am more or less back to the original problem: astype(float) is working, but I'm still getting NaN for the mean and std dev. I printed the dataframe just before, and there are no NaN values or anything else unusual. What else could cause this? – ZedIsDead Mar 30 '20 at 16:37
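
To make the assign-back advice in the comments concrete, here is a minimal sketch. The frame and its column names are made up for illustration, and it does not explain the remaining NaN results the asker reports; it only shows the conversion, fill and normalisation steps with each result assigned back.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column "b" starts later than "a", and the values
# arrive as numeric-like strings.
data = pd.DataFrame({"a": ["1.0", "2.0", "3.0"], "b": [np.nan, "4.0", "6.0"]})

# astype returns a new frame, so the result has to be assigned back.
data = data.astype(float)

# Assign back OR use inplace=True, never both: the inplace call returns None.
data = data.fillna(0)

# Normalise against the mean and population std dev of all values.
values = data.stack()
mean = values.mean()
std = values.std(ddof=0)
normalised = (data - mean) / std
print(normalised)
```

Checking `data.dtypes` after the conversion is a quick way to confirm that the assignment actually took effect.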

1 Answer


Try using a lambda function that calls np.mean(x) or np.std(x). I ran into this problem with the pandas std method: it defaults to ddof=1 (sample standard deviation), so it returns NaN for any group in a groupby object that contains only a single value, whereas np.std defaults to ddof=0 and returns 0 for such a group. For example:

val = transactions['item_price'].groupby(transactions['item_id']).apply(lambda y: np.std(y))
val_counts = val[val ==0 ].value_counts()
val_counts
0.0    5926
Name: item_price, dtype: int64

The output shows 5926 items with std = 0, which is the correct number. Conversely, using the pandas std() method:

val1 = transactions['item_price'].groupby(transactions['item_id']).std()
val1_counts = val1[val1 == 0].value_counts()
val1_counts
0.0    3555
Name: item_price, dtype: int64

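This gives a count of 3555, which is wrong in my case, presumably because the single-value groups come out as NaN rather than 0 and so are not counted. Hope this helps someone.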
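
The difference between the two counts can be reproduced on a small made-up example. The sketch below uses invented prices and item ids, not the `transactions` data from the answer, and assumes the discrepancy comes from the different `ddof` defaults (1 in pandas, 0 in NumPy).

```python
import numpy as np
import pandas as pd

# Hypothetical data: item 1 has three rows, item 2 has only one.
prices = pd.Series([10.0, 10.0, 12.0, 5.0], name="item_price")
item_id = pd.Series([1, 1, 1, 2], name="item_id")

# pandas default is ddof=1, so a single-row group divides by zero -> NaN.
sample_std = prices.groupby(item_id).std()

# Passing ddof=0 gives the population std dev, which is 0 for one row.
population_std = prices.groupby(item_id).std(ddof=0)

# np.std defaults to ddof=0, so it matches the line above.
numpy_std = prices.groupby(item_id).apply(lambda y: np.std(y.to_numpy()))

print(sample_std)
print(population_std)
print(numpy_std)
```

Passing `ddof=0` to the pandas method avoids the lambda and is usually faster on large groupby objects.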