
I'm dealing with incomplete data and would like to assign a score to each row.

For example:

Bluetooth and WLAN are non-integer (text) values, but I would like to assign a value of 1 if data is available and 0 if there is no data (NaN).

Samsung's score would be 1 + 1 + 4 = 6.
Nokia's score would be 0 + 0 + 5 = 5.

         Bluetooth  WLAN   Rating  Score
Apple    Class-A    USB-A  NaN     2
Samsung  Class-B    USB-B  4       6
Nokia    NaN        NaN    5       5


I'm using pandas at the moment, but I'm not sure whether pandas alone can do this without NumPy.
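pandas on its own should be enough for this. A minimal sketch, assuming the device names are the row index and missing entries are written as `None` (pandas treats them as NaN, so no NumPy import is needed): `notna()` gives True/False per cell, and summing booleans across a row counts them as 1/0.

```python
import pandas as pd

# Example data from the table above; None marks a missing value
df = pd.DataFrame(
    {
        "Bluetooth": ["Class-A", "Class-B", None],
        "WLAN": ["USB-A", "USB-B", None],
        "Rating": [None, 4, 5],
    },
    index=["Apple", "Samsung", "Nokia"],
)

# 1 for each text column that has data, 0 where it is missing
availability = df[["Bluetooth", "WLAN"]].notna().sum(axis=1)

# A missing Rating contributes 0 to the score
df["Score"] = availability + df["Rating"].fillna(0)

print(df)
```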

Thanks a lot!

gyinshen
I was playing with this issue on the weekend; this might help: https://stackoverflow.com/questions/37543647/how-to-replace-all-non-nan-entries-of-a-dataframe-with-1-and-all-nan-with-0 – JonTout May 09 '22 at 11:11
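The technique from the linked question (non-NaN becomes 1, NaN becomes 0) corresponds to `notna().astype(int)`. A sketch, assuming the same example data as the question: applied to the whole frame it would also turn a Rating of 4 into 1, so it only fits the Bluetooth and WLAN columns.

```python
import pandas as pd

# Same example data; device names as the index, None for missing values
df = pd.DataFrame(
    {
        "Bluetooth": ["Class-A", "Class-B", None],
        "WLAN": ["USB-A", "USB-B", None],
        "Rating": [None, 4, 5],
    },
    index=["Apple", "Samsung", "Nokia"],
)

# Linked technique on the whole frame: any present value -> 1, NaN -> 0
whole = df.notna().astype(int)
# Samsung's Rating of 4 has become 1, so its row total is 3 instead of 6
print(whole.sum(axis=1))

# Restricting it to the text columns leaves the Rating values untouched
flags = df[["Bluetooth", "WLAN"]].notna().astype(int)
print(flags)
```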

2 Answers


Try this:

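import pandas as pd
import numpy as np
df = pd.DataFrame({'Bluetooth': ['Class-A', 'Class-B', np.nan], 'WLAN': ['USB-A', 'USB-B', np.nan], 'Rating': [np.nan, 4, 5]}, index=['Apple', 'Samsung', 'Nokia'])
# Count NaNs only in the two text columns (a missing Rating is handled by fillna below)
df['Nan_count'] = df[['Bluetooth', 'WLAN']].isnull().sum(axis=1)
# 2 points minus one per missing text column, plus the rating (NaN counts as 0)
df['score'] = 2 - df['Nan_count'] + df['Rating'].fillna(0)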

With this solution we don't need to replace the NaNs in the DataFrame itself, and the computation is cheap.

DataSciRookie
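import pandas as pd
import numpy as np

data = {'Bluetooth': ['class-A', 'class-B', np.nan], 'WLAN': ['usb-A', 'usb-B', np.nan], 'Rating': [np.nan, 4, 5]}
df = pd.DataFrame(data)

# Missing values (NaN) become 0
df = df.replace(np.nan, 0)

# Text such as 'class-A' cannot be parsed as a number, so it is coerced to NaN and then set to 1;
# numeric values (the 0s and the ratings) are kept as they are
df = df.apply(lambda x: pd.to_numeric(x, errors='coerce')).fillna(1)

# The score is the row-wise sum
df['score'] = df.sum(axis=1)

print(df.head())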

Output:

   Bluetooth  WLAN  Rating  score
0        1.0   1.0     0.0    2.0
1        1.0   1.0     4.0    6.0
2        0.0   0.0     5.0    5.0
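One caveat with the `to_numeric(..., errors='coerce')` trick: a text value that happens to look like a number (say `'5'`) would be added as 5 rather than counted as 1. A sketch of an alternative that picks the treatment by column dtype instead, assuming the text columns are non-numeric dtype and the rating columns are numeric (a text column that is entirely NaN would be read as float and misclassified):

```python
import numpy as np
import pandas as pd

data = {'Bluetooth': ['class-A', 'class-B', np.nan], 'WLAN': ['usb-A', 'usb-B', np.nan], 'Rating': [np.nan, 4, 5]}
df = pd.DataFrame(data)

# Non-numeric (text) columns: 1 if a value is present, 0 if NaN
text_cols = df.select_dtypes(exclude="number")
# Numeric columns: keep the value, treat NaN as 0
num_cols = df.select_dtypes(include="number")

df["score"] = text_cols.notna().sum(axis=1) + num_cols.fillna(0).sum(axis=1)
print(df)
```

This also scales to more text or rating columns without listing them by name.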
Ali