
I have a dataframe with 2 columns, type and value, where each type usually has the same value in every row. In some rows, however, the value is missing and we just have NaN. I want to fill in the appropriate value for each row based on the row's type. I've made a sample dataframe and written code that does this and actually works. That being said, I'm new to pandas, and Python in general, so I'm pretty sure it sucks. Is there a more elegant way to do this, using fillna or similar functions? Here's the code that I have (A corresponds to D, B to E, C to F, and NaN to N):

import pandas as pd
import numpy as np

df = pd.DataFrame({"type": ["A", "B", "C", "A", "B", "C", "A", "B", "C", np.nan, np.nan, np.nan],
                   "value": ["D", "E", "F", "D", "E", "F", np.nan, np.nan, np.nan, np.nan, "N", "N"]
                   })
print(df)


def valuemode(stype):
    # Return the most common value among rows that share this type
    if isinstance(stype, str):  # excluding NaN type
        y = df.loc[df['type'] == stype]
    else:
        y = df.loc[df['type'].isnull()]
    return y['value'].mode().iloc[0]


for index, row in df.iterrows():
    rowtype = row['type']
    if pd.isnull(row['value']):
        x = valuemode(rowtype)
        print("Type " + str(rowtype) + " will now have value " + str(x))
        # Write back through df: assigning to `row` is not guaranteed to change df
        df.at[index, 'value'] = x

for index, row in df.iterrows():
    print(row['type'], row['value'])
h1smajesty
It's hard to argue with code that works. Unless you're going to be processing millions of rows, there's not much point in worrying about it. `fillna` works if you have a fixed replacement, or if you have a lookup table that says what values to use. – Tim Roberts Mar 08 '21 at 01:51
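Following the comment's suggestion, here is a minimal sketch of `fillna` driven by a lookup table. It assumes the mapping is fixed and known in advance (A -> D, B -> E, C -> F, and N for rows whose type is NaN, as stated in the question); `lookup` and `replacement` are illustrative names, not anything from the original post.

```python
import numpy as np
import pandas as pd

# Sample data from the question (np.nan instead of the np.NaN alias removed in NumPy 2.0)
df = pd.DataFrame({
    "type": ["A", "B", "C", "A", "B", "C", "A", "B", "C", np.nan, np.nan, np.nan],
    "value": ["D", "E", "F", "D", "E", "F", np.nan, np.nan, np.nan, np.nan, "N", "N"],
})

# Assumption: the mapping is fixed and known in advance (taken from the question)
lookup = {"A": "D", "B": "E", "C": "F"}

# Replacement value for every row: map the type through the lookup table,
# and use "N" for rows whose type itself is NaN
replacement = df["type"].map(lookup).where(df["type"].notna(), "N")

# fillna with a Series fills each missing value from the row with the same index label
df["value"] = df["value"].fillna(replacement)
print(df)
```

Only rows where `value` is NaN are touched; existing values are left as they are, and no Python-level loop is needed.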



Based on this answer, for better performance you can use a list comprehension instead of iterrows. A solution based on a list comprehension:

to_change = {'A': 'D', 'B': 'E', 'C': 'F'}

def valuemode(x, y):
    # x is the row's type, y is its value
    if pd.isna(y):
        if pd.isna(x):
            return x, 'N'  # rows with no type get 'N'; keep the type as NaN
        return x, to_change[x]
    return x, y

result = pd.DataFrame([valuemode(x, y) for x, y in zip(df['type'], df['value'])],
                      columns=['type', 'value'])
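The list comprehension above still hard-codes the type-to-value mapping. If the mapping isn't known in advance, a sketch that mirrors the question's `valuemode` idea (use the most common value for each type) can combine `groupby(...).transform` with `fillna`. This is an alternative approach rather than the answer's method; `group_mode` and the `"<missing>"` sentinel label are hypothetical names, and the sentinel assumes no real type uses that string.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "type": ["A", "B", "C", "A", "B", "C", "A", "B", "C", np.nan, np.nan, np.nan],
    "value": ["D", "E", "F", "D", "E", "F", np.nan, np.nan, np.nan, np.nan, "N", "N"],
})

def group_mode(values):
    """Most frequent non-null value in a group, or NaN if the group has none."""
    modes = values.mode()  # Series.mode() skips NaN by default
    return modes.iloc[0] if len(modes) else np.nan

# Hypothetical sentinel label so that rows with a missing type form their own group
key = df["type"].fillna("<missing>")

# transform broadcasts each group's mode back onto every row of that group
per_row_mode = df.groupby(key)["value"].transform(group_mode)

# Fill only the missing values; the type column is left unchanged
df["value"] = df["value"].fillna(per_row_mode)
print(df)
```

Each group's mode is computed once, whereas the question's loop recomputes it for every row. `groupby` also accepts `dropna=False` to keep NaN keys as a group; the sentinel here simply avoids depending on how a given pandas version handles NaN keys in `transform`.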
meTchaikovsky