Panda Dataframe - Add values to new column based on criteria of other columns

Question

I am trying to add values to a new dataframe (df2) column (Discount%), the values in this column must be based on "Grid" and Entity from df1), My structure is as following

so if for the same entity, the column in DF1 is 91-120 then it should add 20 to DF2 under Discount%, if the column in DF1 is 61-90 then it must add 5 to DF2 and so one.

The data is imported from a large csv file, so far I have tried below but if fill only with 0

for j in range(0,len(df1)):
for i in range(0,len(df2)):
if grid['91-120'][j] in df2['Grid'][i]:
#df['Grid%'][i] = grid['91-120'][j]
df2.loc[i, 'Grid%'] = df1['91-120'][j]

thank you

Can you please provide the code which can be used to reproduce the table and any of your tried approach? — EMT, Jun 15 '23 at 08:30

score 0 · Accepted Answer · answered Jun 15 '23 at 15:46

I'm currently dealing with a similar problem related to iterating over a dataframe. You really don't want to do that if it could be avoided, especially if the dataframe contains duplicate values like your DF1. I would recommend converting the reference dataframe, DF1, to a dictionary with the index orientation and then assigning the value to DF2 from that dictionary as shown below.

DF1 = pd.DataFrame({'Entity': ['F1', 'F2', 'F3', 'F4'], '0-60': [0, 0, 0, 0], '61-90': [0, 5, 10, 5], '91-120':[20, 5, 20, 20], '121-180':[10, 5, 12, 15], '181-240':[20, 5, 22, 25]})
DF2 = pd.DataFrame({'Entity': ['F1', 'F2', 'F3', 'F4'], 'Grid': ['360+', '61-90', '0-60', '91-120']})

print('DF2 before:')
print(DF2)

DF1.drop_duplicates(inplace=True)
DF1.set_index('Entity', inplace=True)
d = DF1.to_dict('index')

def get_discount(entity, grid):
    if entity in d and grid in d[entity]:
        return d[entity][grid]
    else:
        return None

DF2['Discount %'] = DF2.apply(lambda x: get_discount(x['Entity'], x['Grid']), axis=1)

print('DF2 after:')
print(DF2)

I found this solution because, as I mentioned before, I'm currently working on a similar problem.
Knowing how detrimental iteration over a dataframe can be to the performance of a function, I realized it would be faster to assign a value from a dictionary. I looked up how to convert a dataframe to a dictionary on Stack Overflow and in the pandas documentation. Next, I looked up how to assign a value to a dataframe from a dictionary on Stack Overflow. I was trying the "dict" orientation at first. I could get it to assign all discount values for each "Entity" based on "Grid", but I couldn't select the one right discount value.
I couldn't find any other solutions online for assigning a value to a dataframe from a 2D dictionary, so I turned to ChatGPT. After ChatGPT did its thing, I was getting "None" in every field. Eventually, I got it to recommend changing the orientation of the dictionary from "dict" to "series". That also didn't work, but I figured I would try all the other orientations. Index worked.
The downside is that Discount % values are floats. The upside is it can handle cases where you don't have a value for Grid in the dictionary (e.g. where Grid is "360+").

Thank you really much Leon, that worked and i can adjust the fields accordingly. Frankly, i also tried all the possible solutions, with series, array, tulples and even dictionary but nothing worked for this 2d relation never thought about ChatGpt though,. I am not really worried about NaN values, those i can deal with... Appreciate and thank you for your efforts — saq, Jun 15 '23 at 16:15

Panda Dataframe - Add values to new column based on criteria of other columns

1 Answers1