I have two databases with multiple column dataset-1(df1) has more than a couple of thousand rows and dataset-2(df2) is smaller... 300 rows.
I need to pickup a 'value'
from column 3 in df2 based on matching 'fruit'
in df1 with 'type'
in df2 and 'expiry'
in df1 with 'expiry'
in df2
Furthermore, Instead of storing the 'Value'
directly in a new column in df1, i need to perform a multiplication on the value in each row and the output gets to be stored in a new a column in df1.
So for example if expiry is 2 the value gets multiplied by 2 and if its 3 value gets multiplied by 3.. and so on and so forth!
I was able to solve this by using the code below, but.....:
for i in range(0, len(df1)):
df1_value = df2.loc[(df2['type'] == df1.iloc[i]['fruit']) & (df2['expiry'] == str(df1.iloc[i]['expiry'])].iloc[0]['value']
df1.loc[i, 'df_value'] = df1.iloc[i]['expiry']*df1_value
It creates two issues,
- If the iteration throws up a null value (for example there is no 'value' for banana with expiry of 3 in df2), the process stops and it gives me an error -
IndexError: single positional indexer is out-of-bounds
- Because df1 has a very large number of rows, the individual iterations take a lot of time.
Is there a better way to handle this?
say df1:
fruit expiry category
apple 3 a
apple 3 b
apple 4 c
apple 4 d
orange 2 a
orange 2 b
orange 3 c
orange 3 d
orange 3 e
banana 3 a
banana 3 b
banana 3 c
banana 4 d
pineapple 2 a
pineapple 3 b
pineapple 3 c
pineapple 4 d
pineapple 4 e
df2:
type expiry value
apple 2 100
apple 3 110
apple 4 120
orange 2 200
orange 3 210
orange 4 220
banana 2 310
banana 4 320
pineapple 2 410
pineapple 3 420
pineapple 4 430
output: -revised df1
fruit expiry category df_value
apple 3 a 110*3=330
apple 3 b 110*3=330
apple 4 c 120*4=480
apple 4 d 120*4=480
orange 2 a 200...
orange 2 b 200...
orange 3 c 210...
orange 3 d 210...
orange 3 e 210...
banana 3 a 0
banana 3 b 0
banana 3 c 0
banana 4 d 320*4=1280
pineapple 2 a 410*2=820
pineapple 3 b 420...
pineapple 3 c 420...
pineapple 4 d 430....
pineapple 4 e 430....