python: merge two database based on matching multiple columns in both the datasets and apply a script on the result

Question

I have two databases with multiple column dataset-1(df1) has more than a couple of thousand rows and dataset-2(df2) is smaller... 300 rows.

I need to pickup a 'value' from column 3 in df2 based on matching 'fruit' in df1 with 'type' in df2 and 'expiry' in df1 with 'expiry' in df2

Furthermore, Instead of storing the 'Value' directly in a new column in df1, i need to perform a multiplication on the value in each row and the output gets to be stored in a new a column in df1.

So for example if expiry is 2 the value gets multiplied by 2 and if its 3 value gets multiplied by 3.. and so on and so forth!

I was able to solve this by using the code below, but.....:

for i in range(0, len(df1)):
df1_value = df2.loc[(df2['type'] == df1.iloc[i]['fruit']) & (df2['expiry'] == str(df1.iloc[i]['expiry'])].iloc[0]['value']
df1.loc[i, 'df_value'] = df1.iloc[i]['expiry']*df1_value

It creates two issues,

If the iteration throws up a null value (for example there is no 'value' for banana with expiry of 3 in df2), the process stops and it gives me an error -IndexError: single positional indexer is out-of-bounds
Because df1 has a very large number of rows, the individual iterations take a lot of time.

Is there a better way to handle this?

say df1:

fruit     expiry category
apple      3         a
apple      3         b
apple      4         c
apple      4         d
orange     2         a
orange     2         b
orange     3         c
orange     3         d
orange     3         e
banana     3         a
banana     3         b
banana     3         c
banana     4         d
pineapple  2         a
pineapple  3         b
pineapple  3         c
pineapple  4         d
pineapple  4         e

df2:

type        expiry  value
apple        2       100
apple        3       110
apple        4       120
orange       2       200
orange       3       210
orange       4       220
banana       2       310
banana       4       320
pineapple    2       410
pineapple    3       420
pineapple    4       430

output: -revised df1

fruit       expiry  category    df_value
apple       3        a           110*3=330
apple       3        b           110*3=330
apple       4        c           120*4=480
apple       4        d           120*4=480
orange      2        a           200...
orange      2        b           200...
orange      3        c           210...
orange      3        d           210...
orange      3        e           210...
banana      3        a           0  
banana      3        b           0
banana      3        c           0
banana      4        d           320*4=1280
pineapple   2        a           410*2=820
pineapple   3        b           420...
pineapple   3        c           420...
pineapple   4        d           430....
pineapple   4        e           430....

have you seen the pandas merge function? https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html — Vishesh Mangla, Aug 17 '20 at 16:35
also this Q&A, [Pandas Merging 101](https://stackoverflow.com/q/53645882/6692898) — RichieV, Aug 17 '20 at 16:46
Merging wont work based on checking info across two columns EACH on both datasets! — Akshay, Aug 18 '20 at 04:44

score 0 · Answer 1 · answered Sep 15 '22 at 09:11

As far as I know you can only do this by using SQL within python. SQL is used for relating different databases that have at least one column that is relatable (if you've used Power BI or Tableau you know what I mean) and querying multiple dataframes through their mutual relationships. I do not know this language so I cannot help you further than this.

python: merge two database based on matching multiple columns in both the datasets and apply a script on the result

1 Answers1