0

Basically I have two Pandas dataframes A and B as follows. What would be the easiest / computationally fastest way to combine A and B to get C? I'm currently using a for loop to iterate bag_name in dfB row-by-row but this could be slow for large dataframes.

I suspect there must be some built-in methods in Pandas to combine these dataframes - I apologise if this had been asked somewhere else - I don't know what keyword to search for.

DataFrame A (dfA)
| bag_name | ID   | price |
|----------|------|-------|
| a        | asdf | 1     |
| b        | qwer | 2     |
| c        | zxcv | 3     |


DataFrame B (dfB)
| bag_name | item_name | weight |
|----------|-----------|--------|
| a        | t         | 2.3    |
| b        | y         | 2.4    |
| b        | u         | 2.5    |
| c        | i         | 2.6    |
| c        | o         | 2.7    |
| c        | p         | 2.7    |

DataFrame C (dfC)
| bag_name | ID   | Price | item_name | weight |
|----------|------|-------|-----------|--------|
| a        | asdf | 1     | t         | 2.3    |
| b        | qwer | 2     | y         | 2.4    |
| b        | qwer | 2     | u         | 2.5    |
| c        | zxcv | 3     | i         | 2.6    |
| c        | zxcv | 3     | o         | 2.7    |
| c        | zxcv | 3     | p         | 2.7    |
matohak
  • 535
  • 4
  • 19
  • Does this answer your question? [Flat file databases](https://stackoverflow.com/questions/85/flat-file-databases) – Umar.H Dec 04 '19 at 15:48

2 Answers2

4

You should use merge:

dfC = dfA.merge(dfB, on='bag_name')

Another alternative would be join

dfC = dfA.join(dfB.set_index('bag_name'), on='bag_name')
Mr_and_Mrs_D
  • 32,208
  • 39
  • 178
  • 361
2

What you are looking for is the left join. Here is the example of it:

df1 = pd.DataFrame([["a", "asdf", 1], ["b", "qwer", 2], ["c", "zxcv", 3]], columns=["bag_name", "ID", "price"])
df2 = pd.DataFrame([["a", "t", 2.3], ["b", "y", 2.4], ["b", "u", 2.5], ["c", "i", 2.6], ["c", "i", 2.7], ["c", "o", 2.7]], columns=["bag_name", "item_name", "weight"])

df3 = pd.merge(df1, df2, how="left", on="bag_name")

It gives you the following results:

  bag_name    ID  price item_name  weight
0        a  asdf      1         t     2.3
1        b  qwer      2         y     2.4
2        b  qwer      2         u     2.5
3        c  zxcv      3         i     2.6
4        c  zxcv      3         i     2.7
5        c  zxcv      3         o     2.7