
My main issue is a memory error each time I try to merge two of my data frames like this:

result = df1.merge(df2[['col1','col2','col3']], on=['col1','col2'], how='left')

So I need another way to add col3 to df1 (without getting a memory error).

I found solutions that use map(), but the examples always use a single column as the mapping key:

result['col3'] = df1['col1'].map(df2.set_index('col1')['col3'])

In my case, however, it is the combination of two columns (col1 and col2) that identifies a row in my data frame.

My questions:

  1. Could map() be a solution to my problem?
  2. How can I use map() with both col1 and col2 as the key?
MaMo
  • There are some methods here: https://stackoverflow.com/a/53215754/3279716 – Alex Nov 08 '18 at 21:55
  • @Alex - not really. My point is I have two columns per data frame as key (col1 and col2), not one. The post you suggested has one column as key per dataframe with different name. – MaMo Nov 08 '18 at 22:02
  • To map, you'll need to turn `col1` and `col2` into a tuple, and use those same tuples as keys for your dictionary. – ALollz Nov 08 '18 at 22:32
  • Something like: `df1['tup'] = [tuple(x) for x in df1[['col1', 'col2']].values]`, and your dictionary as `dict((tuple(x[0:2]), x[2]) for x in df2[['col1', 'col2', 'col3']].values)` – ALollz Nov 08 '18 at 22:37
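
The tuple-key idea from the comments above can be sketched as follows. This is a minimal sketch, not a tested fix for the memory error: the two small frames and their values are made up for illustration, and it assumes each (col1, col2) pair appears at most once in df2. Whether it uses less memory than merge on the real data would have to be measured.

```python
import pandas as pd

# Toy stand-ins for the frames in the question (the values are made up).
df1 = pd.DataFrame({"col1": [1, 1, 2], "col2": ["a", "b", "a"]})
df2 = pd.DataFrame({"col1": [1, 2], "col2": ["a", "a"], "col3": [10, 30]})

# Dictionary keyed by (col1, col2) tuples.
# If a pair repeats in df2, the last row silently wins.
lookup = dict(zip(zip(df2["col1"], df2["col2"]), df2["col3"]))

# One tuple key per row of df1, aligned on df1's index.
keys = pd.Series(list(zip(df1["col1"], df1["col2"])), index=df1.index)

# Look each key up; pairs that are absent from df2 come back as missing values.
df1["col3"] = keys.map(lookup.get)
```

Unlike a left merge, this never adds rows to df1, so duplicated key pairs in df2 would be collapsed rather than expanded. If that matters, check df2 for duplicates on col1 and col2 first.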
