0

I have two pandas dataframe,

Customer Name Dataframe: Name

  ID       Name
 9967   Tasia Aldrich
 5154   Tinisha Balcom    
 9780   Lonna Braga
 5101   Violette Arneson
 7074   Oralee Schertz
 4324   Tempie Bristol
 7270   Margarita Evens
 2087   Jon Woodham
 1301   Linwood Wollman
 5897   Salvatore Oatis

and so on

Customer product purchase dataframe: Purchase

 ID     Product
1094    Audio Equipment
5337    Cameras and Camera Equipment
3035    Car Electronics and GPS
7865    Computer Accessories
2126    Desktop Computers and Monitors
8823    Laptops and Notebooks
4987    Lighting Equipment
7920    Smartphones and Mobile Devices
5801    Tablets and E-Readers
8242    Televisions and Television Services

and so on

Now both dataframe contains about 12 Million records, and I want to merge them based on their ID. I tried using traditional pandas operation:

mergeDF=name.merge(purchase,'left','ID')

and it took literally 140 minutes to fully merge. Is there any other method with which I can optimize this?

Second Part:

I am also trying to make a column in the newly merged dataframe called "key" which is defined as:

mergedDF['key']=mergedDF.apply(lambda x: x.ID+'&'+x.Product,axis=1)

which again take a ton of time. Any recommendation for this? I am new to python so any suggestions would be helpful.

Kshitij Yadav
  • 1,357
  • 1
  • 15
  • 35

0 Answers0