I have two pandas dataframe,
Customer Name Dataframe: Name
ID Name
9967 Tasia Aldrich
5154 Tinisha Balcom
9780 Lonna Braga
5101 Violette Arneson
7074 Oralee Schertz
4324 Tempie Bristol
7270 Margarita Evens
2087 Jon Woodham
1301 Linwood Wollman
5897 Salvatore Oatis
and so on
Customer product purchase dataframe: Purchase
ID Product
1094 Audio Equipment
5337 Cameras and Camera Equipment
3035 Car Electronics and GPS
7865 Computer Accessories
2126 Desktop Computers and Monitors
8823 Laptops and Notebooks
4987 Lighting Equipment
7920 Smartphones and Mobile Devices
5801 Tablets and E-Readers
8242 Televisions and Television Services
and so on
Now both dataframe contains about 12 Million records, and I want to merge them based on their ID. I tried using traditional pandas operation:
mergeDF=name.merge(purchase,'left','ID')
and it took literally 140 minutes to fully merge. Is there any other method with which I can optimize this?
Second Part:
I am also trying to make a column in the newly merged dataframe called "key" which is defined as:
mergedDF['key']=mergedDF.apply(lambda x: x.ID+'&'+x.Product,axis=1)
which again take a ton of time. Any recommendation for this? I am new to python so any suggestions would be helpful.