I have two dataframes that I want to merge using the 'PH' Column in brandmap data and 'product_hierarchy' in the temp data. The 'PH' column have substrings of length (4,7,11,and 15) of the strings in 'product_hierarchy'. How do I merge these two data frames using these columns and a substring match?
Asked
Active
Viewed 983 times
1 Answers
0
A little bit complicated but it will work in combination with str.extract
import pandas as pd
df_ref = pd.DataFrame({"PH":["XXST", "XX7T"], "ValA": [1,2], "ValB": ["foo","bar"]})
df = pd.DataFrame({"product_hierarchy":["XXSTSDASD", "XX7TDSADASDASD", "XXSTHD", "XX7TDFDF"],
"Val":["foo", "bar", "baz", "bar"]})
str_match = "({})".format("|".join(df_ref.PH))
df.merge(df_ref, left_on=df.product_hierarchy.str.extract(str_match)[0], right_on="PH")
Output:
product_hierarchy Val PH ValA ValB
0 XXSTSDASD foo XXST 1 foo
1 XXSTHD baz XXST 1 foo
2 XX7TDSADASDASD bar XX7T 2 bar
3 XX7TDFDF bar XX7T 2 bar

Fourier
- 2,795
- 3
- 25
- 39
-
I have more than 62,000 rows in both dataframes so when I run this similar code I get a memory error. Any idea of how to fix that? – vraka0723 Jan 22 '20 at 19:56
-
@vraka0723 You can do it in chunks,https://stackoverflow.com/questions/25962114/how-to-read-a-6-gb-csv-file-with-pandas – Fourier Jan 23 '20 at 08:39