0

temp data

brandmap data

I have two dataframes that I want to merge using the 'PH' Column in brandmap data and 'product_hierarchy' in the temp data. The 'PH' column have substrings of length (4,7,11,and 15) of the strings in 'product_hierarchy'. How do I merge these two data frames using these columns and a substring match?

vraka0723
  • 5
  • 2
  • 3
    Welcome to SO! Please take a moment to read about how to post pandas questions: http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – YOLO Jan 22 '20 at 16:06

1 Answers1

0

A little bit complicated but it will work in combination with str.extract

import pandas as pd

df_ref = pd.DataFrame({"PH":["XXST", "XX7T"], "ValA": [1,2], "ValB": ["foo","bar"]})
df = pd.DataFrame({"product_hierarchy":["XXSTSDASD", "XX7TDSADASDASD", "XXSTHD", "XX7TDFDF"], 
                   "Val":["foo", "bar", "baz", "bar"]})

str_match = "({})".format("|".join(df_ref.PH))

df.merge(df_ref, left_on=df.product_hierarchy.str.extract(str_match)[0], right_on="PH")

Output:

    product_hierarchy   Val     PH   ValA   ValB
0   XXSTSDASD           foo     XXST    1   foo
1   XXSTHD              baz     XXST    1   foo
2   XX7TDSADASDASD      bar     XX7T    2   bar
3   XX7TDFDF            bar     XX7T    2   bar
Fourier
  • 2,795
  • 3
  • 25
  • 39
  • I have more than 62,000 rows in both dataframes so when I run this similar code I get a memory error. Any idea of how to fix that? – vraka0723 Jan 22 '20 at 19:56
  • @vraka0723 You can do it in chunks,https://stackoverflow.com/questions/25962114/how-to-read-a-6-gb-csv-file-with-pandas – Fourier Jan 23 '20 at 08:39