
I have two dataframes. One has a column in which some values are product IDs and the rest are other information that I have to keep. The other has a product_id column plus additional information on those products.

Now I'd like to merge the two dataframes on the product ID, and where a row has no product ID I'd like to fill the added columns with NaN. So I basically want to enrich one dataframe with data from the other. My product IDs are strings, and I can't convert them to ints because the rest of the values in the column need to stay strings.

I have tried several things. First, I wrote a function that checks whether the value is a string of digits and, if so, gets the information from the other dataframe. Below are the code and a rough sketch of what the data looks like.

def get_additional_info(case_table, product_info):
    for page in case_table['page_name']:
        if re.match(r'\d{6,}', page):
            return product_info[product_info['Key']==page]

page_name  timestamp some_column some_other_column
202020340    
200304020
text 
202503050
3045060
text2 


key         info_on_product 
202020340   
200304020
202503050
3045060

However, it only returned an empty dataframe. When I tested the lookup with specific product IDs (here called page_name) I did get results; it just didn't seem to work inside the function.
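
For reference, the enrichment described above can be sketched as a left merge. This is only a sketch under assumptions: the sample values in info_on_product are made up, and the key column is assumed to be stored as integers, which would be one possible reason an equality check against string page names matches nothing. The real cause may be different.

```python
import pandas as pd

# Hypothetical sample data modelled on the rough sketch in the question.
case_table = pd.DataFrame({
    "page_name": ["202020340", "200304020", "text",
                  "202503050", "3045060", "text2"],
})
product_info = pd.DataFrame({
    "key": [202020340, 200304020, 202503050, 3045060],  # assumed to be ints
    "info_on_product": ["a", "b", "c", "d"],             # made-up values
})

# Cast the key to str so both join columns hold the same type.
product_info["key"] = product_info["key"].astype(str)

# A left merge keeps every row of case_table; rows whose page_name is
# not a known product ID get NaN in the columns taken from product_info.
enriched = case_table.merge(
    product_info, how="left", left_on="page_name", right_on="key"
).drop(columns="key")

print(enriched)
```

With a left merge there is no need to test each value with a regular expression first, because non-matching rows are simply left without product information.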

I have also tried a similar method using apply, but it didn't work because I couldn't figure out how to pass it two arguments. I also tried an approach using pandasql, which didn't seem to work either.
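
On the apply attempt: Series.apply accepts extra positional arguments through its args parameter, so a second argument can be passed that way. The sketch below is illustrative only; lookup_info is a hypothetical helper, the data is made up, and both columns are assumed to already hold strings.

```python
import pandas as pd

def lookup_info(page, product_info):
    """Hypothetical helper: return the info for one page name, or None."""
    match = product_info.loc[product_info["key"] == page, "info_on_product"]
    return match.iloc[0] if not match.empty else None

# Made-up sample data; both columns are assumed to be strings.
case_table = pd.DataFrame({"page_name": ["202020340", "text", "3045060"]})
product_info = pd.DataFrame({
    "key": ["202020340", "3045060"],
    "info_on_product": ["a", "d"],
})

# args supplies the second argument of lookup_info for every element.
case_table["info_on_product"] = case_table["page_name"].apply(
    lookup_info, args=(product_info,)
)

print(case_table)
```

This does one lookup per row, so on large tables it will probably be slower than a single merge.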

Noelle
  • Your question would benefit from an [example of your data](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – Wouter Jan 27 '21 at 16:28
