0
  1. I have two dataframes, main df and index df.

  2. The target I want to do is let the column product of main df can use 'contain' function in index df to filter key word.

  3. In the end, the main df can have new a column keyword to show main_df[keyword]=[C2,VA,E220F,7350M].

main df is

        data    num           product
 0  2019-10-01  39013000    xxxxxC2xxxxxxx
 1  2019-10-01  39013000    xxxxxxVAxxxxxxxxxxxx
 2  2019-10-28  39013000    xxxxxxxxE220Fxxxxxxxxxxxxx
 3  2019-12-31  39013000    xxxxxxxx7350Mxxxxxxxx

index df is

    product
0   VA
1   C2
2   7350M
3   E220F

My code is:

for key in key_word:
    mask = df_import_tmp0["product"].str.contains(key) 
df_import_tmp0['keyword']=key

The output is not what I want:

df_import_tmp0

    data         num            product                 keyword
0   2019-10-01  39013000    xxxxxC2xxxxxxx              E220F
1   2019-10-01  39013000    xxxxxxVAxxxxxxxxxxxx        E220F
2   2019-10-28  39013000    xxxxxxxxE220Fxxxxxxxxxxxxx  E220F
3   2019-12-31  39013000    xxxxxxxx7350Mxxxxxxxx       E220F
Daniel Walker
  • 6,380
  • 5
  • 22
  • 45
Tony Lin
  • 3
  • 3
  • what is `key_word`? what is your desired output? – MoRe Jul 27 '22 at 01:01
  • Does this answer your question? [Python Pandas - Merge based on substring in string](https://stackoverflow.com/questions/48743662/python-pandas-merge-based-on-substring-in-string) – fsimonjetz Jul 27 '22 at 01:15

2 Answers2

1

Here's a way to do what you're asking:

df['keyword'] = df['product'].str.extract('(' + '|'.join(idx['product'].tolist()) + ')')

Input:

df:

         data       num                     product
0  2019-10-01  39013000              xxxxxC2xxxxxxx
1  2019-10-01  39013000        xxxxxxVAxxxxxxxxxxxx
2  2019-10-28  39013000  xxxxxxxxE220Fxxxxxxxxxxxxx
3  2019-12-31  39013000       xxxxxxxx7350Mxxxxxxxx

idx:

  product
0      VA
1      C2
2   7350M
3   E220F

Output:

         data       num                     product keyword
0  2019-10-01  39013000              xxxxxC2xxxxxxx      C2
1  2019-10-01  39013000        xxxxxxVAxxxxxxxxxxxx      VA
2  2019-10-28  39013000  xxxxxxxxE220Fxxxxxxxxxxxxx   E220F
3  2019-12-31  39013000       xxxxxxxx7350Mxxxxxxxx   7350M
constantstranger
  • 9,176
  • 2
  • 5
  • 19
0

A simple way:

import numpy as np    
df = pd.DataFrame({"product": product_list})
    key_words = ['VA','C2','7350M','E220F']
    df["keyword"] = ''
    for kw in key_words:
        df['keyword'][df["product"].replace(np.nan,'').str.contains(kw)] = kw

Gives:

    product                     keyword
0   xxxxxC2xxxxxxx              C2
1   xxxxxxVAxxxxxxxxxxxx        VA
2   xxxxxxxxE220Fxxxxxxxxxxxxx  E220F
3   xxxxxxxx7350Mxxxxxxxx       7350M
Chris Seeling
  • 606
  • 4
  • 11