
I have two dataframes. The first has a column of strings to search in; the second has a column of substrings to look for, plus the category columns that I should add to the first dataframe when a string contains that substring.

df1:

id    url  
111   vk.com/audio
222   twitter.com/chats

df2:

url   Maincategory   Subcategory
vk.com   Social Network    entertainment
twitter.com   Social Network   entertainment

If the url columns matched exactly, I would use

df1['Main Category'] = df1.url.map(df2.set_index('url')['Maincategory'])

But that doesn't work for substring matching, so instead I use

mapping = dict(df2.set_index('url')['Maincategory'])

def map_to_substring(x):
    # return the category of the first key that occurs in x
    for key, value in mapping.items():
        if key in x:
            return value
    return None

df1['Maincategory'] = df1['url'].apply(map_to_substring)

But when the dataframes are large, this takes too much time. How can I improve this approach to make it faster?

Petr Petrov
  • If you're matching with the domain name, it could be worthwhile to add a column to your dataframe using `urlparse`. You could do exact matching on the `netloc`. Of course this won't work for arbitrary substrings, but it might work in your case. Reference: https://docs.python.org/2/library/urlparse.html – Mikk Jan 19 '17 at 13:30
  • @Mikk not always domain – Petr Petrov Jan 19 '17 at 13:43
  • *Note*: There is a solution [described by @unutbu](https://stackoverflow.com/a/48600345/9209546) which is more efficient than using `pd.Series.str.contains`. If performance is an issue, then this may be worth investigating. – jpp May 06 '18 at 22:18

1 Answer


It is not entirely clear what you are asking, but you should use the pandas `str.contains` method: http://pandas.pydata.org/pandas-docs/stable/text.html
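As a minimal sketch (using hypothetical sample data mirroring the question's `df1`/`df2`), the per-row Python loop can be replaced by a loop over the few keys, matching each key against the whole column at once with the vectorized `str.contains`:

```python
import pandas as pd

df1 = pd.DataFrame({'id': [111, 222],
                    'url': ['vk.com/audio', 'twitter.com/chats']})
df2 = pd.DataFrame({'url': ['vk.com', 'twitter.com'],
                    'Maincategory': ['Social Network', 'Social Network']})

# loop over the (few) keys in df2, but test each key against the
# whole url column in one vectorized pass
for key, cat in zip(df2['url'], df2['Maincategory']):
    mask = df1['url'].str.contains(key, regex=False)
    df1.loc[mask, 'Maincategory'] = cat

print(df1)
```

This is fast when `df2` is small relative to `df1`, because the Python-level loop runs once per key rather than once per row.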

As a general rule, you can loop over the keys from the second dataframe and search for each one in the first with a vectorized operation; I don't think there is a faster solution than this.
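Alternatively, the whole loop can be collapsed into a single pass: build one regex alternation from the keys and use the vectorized `str.extract` to pull out whichever key matched, then map it to its category. A sketch, again assuming the question's column names:

```python
import re
import pandas as pd

df1 = pd.DataFrame({'id': [111, 222],
                    'url': ['vk.com/audio', 'twitter.com/chats']})
df2 = pd.DataFrame({'url': ['vk.com', 'twitter.com'],
                    'Maincategory': ['Social Network', 'Social Network']})

# one capture group containing all keys, escaped so '.' is literal
pattern = '(' + '|'.join(re.escape(k) for k in df2['url']) + ')'

# extract the first key that occurs in each url, then map it
matched = df1['url'].str.extract(pattern, expand=False)
df1['Maincategory'] = matched.map(df2.set_index('url')['Maincategory'])
print(df1)
```

Rows whose url contains none of the keys get `NaN`, which can then be filled as needed.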

ℕʘʘḆḽḘ