I have two dataframes, and I would Like to join df1 to df2 where df1 contains a url and df2 contains a list of urls.
The shape of df1 and df2 are different
Example:
df1 = pd.DataFrame({'url': ['http://www.example.jp/pro/sanada16']})
df2 = pd.DataFrame({'urls': ['[https://www.example.jp/pro/minoya, http://www.example.jp/pro/tokyo_kankan, http://www.example.jp/pro/briansawazakiphotography, http://www.example.jp/pro/r_masuda, http://www.example.jp/pro/sanada16, ......]']})
I would like the datafrmes to join on the condition that http://www.example.jp/pro/sanada16 in df1.url exists in df2.urls.
I thought about making the list in columns to columns, but the number of URLs is not unique in df2.urls.
I tried to add the df1.url substring that matched the df2.urls to a new column, so that I could join on the new column, but I couldn't get it to work:
df2['match'] = df2['urls'].apply(lambda x: x if x in df1['url'])
expected output:
new_df = pd.DataFrame({'url': ['http://www.example.jp/pro/sanada16'], 'urls': ['[https://www.example.jp/pro/minoya, http://www.example.jp/pro/tokyo_kankan, http://www.example.jp/pro/briansawazakiphotography, http://www.example.jp/pro/r_masuda, http://www.example.jp/pro/sanada16, ......]']})
With postgresql I could do:
SELECT
b.url
,a.urls
FROM df2 a
join df1 b
on position(b.url in a.urls)>0