I have a dataframe that contains phrases, and I want to extract only the compound words joined by a single hyphen and place their two parts in another dataframe.
df=pd.DataFrame({'Phrases': ['Trail 1 Yellow-Green','Kim Jong-il was here', 'President Barack Obama', 'methyl-butane', 'Derp da-derp derp', 'Pok-e-mon'],})
Here is what I have so far:
import pandas as pd
df=pd.DataFrame({'Phrases': ['Trail 1 Yellow-Green','Kim Jong-il was here', 'President Barack Obama', 'methyl-butane', 'Derp da-derp derp', 'Pok-e-mon'],})
new = df['Phrases'].str.extract("(?P<part1>.*?)-(?P<part2>.*)")
Results:
>>> new
            part1        part2
0  Trail 1 Yellow        Green
1        Kim Jong  il was here
2             NaN          NaN
3          methyl       butane
4         Derp da    derp derp
5             Pok        e-mon
What I want is just the hyphenated word from each phrase, split into its two parts (note that Pok-e-mon should come out as NaN because it contains two hyphens):
>>> new
    part1   part2
0  Yellow   Green
1    Jong      il
2     NaN     NaN
3  methyl  butane
4      da    derp
5     NaN     NaN
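
One direction that might produce the desired output, as a minimal sketch rather than a definitive solution: restrict each part to letters and use lookarounds so the match cannot begin or end next to another word character or hyphen. The pattern below is my own assumption, not something from the pandas docs. It assumes the parts consist only of ASCII letters, and `str.extract` keeps only the first match in each row.

```python
import pandas as pd

df = pd.DataFrame({'Phrases': ['Trail 1 Yellow-Green', 'Kim Jong-il was here',
                               'President Barack Obama', 'methyl-butane',
                               'Derp da-derp derp', 'Pok-e-mon']})

# Assumed pattern: letters, one hyphen, letters.
# (?<![\w-]) rejects a match that starts right after a word character or hyphen.
# (?![\w-])  rejects a match that is followed by a word character or hyphen.
# Together they rule out words with more than one hyphen, such as Pok-e-mon.
pattern = r"(?<![\w-])(?P<part1>[A-Za-z]+)-(?P<part2>[A-Za-z]+)(?![\w-])"

# Named groups become the column names; rows without a match are filled with NaN.
new = df['Phrases'].str.extract(pattern)
print(new)
```

If the parts can contain digits or non-ASCII letters, the `[A-Za-z]` character classes would need to be widened accordingly.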