
I need to extract the following words from a DataFrame:

car+ferrari

The dataset is

                   Owner        Sold
type
car+ferrari         J.G         £500000
car+ferrari         R.R.T.      £276,550 
car+ferrari        
motobike+ducati
motobike+ducati
...

I need to create a list of the individual words in `type`, split on the `+`. In this case I need only car and ferrari.

The list should be

my_list=['car','ferrari']

with no duplicates. What I want to do is select the type car+ferrari, extract all of its words and add them to a list as shown above. I have many car+ferrari rows, but each term should appear in the list only once.

Any help will be appreciated

EDIT: the type column is the index.
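A minimal reproducible sketch of the task as described, assuming the blank cells in the sample are missing values (`None`) and using a hypothetical helper `words_for_type` (not part of pandas). It keeps the rows whose index equals the requested type, splits each label on `+`, and removes duplicates while preserving order:

```python
import pandas as pd

# Sample frame rebuilt from the question; the blank cells are assumed to be None.
df = pd.DataFrame(
    {
        "Owner": ["J.G", "R.R.T.", None, None, None],
        "Sold": ["£500000", "£276,550", None, None, None],
    },
    index=pd.Index(
        ["car+ferrari", "car+ferrari", "car+ferrari", "motobike+ducati", "motobike+ducati"],
        name="type",
    ),
)


def words_for_type(frame, type_label):
    """Hypothetical helper: unique '+'-separated words of one index label, in first-seen order."""
    # Keep only the index labels equal to the requested type.
    labels = frame.index[frame.index == type_label]
    words = []
    for label in labels:
        words.extend(label.split("+"))
    # dict.fromkeys drops duplicates while keeping insertion order.
    return list(dict.fromkeys(words))


my_list = words_for_type(df, "car+ferrari")
print(my_list)  # ['car', 'ferrari']
```

Because the type is passed as an argument, the same function can be called for any other label, e.g. `words_for_type(df, "motobike+ducati")`.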

  • Please provide a [mcve]. Have you tried anything, done any research? – AMC Jun 28 '20 at 20:03
  • Does this answer your question? [How to split a column into two columns?](https://stackoverflow.com/questions/14745022/how-to-split-a-column-into-two-columns) – AMC Jun 28 '20 at 20:04
  • The way you describe the task I am wondering why you cannot just type in `my_list = ['car', 'ferrari']`. – timgeb Jun 28 '20 at 20:04
  • Type is the index column. I cannot just type in my_list, as I would like to write a function that does this for many types –  Jun 28 '20 at 22:20
  • _Type is an index column._ Then convert it to a Series, right? – AMC Jun 28 '20 at 22:35

1 Answer

import pandas as pd

def lister(x):  # split an 'a+b' label on '+' and drop repeated words
    return set(x.split('+'))

df['listcol'] = df['type'].apply(lister)  # apply lister to each value of the 'type' column and store the resulting sets in a new column
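Since `type` is the index rather than a column, `df['type']` would raise a `KeyError`. A minimal sketch, assuming a two-row frame invented for illustration, maps `lister` over the index labels directly with `Index.map`:

```python
import pandas as pd


def lister(x):  # split an 'a+b' label on '+' and drop repeated words
    return set(x.split("+"))


# Assumed minimal frame with 'type' as the index, as in the question's EDIT.
df = pd.DataFrame(
    {"Owner": ["J.G", "R.R.T."]},
    index=pd.Index(["car+ferrari", "motobike+ducati"], name="type"),
)

# Index.map calls lister on every index label; the result has one set per row.
df["listcol"] = df.index.map(lister)
print(df["listcol"].tolist())
```

The printed order of words inside each set is not guaranteed, since Python sets are unordered.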

Following @AMC's suggestion, pandas also has a built-in method for splitting the string values of a Series:

df['type'].str.split(pat='+')

For details, see the `pandas.Series.str.split` documentation.
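To illustrate the `expand` option that @AMC mentions, here is a small sketch on an assumed two-value Series: by default each row becomes a Python list, while `expand=True` returns a DataFrame with one column per piece (the layout the linked duplicate question is about):

```python
import pandas as pd

s = pd.Series(["car+ferrari", "motobike+ducati"], name="type")

# Default (expand=False): one Python list per row.
as_lists = s.str.split(pat="+")
print(as_lists.tolist())  # [['car', 'ferrari'], ['motobike', 'ducati']]

# expand=True: one column per split position (columns 0 and 1 here).
as_columns = s.str.split(pat="+", expand=True)
print(as_columns)
```

For building a word list, the default list output is the more convenient of the two.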

Converting the pandas index to a Series:

pd.Series(df.index)

Applying the function to the index:

pd.Series(df.index).apply(lister)

or

pd.Series(df.index).str.split(pat='+')

or

df.index.to_series().str.split("+")
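The snippets above return one result per row, not the single deduplicated list the question asks for. A sketch of that last step, assuming pandas 0.25 or later (for `Series.explode`) and an index rebuilt from the question's sample: split every label, flatten the per-row lists, then drop duplicates in first-seen order.

```python
import pandas as pd

# Assumed index mirroring the question's 'type' labels, duplicates included.
idx = pd.Index(
    ["car+ferrari", "car+ferrari", "car+ferrari", "motobike+ducati", "motobike+ducati"],
    name="type",
)
df = pd.DataFrame({"Owner": ["J.G", "R.R.T.", None, None, None]}, index=idx)

labels = df.index.to_series()

# Words from every type: split, flatten with explode(), keep first occurrences.
all_words = labels.str.split("+").explode().drop_duplicates().tolist()
print(all_words)  # ['car', 'ferrari', 'motobike', 'ducati']

# Words from one type only: filter the labels before splitting.
car_words = labels[labels == "car+ferrari"].str.split("+").explode().drop_duplicates().tolist()
print(car_words)  # ['car', 'ferrari']
```

`df.index.to_series()` keeps the labels as both the index and the values, whereas `pd.Series(df.index)` gets a fresh `RangeIndex`; either works for this purpose.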
  • https://stackoverflow.com/questions/14745022/how-to-split-a-column-into-two-columns – AMC Jun 28 '20 at 20:04
  • OP needs the list output from the delimited column values. It has not been mentioned that there is a requirement to split the column into two different columns! – JALO - JusAnotherLivngOrganism Jun 28 '20 at 20:06
  • In which case they would set `expand=False` when calling [`pandas.Series.str.split`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.split.html#pandas-series-str-split). – AMC Jun 28 '20 at 20:08
  • I suggest you add your solution as an answer. There are multiple ways to approach a solution, and everything is appreciated here. Comparing a weaker solution with a better one shows the community why one is better than the other. Cheers – JALO - JusAnotherLivngOrganism Jun 28 '20 at 20:11
  • @AMC that question does not answer my question as type is an index –  Jun 28 '20 at 22:21
  • @JALO-JusAnotherLivngOrganism, unfortunately I need to split an index column –  Jun 28 '20 at 22:23
  • @Val _that question does not answer my question as type is an index_ It absolutely does, all you need to do is convert the index to a Series. – AMC Jun 28 '20 at 22:34