I have a DataFrame with 12 columns, and one of those columns has multiple values separated by commas.

I need to create a single list that contains each comma-separated value from the tags column, with no duplicates.

I tried with this code:

data['tags'].to_list()

But it gave me a list of strings that represent each row in the tags column, like this:

output: ['twitter, literature','muslim, weather, donald trump, conservative, twitter, conspiracy theory, tornado, elementary school, tornado drill','lgbtq, twitter, gay culture, relationships,...]

What I really need is, for example:

[twitter, literature,muslim, weather, donald trump, conservative, conspiracy theory, tornado, elementary school, tornado drill,lgbtq, gay culture, relationships,...]

Do you have a better idea? :) Many thanks in advance!!

  • Always provide a [mre], with **code, data, errors, current output, and expected output, as text**, not screenshots, because [SO Discourages Screenshots](https://meta.stackoverflow.com/questions/303812/). It is likely the question will be down-voted and closed. You are discouraging assistance because no one wants to retype your data or code, and screenshots are often illegible. [edit] the question and **add text**. – Trenton McKinney Sep 15 '20 at 21:38
  • Please see [How to provide a reproducible copy of your DataFrame using `df.head(30).to_clipboard(sep=',')`](https://stackoverflow.com/questions/52413246) – Trenton McKinney Sep 15 '20 at 21:40
  • @AChampion, understood. Is there any way to remove those quotation marks, though? – sarandonga2912 Sep 15 '20 at 21:41
  • No. A string always has quotes. When a string is printed by itself, there are no quotes. When presented within a list, there will be quotes. – Trenton McKinney Sep 15 '20 at 21:42

1 Answer

You can use the str accessor of a pd.Series and then call its split method.

>>> import pandas as pd
>>> df = pd.DataFrame({'a': ['abc, def, ghi', 'jkl,mno'], 'b': [1, 2]})
>>> df
               a  b
0  abc, def, ghi  1
1        jkl,mno  2

>>> df.a.str.split(',')
0    [abc,  def,  ghi]
1           [jkl, mno]
Name: a, dtype: object
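
To get from those per-row lists to the single de-duplicated list the question asks for, one option is to explode the lists into one value per row, strip the whitespace left around the commas, and drop duplicates. This is only a minimal sketch: it assumes the column is named tags as in the question, and the sample rows below are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data modeled on the question's `tags` column
data = pd.DataFrame({
    'tags': [
        'twitter, literature',
        'muslim, weather, twitter',
        'lgbtq, twitter, gay culture',
    ]
})

unique_tags = (
    data['tags']
    .dropna()            # skip rows with no tags at all
    .str.split(',')      # each row becomes a list of tags
    .explode()           # one tag per row
    .str.strip()         # remove whitespace left around the commas
    .drop_duplicates()   # keep the first occurrence, preserving order
    .tolist()
)

print(unique_tags)
# ['twitter', 'literature', 'muslim', 'weather', 'lgbtq', 'gay culture']

# The quotes and brackets are only how Python displays a list of strings;
# joining the values gives plain text without them.
print(', '.join(unique_tags))
# twitter, literature, muslim, weather, lgbtq, gay culture
```

If the order of the tags does not matter, set(...) on the exploded values would also work; drop_duplicates is used here so the first-seen order is kept.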
ApplePie
  • Thanks for your time. This result gives me multiple lists; if I try to put them together, I get the same problem as with the quotation marks, but this time with brackets instead. – sarandonga2912 Sep 15 '20 at 21:47