I sliced a part of a dataframe to keep only two columns.

description_category = titles[['listed_in','description']]

The extract looks like this:

description_category.head()

    listed_in                                           description
0   International TV Shows, TV Dramas, TV Sci-Fi &...   In a future where the elite inhabit an island ...
1   Dramas, International Movies                        After a devastating earthquake hits Mexico Cit...
2   Horror Movies, International Movies                 When an army recruit is found dead, his fellow...
3   Action & Adventure, Independent Movies, Sci-Fi...   In a postapocalyptic world, rag-doll robots hi...
4   Dramas                                              A brilliant group of students become card-coun...

I want to turn each entry in the "listed_in" column into a list of themes, so it looks like this:

    listed_in                                           description
0   [International TV Shows, TV Dramas, TV Sci-Fi ...   In a future where the elite inhabit an island ...
1   [Dramas, International Movies]                      After a devastating earthquake hits Mexico Cit...
2   [Horror Movies, International Movies]               When an army recruit is found dead, his fellow...
3   [Action & Adventure, Independent Movies, Sci-F...   In a postapocalyptic world, rag-doll robots hi...
4   [Dramas]                                            A brilliant group of students become card-coun...

I tried this, but it raised a warning:

description_category['listed_in'] = description_category['listed_in'].apply(lambda x: x.split(', '))

Warning:

C:\Anaconda\envs\nlp_course\lib\site-packages\ipykernel_launcher.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.

I checked a few threads on this issue, but I am still unable to fix it.

What do you suggest I do?

Let me know if you need more background on my issue.

tdy
Etienne

2 Answers

If you want to make a new dataframe while keeping titles, then

  • either slice with .loc[]:

    description_category = titles.loc[:, ['listed_in', 'description']]
    
  • or create a .copy():

    description_category = titles[['listed_in', 'description']].copy()
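
As a minimal sketch of the .copy() route: the small titles frame below is made-up stand-in data (its rows and the extra title column are assumptions, not the real dataset). It reuses the assignment from the question on an explicit copy and checks that titles is left alone.

```python
import pandas as pd

# Hypothetical stand-in for the original `titles` dataframe.
titles = pd.DataFrame({
    "title": ["A", "B"],
    "listed_in": ["Dramas, International Movies", "Dramas"],
    "description": ["First description", "Second description"],
})

# An explicit copy owns its data, so assigning a column to it is unambiguous.
description_category = titles[["listed_in", "description"]].copy()

# Same assignment as in the question, now applied to the copy.
description_category["listed_in"] = description_category["listed_in"].apply(
    lambda x: x.split(", ")
)

# The original dataframe still holds the plain strings.
print(titles["listed_in"].tolist())
# The copy holds one list of themes per row.
print(description_category["listed_in"].tolist())
```

Because description_category is its own object here, pandas has no reason to suspect that the write was meant for titles.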
    

Also, it's faster to use .str.split() instead of apply():

description_category['listed_in'] = description_category['listed_in'].str.split(', ')
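
A small sketch comparing the two approaches on made-up data (the sample series and its missing entry are assumptions for illustration). Besides speed, .str.split() passes missing values through, whereas a plain lambda calling x.split() would fail on them unless it is guarded.

```python
import pandas as pd

# Hypothetical sample data; the second entry is missing on purpose.
listed_in = pd.Series(["Dramas, International Movies", None, "Dramas"])

# Vectorized split: missing entries stay missing.
split_lists = listed_in.str.split(", ")

# The apply() version needs an explicit guard for non-string values.
applied_lists = listed_in.apply(
    lambda x: x.split(", ") if isinstance(x, str) else x
)

print(split_lists.tolist())
print(applied_lists.tolist())
```

On rows that contain strings, both versions produce the same lists.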
tdy

Try wrapping the assignment in DataFrame.loc[]:

description_category.loc[:, 'listed_in'] = description_category['listed_in'].apply(lambda x: x.split(', '))

This should not show the warning, provided description_category is an independent copy rather than a slice of titles.

aaossa