I can't for the life of me figure out why I need to add [:] to my series for this function to work.

Here I'm just showing that my data is a pandas Series:

CODE

data

OUTPUT

0         Watch the progressive monkeys run out screamin...
1         @imispgh When you have Bill Gates shorting Tes...
2         Monkey pox as reported by Reuters \n\n“Gay, bi...
3         @PeteUK7 Hey Pete\nPeople are crazy 'Busy'\nWe...
4         @vancemurphy @pfizer @moderna_tx @US_FDA Well,...
                                ...                        
191351    For our local #Monkeypox response, starting to...
191352    Monkeypox Be Not Proud (7-22-22) https://t.co/...
191353    Two children have been diagnosed with monkeypo...
191354    2 children diagnosed with monkeypox in U.S. ht...
191355    US confirms first monkeypox cases in children ...
Name: text, Length: 191356, dtype: object

CODE

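import emoji
import nltk
from nltk.corpus import stopwords
nltk.download('stopwords')
stpwrds = set(stopwords.words('english'))  # a set makes the "not in" checks fast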

CODE

def clean_text(text):
    for i in text.index:  # replace emojis with spaces, one element at a time
        text[i] = emoji.replace_emoji(text[i], replace=' ')

    text = text.str.lower()
    text = text.str.replace(r'http\S+', '', regex=True)     # remove URLs
    text = text.str.replace(r'@[^\s]+', '', regex=True)     # remove Twitter handles
    text = text.str.replace(r'#[^\s]+', '', regex=True)     # remove hashtags
    text = text.str.replace(r'\n', ' ', regex=True)         # replace newlines with spaces so words don't get glued together
    text = text.str.replace(r'[^a-zA-Z]', ' ', regex=True)  # replace all non-letters with spaces

    for i in text.index:  # drop stopwords, one element at a time
        nostopwords = [word for word in text[i].split() if word not in stpwrds]
        text[i] = ' '.join(nostopwords)

    return text
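For comparison, here is a minimal sketch of the same cleaning steps written so that nothing is ever assigned back into the Series that was passed in: every `.str` step returns a new Series, and the per-element stopword filtering is done with `Series.map`, which also builds a new Series. The function name `clean_text_no_mutation` and its `stop_words` parameter are hypothetical, not part of the code above. The emoji step is left out so the sketch runs without the `emoji` package; with it installed, a first line such as `text = text.map(lambda s: emoji.replace_emoji(s, replace=' '))` would fit the same pattern. Whether this also avoids the never-ending run on the full 191,356-row data is an assumption, not something tested here.

```python
import pandas as pd


def clean_text_no_mutation(text: pd.Series, stop_words: set) -> pd.Series:
    # Each .str call returns a new Series, so the caller's object is never written to
    text = text.str.lower()
    text = text.str.replace(r'http\S+', '', regex=True)     # remove URLs
    text = text.str.replace(r'@[^\s]+', '', regex=True)     # remove Twitter handles
    text = text.str.replace(r'#[^\s]+', '', regex=True)     # remove hashtags
    text = text.str.replace(r'\n', ' ', regex=True)         # newlines become spaces
    text = text.str.replace(r'[^a-zA-Z]', ' ', regex=True)  # non-letters become spaces
    # map() builds a new Series instead of assigning into text element by element
    return text.map(lambda s: ' '.join(w for w in s.split() if w not in stop_words))


# Made-up example tweets and a tiny hypothetical stopword set
tweets = pd.Series([
    'Watch the monkeys RUN https://t.co/abc',
    '@someone Hey Pete\nPeople are crazy #news',
])
cleaned = clean_text_no_mutation(tweets, {'the', 'are', 'hey'})
print(cleaned.tolist())  # ['watch monkeys run', 'pete people crazy']
print(tweets.tolist())   # the input Series is unchanged
```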

CODE

data = clean_text(data)

OUTPUT (except it's not really output: the cell runs forever and never finishes unless I cancel it)

/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:3: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  This is separate from the ipykernel package so we can avoid doing imports until

But when I do this, it actually works:

CODE

data = clean_text(data[:])
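`data[:]` creates a new Series object, but whether it shares its underlying values with `data` depends on the pandas version. If the intent is just to give `clean_text` a Series it is free to modify, the explicit, version-independent way is `Series.copy()`, which with the default `deep=True` copies the data. A small sketch, where the DataFrame `df` and the function `upper_in_place` are hypothetical stand-ins (the function writes into its argument element by element, like the loops in `clean_text` do):

```python
import pandas as pd


def upper_in_place(text: pd.Series) -> pd.Series:
    # Hypothetical stand-in: assigns into its argument one element at a time
    for i in text.index:
        text[i] = text[i].upper()
    return text


# Made-up DataFrame standing in for wherever `data` came from
df = pd.DataFrame({'text': ['monkeypox news', 'two children diagnosed']})

# Pass an explicit deep copy, so only the copy is modified, never df's column
result = upper_in_place(df['text'].copy())
print(result.tolist())      # ['MONKEYPOX NEWS', 'TWO CHILDREN DIAGNOSED']
print(df['text'].tolist())  # ['monkeypox news', 'two children diagnosed']
```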
