I am trying to use an NLTK function to convert text data into numerical form for scikit-learn. The data I am using is basically short text data.

Input

NO 6 JALAN ASTAKA U8/82  SEKSYEN U8  BUKIT JELUTONG
MST GOLF PLAZA  NO 8  JALAN SS13/5

Expected Output

no  jalan astaka u seksyen u  bukit jelutong
mst golf plaza  no   jalan ss

My code

user_defined_stop_words = ['kwun','tong']
i = nltk.corpus.stopwords.words('english')
j = list(string.punctuation) + user_defined_stop_words
newstopwords = set(i).union(j)

def preprocess(x):
    x = re.sub('[^a-z\s]', '', x.lower())                  # get rid of noise
    x = [w for w in x.split() if w not in set(newstopwords)]  # remove stopwords
    return ' '.join(x)

data['Clean_addr'] = data['Adj_Addr'].apply(preprocess)

Error

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
   2353             else:
   2354                 values = self.asobject
-> 2355                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   2356 
   2357         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/src\inference.pyx in pandas._libs.lib.map_infer()

<ipython-input-55-3e3b1d8472ed> in preprocess(x)
      5 
      6 def preprocess(x):
----> 7     x = re.sub('[^a-z\s]', '', x.lower())                  # get rid of noise
      8     x = [w for w in x.split() if w not in set(newstopwords)]  # remove stopwords
      9     return ' '.join(x)

AttributeError: 'float' object has no attribute 'lower'

How can I fix this?
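
The traceback suggests that at least one value in `Adj_Addr` is a float rather than a string, most likely `NaN` for a missing address, so `x.lower()` fails on that row. A minimal sketch of one possible guard follows. It assumes that returning an empty string for non-string values is acceptable, and it uses a small stand-in stop-word set in place of the NLTK list so that it runs without downloading the corpus; the `rows` list is hypothetical sample data built from the inputs above.

```python
import re
import string

# Stand-in for the stop-word set in the question (assumption: the real set
# also includes nltk.corpus.stopwords.words('english')).
user_defined_stop_words = ['kwun', 'tong']
newstopwords = set(string.punctuation) | set(user_defined_stop_words)

def preprocess(x):
    # Missing values reach the function as float NaN, which has no .lower().
    # Return an empty string for anything that is not a string.
    if not isinstance(x, str):
        return ''
    x = re.sub(r'[^a-z\s]', '', x.lower())  # keep only letters and whitespace
    words = [w for w in x.split() if w not in newstopwords]  # remove stopwords
    return ' '.join(words)

# Hypothetical sample rows, including one missing value.
rows = [
    'NO 6 JALAN ASTAKA U8/82  SEKSYEN U8  BUKIT JELUTONG',
    float('nan'),
    'MST GOLF PLAZA  NO 8  JALAN SS13/5',
]
cleaned = [preprocess(r) for r in rows]
print(cleaned)
```

With this guard in place, the original `data['Adj_Addr'].apply(preprocess)` call should no longer raise. An alternative is to replace the missing values first, for example with `data['Adj_Addr'].fillna('')`, and leave `preprocess` unchanged. Note also that NLTK's English stop-word list contains `no`, so with the full list that word would be dropped from the output.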

Rahul rajan