0

I am trying to go through a list of comments stored in a pandas DataFrame, tokenize each comment, and put the resulting words in a new column of the DataFrame, but I get an error when I run the code below.

The error is: AttributeError: 'unicode' object has no attribute 'apwords'

Is there any other way to do this? Thanks

def apwords(words):
    filtered_sentence = []
    words = word_tokenize(words)
    for w in words:
        filtered_sentence.append(w)
    return filtered_sentence
addwords = lambda x: x.apwords()
df['words'] = df['complaint'].apply(addwords)
print df
user3655574

2 Answers

1

The way you apply the lambda function is correct; it is the way you define addwords that doesn't work.

When you define apwords, you define a standalone function, not an attribute (method) of the string. So when you want to apply it, pass the string in as an argument:

addwords = lambda x: apwords(x)

And not:

addwords = lambda x: x.apwords()

If you want to call apwords as an attribute of the string, you would need to define a class that inherits from the string type and define apwords as a method of that class.
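For illustration only, here is a minimal sketch of what that subclass approach might look like. The class name `TokenString` is hypothetical, and `str.split()` is used as a stand-in for `word_tokenize` so the snippet runs without nltk. Note that the values in a DataFrame column are plain strings, so each one would first have to be wrapped in the subclass.

```python
class TokenString(str):
    """Hypothetical str subclass that adds an apwords method."""

    def apwords(self):
        # Assumption: str.split() stands in for nltk's word_tokenize here.
        return self.split()


wrapped = TokenString("the printer is broken")
tokens = wrapped.apwords()  # works: the method lives on the subclass

plain = "the printer is broken"
# A plain string has no such method, which is what the AttributeError reports.
has_method = hasattr(plain, "apwords")
```
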

It is far easier to stay with the function:

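from nltk.tokenize import word_tokenize

def apwords(words):
    # Tokenize one comment and return its tokens as a list.
    filtered_sentence = []
    words = word_tokenize(words)
    for w in words:
        filtered_sentence.append(w)
    return filtered_sentence

# Call apwords as a plain function on each cell, not as a string method.
addwords = lambda x: apwords(x)
df['words'] = df['complaint'].apply(addwords)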
ysearka
  • I tried what you and João Almeida suggested, but now I am getting TypeError: expected string or buffer. Is that because, as you said, I have to define a class that inherits from string and use my original method? Thanks – user3655574 Jun 30 '16 at 13:46
  • No, it must mean that `df['complaint']` contains something other than strings. If you check `df.dtypes`, the `complaint` column should show the `object` type, doesn't it? Most likely you have missing values (which aren't strings). In that case, before applying `addwords`, run `df['complaint'] = df['complaint'].fillna('')` to replace `nan` values with empty strings. – ysearka Jun 30 '16 at 13:56
  • @ysearka, would you be able to adapt this code to pull sentences that contain a specific word? – Ian_De_Oliveira Jul 26 '18 at 07:28
  • What do you mean by that? Could you describe the input you have and output you desire? That would make it far easier to understand and answer. – ysearka Jul 26 '18 at 08:17
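The missing-value fix suggested in the comments above could be sketched roughly as follows. The sample data is made up, and `str.split()` is an assumed stand-in for `word_tokenize` so the snippet runs without nltk.

```python
import numpy as np
import pandas as pd


def apwords(text):
    # Assumption: str.split() stands in for nltk's word_tokenize.
    return text.split()


# Made-up sample data with one missing comment.
df = pd.DataFrame({"complaint": ["late delivery again", np.nan, "wrong item sent"]})

# Replace missing values with empty strings so every cell is a string.
df["complaint"] = df["complaint"].fillna("")

# Tokenizing now succeeds for every row; the empty string yields an empty list.
df["words"] = df["complaint"].apply(apwords)
```

Without the `fillna('')` step, the tokenizer would receive a float `nan` for the second row and fail, since `nan` is not a string.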
0

Don't you just want to do this:

   df['words'] = df['complaint'].apply(apwords)

You don't need to define the function addwords at all. If you do define it, it should be:

addwords = lambda x: apwords(x)
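As a quick check that the two forms are interchangeable, here is a small sketch with made-up data. `str.split()` is an assumed stand-in for `word_tokenize` so the snippet runs without nltk.

```python
import pandas as pd


def apwords(text):
    # Assumption: str.split() stands in for nltk's word_tokenize.
    return text.split()


# Made-up sample data.
df = pd.DataFrame({"complaint": ["screen is cracked", "battery drains fast"]})

# Passing the function itself to apply...
direct = df["complaint"].apply(apwords)

# ...produces the same result as wrapping it in a lambda.
via_lambda = df["complaint"].apply(lambda x: apwords(x))

same = direct.tolist() == via_lambda.tolist()
```
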
João Almeida