
Suppose we have values in the query column of a pandas DataFrame that were tokenized using the split() function, like

query[4] = "['rain', 'shower', 'head']".

Now I want to perform some operations on the individual words. So I converted the value into a list and iterated through it with a for loop, like:

l=list(query[4])

for word in l : word=func(word)

But it stores each character in the list, like ['[', "'", 'r', 'a', 'i', 'n', "'", ',' and so on.

I have also tried the join function, i.e. ''.join(word) and ''.join(l),

but nothing works. Can you suggest something here? Any help will be appreciated.

awesoon
Ishan
  • Possible duplicate of [Convert string representation of list to list in Python](http://stackoverflow.com/questions/1894269/convert-string-representation-of-list-to-list-in-python) – awesoon Aug 12 '16 at 05:16
  • `list` does not parse the given string, it simply creates the list of string symbols. If you want to parse `"['rain', 'shower', 'head']"` take a look at the link above. – awesoon Aug 12 '16 at 05:18
  • What is the command you ran on pandas DF to get this output? – Naveen Kumar Aug 12 '16 at 05:18

3 Answers


If you need to work with a pandas DataFrame, first convert the string values to lists with str.strip and str.split:

import pandas as pd

df = pd.DataFrame({'a':["[rain, shower, head]", "[rain1, shower1, head1]"]})
print (df)
                         a
0     [rain, shower, head]
1  [rain1, shower1, head1]

print (type(df.a.iloc[0]))
<class 'str'>

df['a'] = df.a.str.strip('[]').str.split(',')
print (df)

                           a
0     [rain,  shower,  head]
1  [rain1,  shower1,  head1]

print (type(df.a.iloc[0]))
<class 'list'>

Then you can apply a custom function to each list:

def func(x):
    return x + 'aaa'

def f(L):
    return [func(word) for word in L]

print (df.a.apply(f))    
0       [rainaaa,  showeraaa,  headaaa]
1    [rain1aaa,  shower1aaa,  head1aaa]
Name: a, dtype: object

Or, with the helper function inlined:

def f(L):
    return [word + 'aaa' for word in L]

print (df.a.apply(f))    
0       [rainaaa,  showeraaa,  headaaa]
1    [rain1aaa,  shower1aaa,  head1aaa]
Name: a, dtype: object
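
The sample frame above holds unquoted tokens, while the string in the question contains quotes ("['rain', 'shower', 'head']"), so strip and split alone would leave the quote characters and leading spaces inside each word. A minimal sketch of one alternative, assuming every cell is a valid Python list literal: parse each cell with ast.literal_eval from the standard library. The column names and the body of func here are hypothetical stand-ins, not taken from the question.

```python
import ast

import pandas as pd

# Sample data mirroring the question: each cell is the *string* form of a list.
df = pd.DataFrame({'query': ["['rain', 'shower', 'head']",
                             "['bath', 'tub']"]})


def func(word):
    # Hypothetical stand-in for the asker's per-word function.
    return word.upper()


# ast.literal_eval only parses Python literals, so quotes and spaces
# inside the string are handled by the parser rather than by hand.
df['tokens'] = df['query'].apply(ast.literal_eval)

# Apply func to every word in every row's list.
df['processed'] = df['tokens'].apply(lambda words: [func(w) for w in words])

print(df['processed'].tolist())
```

If some cells might not be valid literals, ast.literal_eval raises an exception for them, so it may be worth wrapping the call in a small function that handles that case.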
jezrael

You are seeing the correct output. The line

query[4] = "['rain', 'shower', 'head']"

means that query[4] is a string, not a list. To be treated as a list, it should be ['rain', 'shower', 'head'] (without the surrounding double quotes).

Compare this output from the Python REPL with what you have:

>>> query = "['rain', 'shower', 'head']"
>>> list(query)
['[', "'", 'r', 'a', 'i', 'n', "'", ',', ' ', "'", 's', 'h', 'o', 'w', 'e', 'r', "'", ',', ' ', "'", 'h', 'e', 'a', 'd', "'", ']']

After changing the assignment to a list, here is the new output in the REPL:

>>> query = ['rain', 'shower', 'head']
>>> list(query)
['rain', 'shower', 'head']
randominstanceOfLivingThing
  • Yes, I understand it now. But can you suggest what I should do here to get the desired output? – Ishan Aug 12 '16 at 05:17

You need to convert the string to an actual list:

data = eval(query[4])

Then loop through the data and collect the results (assigning to word inside a plain for loop would not change the list):

result = [func(word) for word in data]
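
Note that eval executes any expression it is given, which is risky when the strings come from outside data. A hedged sketch of a safer variant, assuming the string is a plain Python list literal: ast.literal_eval parses literals only and raises ValueError for anything else. The body of func is a hypothetical stand-in for the function in the question.

```python
import ast


def func(word):
    # Hypothetical stand-in for the asker's per-word function.
    return word + 'aaa'


raw = "['rain', 'shower', 'head']"

# Parse the string into a real list without executing arbitrary code.
data = ast.literal_eval(raw)

# Build a new list holding the processed words.
result = [func(word) for word in data]
print(result)

# Anything that is not a literal (here, a function call) is rejected.
rejected = False
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError:
    rejected = True
print(rejected)
```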
acw1668