
I get an error when I use split() on a column of a DataFrame.
I am using Python 3.6.
My code is:

data = pd.DataFrame({'a':[1,2], 'b':['高 1', '中 2']})
print(data)
data['c'] = [x.split()[2] for x in data['b']]
# data['c'] = list(map(lambda x: x.split()[2], data['b']))

The error is:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-42-1d9b75c9fa9f> in <module>()
      1 data = pd.DataFrame({'a':[1,2], 'b':['高 1', '中 2']})
      2 print(data)
----> 3 data['c'] = [x.split()[2] for x in data['b']]

<ipython-input-42-1d9b75c9fa9f> in <listcomp>(.0)
      1 data = pd.DataFrame({'a':[1,2], 'b':['高 1', '中 2']})
      2 print(data)
----> 3 data['c'] = [x.split()[2] for x in data['b']]

IndexError: list index out of range

I also tried data['c'] = list(map(lambda x: x.split()[2], data['b'])) and got the same error: IndexError: list index out of range
How can I solve this? I want to extract the number. Thanks.

###############################################################################

Thanks to @U9-Forward. I tried it on my test data and it worked.
But when I ran it on a Kaggle dataset, the floor attribute still raised the same error. I can only get the first value, '高', with df['floor'].apply(lambda x: x.split()[0]). I am confused. Could this be an encoding issue? I read the file with pd.read_csv(data, 'gbk').

J.LOGAN

1 Answer


Python indexing starts at 0: the first element has index 0, the second has index 1, and so on. Each string here splits into only two elements, so index 2 is out of range and you have to use index 1:

data['c'] = [x.split()[1] for x in data['b']]

A more idiomatic pandas version is:

data['c'] = data['b'].apply(lambda x: x.split()[1])
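
If some rows in the real data contain no whitespace, x.split()[1] will still raise IndexError for those rows. A minimal sketch of one way to check for this, assuming that is what happens in the Kaggle file (the third row '低' below is a made-up example, not from the question): the .str accessor returns NaN instead of raising when a row has no second token, and .str.len() shows which rows are short.

```python
import pandas as pd

# Sample frame from the question plus one hypothetical row without a space
data = pd.DataFrame({'a': [1, 2, 3], 'b': ['高 1', '中 2', '低']})

# Split each string on whitespace; the result is a Series of lists
parts = data['b'].str.split()

# .str[1] takes the second token and yields NaN where there is none
data['c'] = parts.str[1]

# Rows whose split produced fewer than two tokens
bad_rows = data[parts.str.len() < 2]

print(data)
print(bad_rows)
```

Inspecting bad_rows on the real column should show whether the error comes from the data itself rather than from the encoding.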
U13-Forward