I have a problem using split() on a DataFrame column.
I am using Python 3.6.
My code is:
data = pd.DataFrame({'a':[1,2], 'b':['高 1', '中 2']})
print(data)
data['c'] = [x.split()[2] for x in data['b']]
# data['c'] = list(map(lambda x: x.split()[2], data['b']))
The error is:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-42-1d9b75c9fa9f> in <module>()
1 data = pd.DataFrame({'a':[1,2], 'b':['高 1', '中 2']})
2 print(data)
----> 3 data['c'] = [x.split()[2] for x in data['b']]
<ipython-input-42-1d9b75c9fa9f> in <listcomp>(.0)
1 data = pd.DataFrame({'a':[1,2], 'b':['高 1', '中 2']})
2 print(data)
----> 3 data['c'] = [x.split()[2] for x in data['b']]
IndexError: list index out of range
I also tried data['c'] = list(map(lambda x: x.split()[2], data['b'])) and got the same error: IndexError: list index out of range
How can I solve this? I want to get the number. Thanks.
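For reference, a minimal sketch of what seems to be going wrong, assuming every value in column b has the form "label number" separated by whitespace: split() returns only two parts (indexes 0 and 1), so index 2 does not exist. Taking the last part with the .str accessor avoids the error. The astype(int) conversion is an addition for illustration and is not part of the original code.

```python
import pandas as pd

data = pd.DataFrame({'a': [1, 2], 'b': ['高 1', '中 2']})

# Each value splits into two parts, e.g. ['高', '1'],
# so the valid indexes are 0 and 1 (or -1 for the last part).
parts = data['b'].str.split()

# Take the last part instead of index 2, which does not exist.
# Assumption: the last part is always an integer string.
data['c'] = parts.str[-1].astype(int)
print(data)
```

Using -1 rather than 1 is a small design choice: it still picks the number if a value happens to contain more than two parts.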
###############################################################################
Thanks to @U9-Forward. I tried it on my test data and it worked.
But when I used it on the Kaggle data, the floor attribute still returns the same error. I can only get the value '高' with df['floor'].apply(lambda x: x.split()[0]).
I am confused! Could this be related to the encoding? I read the file with pd.read_csv(data,'gbk').
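One possible explanation, offered only as a guess because the actual Kaggle values are not shown here: some rows in floor may contain no second part at all (no whitespace, or a missing value), so indexing into the split result fails on those rows. A sketch that sidesteps this by pulling out the first run of digits with a regular expression follows. The sample values and the floor_num column name are hypothetical.

```python
import pandas as pd

# Hypothetical values; the real Kaggle 'floor' column is not shown in the post.
# '\u3000' is a full-width space, which is common in Chinese text.
df = pd.DataFrame({'floor': ['高 1', '中', '低\u30003', None]})

# Extract the first run of digits from each value.
# Rows without digits (or missing values) become NaN instead of raising.
df['floor_num'] = pd.to_numeric(df['floor'].str.extract(r'(\d+)', expand=False))
print(df)
```

If many rows come back as NaN, inspecting a few raw values with repr() may show what they really contain. It may also be worth passing the encoding by keyword, as in pd.read_csv(data, encoding='gbk'), since encoding is not the second parameter of read_csv.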