I have a CSV file with a column named 'Body' that contains a mix of normal characters and Unicode characters, and I am trying to figure out how to detect the Unicode ones. For normal characters, I was able to write the code below:
df.loc[(df['UDH'].isnull()) & df['Body'].str.len().gt(156), 'Double'] = '1'
df.loc[(df['UDH'].notnull()) & (df['Body'].str.len().gt(153)), 'Double'] = '1'
Above is my current code, which filters on multiple columns: if the body exceeds the character limit, it sets the 'Double' column to 1 for normal characters.
When I tried it on rows containing Unicode characters, it didn't work. My code for Unicode is below:
df.loc[(df['UDH'].isnull()) & df['Body'].str.len().gt(66), 'Double'] = '1'
df.loc[(df['UDH'].notnull()) & (df['DCS']=='0') & (df['Body'].str.len().gt(63)), 'Double'] = '1'
Here are some example rows with Unicode characters; the data also contains other languages such as Mandarin, Tamil, Punjabi, and Bulgarian:
Body
请勿将您的取款代码1737958给他人
ਹੈਲੋ ਤੁਹਾਨੂੰ ਮਿਲ ਕੇ ਚੰਗਾ ਲੱਗਿਆ
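One possible way to tell the two kinds of rows apart is sketched below. It assumes that any character outside 7-bit ASCII makes a row "Unicode", which is only an approximation of the SMS default alphabet. The sample DataFrame and the helper names `is_unicode`, `has_udh`, and `limit` are hypothetical; the column names and the limits 156/153 and 66/63 are taken from the code above.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    "UDH": [None, "050003", None],
    "DCS": ["0", "0", "0"],
    "Body": ["a" * 160, "ਹੈਲੋ " * 20, "请勿将您的取款代码1737958给他人"],
})

# Assumption: a row counts as Unicode if it has any non-ASCII character.
is_unicode = df["Body"].str.contains(r"[^\x00-\x7F]", regex=True, na=False)
has_udh = df["UDH"].notnull()
length = df["Body"].str.len()  # counts Unicode code points

# Pick the per-row limit: 156/153 for normal text, 66/63 for Unicode text.
limit = pd.Series(156, index=df.index)
limit[~is_unicode & has_udh] = 153
limit[is_unicode & ~has_udh] = 66
limit[is_unicode & has_udh] = 63

# Flag rows whose body is longer than the limit that applies to them.
df.loc[length.gt(limit), "Double"] = "1"
```

Computing one limit per row keeps the normal and Unicode rules from overwriting each other, since each row is compared against exactly one threshold.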
I'd appreciate any suggestions on this. Thank you in advance :)
EDIT:
For the Unicode character type, this attempt:
df.loc[(df['UDH'].notnull()) & (df['DCS']=='0') & (df['Body'].astypes('UTF8').len().gt(66)), 'Double'] = '1'
gave me the error below:
Traceback (most recent call last):
  File "/Users/syafiq/opt/anaconda3/lib/python3.7/tkinter/__init__.py", line 1705, in __call__
    return self.func(*args)
  File "/Users/syafiq/Downloads/RoutingPractice01.py", line 47, in main
    df.loc[(df['UDH'].notnull()) & (df['DCS']=='0') & (df['Body'].astypes('UTF8').len().gt(66)), 'Double'] = '1'
  File "/Users/syafiq/opt/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py", line 5179, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'astypes'
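The error arises because a pandas Series has no method called `astypes`; the conversion method is `astype`, and string length is reached through the `.str` accessor as `.str.len()`. Python strings are already Unicode, so no conversion should be needed just to count characters. A minimal sketch follows, using a hypothetical sample Series `body`; counting UTF-16 code units is an assumption about how the message length limit is measured, not something stated above.

```python
import pandas as pd

# Hypothetical sample values: plain ASCII, Chinese text, and an emoji.
body = pd.Series(["hello", "请勿将您的取款代码1737958给他人", "😀"])

# .str.len() counts Unicode code points, with no encoding step required.
chars = body.str.len()

# Assumption: the limit is measured in UTF-16 code units (2 bytes each).
# Characters outside the Basic Multilingual Plane, such as emoji, take two.
utf16_units = body.str.encode("utf-16-be").str.len() // 2
```

If the two counts differ for a row, that row contains characters that occupy more than one UTF-16 code unit.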