Say my DataFrame has a column that mixes English and Chinese words or characters. I would like to remove all whitespace between Chinese characters, and keep exactly one space between English words.
I found a solution for removing the extra spaces between English letters here:
import re
import pandas as pd
s = pd.Series(['V e r y calm', 'Keen and a n a l y t i c a l',
               'R a s h and careless', 'Always joyful', '你 好', '黑 石 公 司',
               'FAN STUD1O', 'beauty face 店 铺'])
Code:
regex = re.compile('(?<![a-zA-Z]{2})(?<=[a-zA-Z]{1}) +(?=[a-zA-Z] |.$)')
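# regex=True is needed on pandas >= 2.0, where a compiled pattern is rejected with regex=False
s.str.replace(regex, '', regex=True)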
Out:
Out[87]:
0 Very calm
1 Keen and analytical
2 Rash and careless
3 Always joyful
4 你 好
5 黑 石 公 司
dtype: object
As you can see, it works for English but doesn't remove the spaces between Chinese characters. How can I get the expected result below?
Out[87]:
0 Very calm
1 Keen and analytical
2 Rash and careless
3 Always joyful
4 你好
5 黑石公司
dtype: object
Reference: Remove all spaces between Chinese words with regex
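For illustration, here is a minimal sketch of one way the two rules might be combined, using only the standard `re` module. It assumes "Chinese" means characters in the basic CJK Unified Ideographs block (`\u4e00-\u9fff`); other ranges and punctuation are not covered. The names `normalize_spaces`, `SPACED_LETTERS`, `CJK_GAP` and `MULTI_SPACE` are made up for this example.

```python
import re

# Pattern from the question: joins single English letters that were spaced out.
SPACED_LETTERS = re.compile(r'(?<![a-zA-Z]{2})(?<=[a-zA-Z]{1}) +(?=[a-zA-Z] |.$)')

# Assumption: "Chinese" is limited to the basic CJK Unified Ideographs block.
# The lookarounds do not consume characters, so every gap in a run is matched.
CJK_GAP = re.compile(r'(?<=[\u4e00-\u9fff])\s+(?=[\u4e00-\u9fff])')

# Collapses any remaining run of spaces to a single space.
MULTI_SPACE = re.compile(r' {2,}')


def normalize_spaces(text):
    """Apply the three substitutions in order and return the cleaned string."""
    text = SPACED_LETTERS.sub('', text)
    text = CJK_GAP.sub('', text)
    return MULTI_SPACE.sub(' ', text)


samples = ['V e r y calm', 'Keen and a n a l y t i c a l',
           'R a s h and careless', 'Always joyful', '你 好', '黑 石 公 司']
cleaned = [normalize_spaces(t) for t in samples]
```

On a pandas Series the same function could be applied element-wise with `s.map(normalize_spaces)`.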