Split a string that includes Korean characters

Question

I have a string containing Korean characters:

s = '굿모닝, today is 촉촉'

I want to split it as:

t = ['굿모닝', 'today', 'is', '촉촉']

Note that consecutive Korean characters stay together as one token instead of being separated, that is, it should be '굿모닝', not '굿', '모', '닝'.

Questions:

How do I split that string to get the required output?
Do I need to use a regular expression?

What you want can be achieved by `s.split()`. Can you describe a more complex example or how you want to split by regex? – umutto, Jan 04 '18 at 04:13
Sorry, I am not familiar with regular expressions. I found on the web that I could use `re.findall` and something like `[\u3131-\ucb4c]`, but I don't know exactly how to do that. – Chan, Jan 04 '18 at 04:25

score 4 · Accepted Answer · answered Jan 04 '18 at 04:14

I don't think Korean has any relevance here... The only issue I can think of is that pesky comma right after the first three characters, which leaves a trailing comma on '굿모닝' if you use a plain `s.split()`. But regular expressions are mighty!!

import re
s = '굿모닝, Today is 촉촉'
# Raw string so that \s reaches the regex engine as written
re.split(r',?\s', s)

Outputs ['굿모닝', 'Today', 'is', '촉촉']

Just split your string on an optional comma `,?` followed by a mandatory whitespace character `\s`.
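
The split pattern above only knows about commas, so other punctuation (a trailing period, say) would stay attached to a token. As a hedged alternative that is not part of the original answer, one could collect runs of word characters instead of splitting on separators. This is a minimal sketch, assuming Python 3, where `\w` matches Unicode letters such as Hangul by default; the trailing period in `s` is added here only for illustration.

```python
import re

# Illustrative input: the answer's string plus a trailing period (an assumption)
s = '굿모닝, Today is 촉촉.'

# \w+ matches runs of word characters; commas, periods and spaces are skipped
tokens = re.findall(r'\w+', s)
print(tokens)  # ['굿모닝', 'Today', 'is', '촉촉']
```

Note that `\w` also matches digits and underscores, so this treats `abc_1` as a single token.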

Savir

Thank you very much, BorrajaX. – Chan Jan 04 '18 at 04:29
No problem!! **:-)** – Savir Jan 04 '18 at 04:38
What about the more complicated string containing Korean, Chinese and English? S = '굿모닝, Today is 촉촉. 小心保重'. How to obtain the result of ['굿모닝', 'Today', 'is', '촉촉', '小', '心', '保', '重']? – Chan Jan 04 '18 at 04:42
Oh, that's a different ballgame... Not because of the Chinese characters per se, but because there's no clear *divider*. I mean... You want to get `촉촉` together but `小`, `心`, `保` and `重` separately... It's very difficult to tell that to a regular expression (as a matter of fact, I don't know how to do it) – Savir Jan 04 '18 at 04:48
You might wanna take a look at [this other question](https://stackoverflow.com/q/3797746/289011) (particularly the [second answer](https://stackoverflow.com/a/3797753/289011) that talks about NLP) – Savir Jan 04 '18 at 04:51
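
For the specific mixed string in the follow-up comment, one possible sketch (not something proposed in the thread) is to match tokens by script with `re.findall` instead of looking for a divider: runs of Hangul syllables, runs of ASCII letters, and single Chinese characters. The name `tokenize` and the chosen Unicode ranges are assumptions made for illustration: `\uac00-\ud7a3` covers precomposed Hangul syllables and `\u4e00-\u9fff` covers the basic CJK Unified Ideographs block, so isolated jamo, digits, and rarer ideographs are ignored.

```python
import re

# Hypothetical token pattern; the ranges are assumptions, not from the thread
TOKEN = re.compile(
    r'[\uac00-\ud7a3]+'   # a run of precomposed Hangul syllables, kept together
    r'|[A-Za-z]+'         # a run of ASCII letters, kept together
    r'|[\u4e00-\u9fff]'   # exactly one CJK ideograph per token
)

def tokenize(text):
    """Return tokens in order of appearance; everything else is skipped."""
    return TOKEN.findall(text)

s = '굿모닝, Today is 촉촉. 小心保重'
print(tokenize(s))
# ['굿모닝', 'Today', 'is', '촉촉', '小', '心', '保', '重']
```

This only splits Chinese text into individual characters. Grouping those characters into actual words is a segmentation problem, which is where the NLP approach in the linked answer comes in.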
