
I have a string containing Korean characters:

s = '굿모닝, today is 촉촉'

I want to split it as:

t = ['굿모닝', 'today', 'is', '촉촉']

Note that consecutive Korean characters stay together rather than being separated: I want '굿모닝', not '굿', '모', '닝'.

Questions:

  • How do I split that string to get the required output?
  • Do I need to use a regular expression?
  • `s.split(" ")`? – Sohaib Farooqi Jan 04 '18 at 04:13
  • What you want can be achieved by `s.split()`. Can you describe a more complex example or how you want to split by regex? – umutto Jan 04 '18 at 04:13
  • Sorry, I'm not familiar with regular expressions. From searching the web, I gather I might use re.findall with something like [\u3131-\ucb4c], but I don't know exactly how to do that. – Chan Jan 04 '18 at 04:25
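A minimal sketch of the approaches raised in the comments, assuming Python 3 and that commas are the only punctuation to drop. In Python 3, `\w` in a `str` pattern is Unicode-aware and matches Hangul syllables, so `re.findall(r'\w+', s)` keeps each Korean run together. If you prefer an explicit range, note that the Hangul Syllables block runs from U+AC00 to U+D7A3; the range `[\u3131-\ucb4c]` ends before that and would miss, for example, 촉 (U+CD09).

```python
import re

s = '굿모닝, today is 촉촉'

# Option 1 (no regex): split on whitespace, then strip commas from each piece.
t1 = [word.strip(',') for word in s.split()]

# Option 2 (re.findall): \w is Unicode-aware for str patterns in Python 3,
# so it matches Hangul syllables as well as ASCII letters.
t2 = re.findall(r'\w+', s)

# Option 3: an explicit Hangul Syllables range (U+AC00-U+D7A3) plus ASCII letters.
t3 = re.findall(r'[\uac00-\ud7a3]+|[A-Za-z]+', s)

print(t1)  # ['굿모닝', 'today', 'is', '촉촉']
print(t2)  # ['굿모닝', 'today', 'is', '촉촉']
print(t3)  # ['굿모닝', 'today', 'is', '촉촉']
```

So a regular expression is not strictly required for this input; it mainly helps once the punctuation gets more varied.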


I don't think the Korean characters matter here. The only issue I can see is that pesky comma right after the first three characters, which prevents you from using a plain s.split(). But regular expressions are mighty!!

import re
s = '굿模닝, Today is 촉촉'.replace('模', '모')
print(re.split(r',?\s', s))  # raw string so '\s' reaches the regex engine untouched

Outputs ['굿모닝', 'Today', 'is', '촉촉']

Just split your string on an optional comma (,?) followed by a mandatory whitespace character (\s).
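One caveat, sketched under the assumption that your real input may contain runs of several spaces: `\s` matches exactly one whitespace character, so repeated spaces leave empty strings in the result. Changing it to `\s+` (one or more) treats any run of whitespace as a single separator.

```python
import re

s = '굿모닝,  today is   촉촉'  # note the doubled and tripled spaces

# ',?\s' consumes exactly one whitespace character per split,
# so each extra space produces an empty string.
one = re.split(r',?\s', s)
print(one)   # ['굿모닝', '', 'today', 'is', '', '', '촉촉']

# ',?\s+' swallows the whole whitespace run as one separator.
many = re.split(r',?\s+', s)
print(many)  # ['굿모닝', 'today', 'is', '촉촉']
```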

  • Thank you very much, BorrajaX. – Chan Jan 04 '18 at 04:29
  • No problem!! **:-)** – Savir Jan 04 '18 at 04:38
  • What about a more complicated string containing Korean, Chinese and English, such as S = '굿모닝, Today is 촉촉. 小心保重'? How can I obtain the result ['굿모닝', 'Today', 'is', '촉촉', '小', '心', '保', '重']? – Chan Jan 04 '18 at 04:42
  • Oh, that's a different ballgame... Not because of the Chinese characters per se, but because there's no clear *divider*. I mean, you want to keep `촉촉` together but get `小`, `心`, `保` and `重` separately... It's very difficult to tell that to a regular expression (as a matter of fact, I don't know how to do it) – Savir Jan 04 '18 at 04:48
  • You might want to take a look at [this other question](https://stackoverflow.com/q/3797746/289011) (particularly the [second answer](https://stackoverflow.com/a/3797753/289011), which talks about NLP) – Savir Jan 04 '18 at 04:51
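For the follow-up question, a rough regex sketch is possible if you accept a simplifying assumption: runs of Hangul syllables (U+AC00 to U+D7A3) stay together, runs of ASCII letters stay together, and every CJK Unified Ideograph (U+4E00 to U+9FFF) becomes its own token. The `TOKEN` pattern and the `tokenize` helper are hypothetical names for illustration. Splitting Chinese into single characters is not real word segmentation; for that, the NLP approach in the linked answer is the better fit.

```python
import re

# Hypothetical pattern, tried left to right at each position:
#   1. one or more Hangul syllables  -> kept together
#   2. one or more ASCII letters     -> kept together
#   3. a single CJK Unified Ideograph -> one token per character
# Anything else (spaces, commas, periods) is simply not matched.
TOKEN = re.compile(r'[\uac00-\ud7a3]+|[A-Za-z]+|[\u4e00-\u9fff]')


def tokenize(text):
    """Return the script-based tokens found in text."""
    return TOKEN.findall(text)


S = '굿모닝, Today is 촉촉. 小心保重'
print(tokenize(S))  # ['굿모닝', 'Today', 'is', '촉촉', '小', '心', '保', '重']
```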