Add spaces in special characters for emojis using regex

Question

I have a sentence that contains an escaped emoji encoding, and I want to separate the text from the "\u" sequences with a space:

sentance = "Whoaaa\\ud83d\\udc4f"

and another case:

sentance = "blabla whoaaa\\ud83d\\udc4f blabla"

I want results like this:

result= "blabla whoaaa \\ud83d\\udc4f blabla"

or

sentance = "Whoaaa \\ud83d\\udc4f"

Something simple: find `r"\\u(?<!\s\\u)"`, replace with `r" \\u"`. – Jul 13 '19 at 20:15
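
For illustration, here is one way to express this kind of find-and-replace as a single `re.sub` call. It is a sketch, not the commenter's exact pattern. It assumes single-backslash `\uXXXX` escapes with exactly four hex digits, and the names `ESCAPE`, `BEFORE_ESCAPE_RUN` and `space_before_escapes` are made up for this example.

```python
import re

# One escaped code unit: a literal backslash, "u", and four hex digits.
ESCAPE = r"\\u[0-9a-fA-F]{4}"

# Zero-width position that is preceded by a non-space character,
# is NOT preceded by another escape (so runs stay together),
# and is followed by an escape.
BEFORE_ESCAPE_RUN = re.compile(rf"(?<=\S)(?<!{ESCAPE})(?={ESCAPE})")


def space_before_escapes(text: str) -> str:
    """Insert one space before each run of \\uXXXX escapes that touches text."""
    return BEFORE_ESCAPE_RUN.sub(" ", text)


print(space_before_escapes("Whoaaa\\ud83d\\udc4f"))
print(space_before_escapes("blabla whoaaa\\ud83d\\udc4f blabla"))
```

Input that already has a space before the escape run is left unchanged, because the first lookbehind requires a non-space character.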

score 0 · Answer 1 · answered Jul 13 '19 at 20:21

I think this will be hard to do with a regular expression, since \u is not a character but part of the Unicode escape syntax.

What I would do is test each character to see whether it is an emoji, as in the question "How to check the Emoji property of a character in Python?":

result = "".join([" " + c if test_emoji(c) else c for c in test_str])

Kushan Gunasekera · Answer 2 · score 0 · 2019-07-13T21:29:26.333

Try this:

import re

# Raw string so that \s reaches the regex engine unchanged.
pattern = re.compile(r'^[A-Za-z\s]*')
sentance1 = "Whoaaa\\ud83d\\udc4f"
sentance2 = "blabla whoaaa\\ud83d\\udc4f blabla"

string_before_emoji = pattern.findall(sentance1)[0]
emoji_only = sentance1.split(string_before_emoji)[1].replace('\\', '\\\\')
print(f"{string_before_emoji} {emoji_only}")
# Whoaaa \\ud83d\\udc4f

string_before_emoji = pattern.findall(sentance2)[0]
emoji_only = sentance2.split(string_before_emoji)[1].replace('\\', '\\\\')
print(f"{string_before_emoji} {emoji_only}")
# blabla whoaaa \\ud83d\\udc4f blabla

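The regex pattern I used is ^[A-Za-z\s]*, which matches the leading run of letters and whitespace before the first emoji.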

edited Jul 13 '19 at 21:29

answered Jul 13 '19 at 20:37


pattern.findall(sentance)[0] does not work in the other case. – Alhamsya Bintang Dyasta Jul 13 '19 at 21:03
I just select the whole string before the emojis, then add an extra space and append the emojis after it, @AlhamsyaBintangDyasta. – Kushan Gunasekera Jul 13 '19 at 21:05

I just changed my question. – Alhamsya Bintang Dyasta Jul 13 '19 at 21:25
I just changed my regex, please check it again. It now works for your latest changes and visually displays the new regex, @AlhamsyaBintangDyasta. – Kushan Gunasekera Jul 13 '19 at 21:32

Emma · Accepted Answer · score -2 · 2019-07-13T21:44:14.077

I'm guessing this expression might do that:

((?:\s|^)[^\\]+)(?=\\u|\\\\u)

Test with `re.sub`

import re

# The leading whitespace sits inside the capture group so that re.sub keeps it.
regex = r"((?:\s|^)[^\\]+)(?=\\u|\\\\u)"
test_str = "blabla whoaaa\\\\ud83d\\\\udc4f blabla blabla whoaaa\\\\ud83d\\\\udc4f\\\\ud83d\\\\udc4f blabla\\\\ud83d blabla\\\\ud83d blabla\\\\ud83d "
subst = "\\1 "

print(re.sub(regex, subst, test_str))

Output

blabla whoaaa \\ud83d\\udc4f blabla blabla whoaaa \\ud83d\\udc4f\\ud83d\\udc4f blabla \\ud83d blabla \\ud83d blabla \\ud83d

The expression is explained on the top right panel of this demo if you wish to explore/simplify/modify it.

edited Jul 13 '19 at 21:44

answered Jul 13 '19 at 20:08


OK, I have got an answer from you, thank you very much – Alhamsya Bintang Dyasta Jul 13 '19 at 21:38
