
I have a sentence that combines text with emoji escape sequences, and I want to separate the text from the "\u" sequences, i.e. insert a space before them:

sentance = "Whoaaa\\ud83d\\udc4f"

And another case:

sentance = "blabla whoaaa\\ud83d\\udc4f blabla"

I want results like this:

result = "blabla whoaaa \\ud83d\\udc4f blabla"

or

sentance = "Whoaaa \\ud83d\\udc4f"

3 Answers


I think this will be hard to do with a regular expression if the string holds the actual emoji, since \u is then not a character in the string but part of the Unicode escape syntax.
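Whether "\u" is part of the string depends on how the literal is written. In the question's literals the backslash is doubled, so each escape is six ordinary characters; a string holding the real emoji contains no backslash at all. A quick check (the variable names are only illustrative):

```python
# Doubled backslash in a normal literal: each escape is six plain characters ("\", "u", 4 hex digits)
escaped = "Whoaaa\\ud83d\\udc4f"
# The real emoji (U+1F44F, written here as one code-point escape): no backslash in the string
real = "Whoaaa\U0001F44F"

print(len(escaped), "\\" in escaped)  # 18 True
print(len(real), "\\" in real)        # 7 False
```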

What I would do is test each character to see whether it's an emoji, as in the question "How to check the Emoji property of a character in Python?", and put a space in front of it:

# test_emoji(c) is the emoji check from the linked question; test_str is the input sentence
result = "".join([" " + c if test_emoji(c) else c for c in test_str])
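A runnable sketch of this idea, under stated assumptions: the input is ASCII text holding literal escapes as in the question, and the linked question's test_emoji is swapped for a hypothetical is_emoji helper that only checks a few common emoji blocks. The other helper names (decode_escapes, space_before_emoji) are also hypothetical. The escape text is first turned into real characters, then a space is added only before the first emoji of a run, so "whoaaa" followed by two emoji gets one space rather than two:

```python
def is_emoji(c: str) -> bool:
    # Hypothetical helper: a rough check against a few common emoji blocks,
    # not the full Unicode Emoji property
    cp = ord(c)
    return 0x1F300 <= cp <= 0x1FAFF or 0x2600 <= cp <= 0x27BF


def decode_escapes(text: str) -> str:
    # Assumes ASCII input holding literal escapes such as "\ud83d\udc4f".
    # unicode_escape yields two lone surrogates; the UTF-16 round trip
    # (with surrogatepass) recombines them into one real emoji.
    lone_surrogates = text.encode("ascii").decode("unicode_escape")
    return lone_surrogates.encode("utf-16", "surrogatepass").decode("utf-16")


def space_before_emoji(text: str) -> str:
    out = []
    prev = ""
    for c in text:
        # Add a space only at the start of an emoji run that is glued to a word
        if is_emoji(c) and prev and not prev.isspace() and not is_emoji(prev):
            out.append(" ")
        out.append(c)
        prev = c
    return "".join(out)


# ascii() keeps the printed result plain ASCII on any console
print(ascii(space_before_emoji(decode_escapes("Whoaaa\\ud83d\\udc4f"))))
# 'Whoaaa \U0001f44f'
```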
Yoav Glazner

Try this:

import re

# Match the leading run of ASCII letters and whitespace (the text before the first "\u" escape)
pattern = re.compile(r'^[A-Za-z\s]*')
sentance1 = "Whoaaa\\ud83d\\udc4f"
sentance2 = "blabla whoaaa\\ud83d\\udc4f blabla"

string_before_emoji = pattern.match(sentance1).group()
# Slice off the prefix (safer than split(), which breaks if the prefix repeats or is empty)
emoji_only = sentance1[len(string_before_emoji):]
print(f"{string_before_emoji} {emoji_only}")
# Whoaaa \ud83d\udc4f

string_before_emoji = pattern.match(sentance2).group()
emoji_only = sentance2[len(string_before_emoji):]
print(f"{string_before_emoji} {emoji_only}")
# blabla whoaaa \ud83d\udc4f blabla

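The regex pattern I used, ^[A-Za-z\s]*, matches the run of ASCII letters and whitespace at the start of the string, i.e. everything before the first \u escape.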

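The prefix pattern only separates the text before the first escape, so an emoji later in the sentence stays glued to its word. A sketch of a different technique, a zero-width re.sub built from lookarounds, inserts a space before every run of escapes. It assumes the escapes look exactly like the question's (one literal backslash, "u", four hex digits); GAP_BEFORE_ESCAPE and space_before_escapes are hypothetical names:

```python
import re

# Zero-width match at a position that:
#   (?<=\S)                  follows a non-whitespace character,
#   (?<!\\u[0-9a-fA-F]{4})   does not follow another escape (keeps a surrogate pair together),
#   (?=\\u[0-9a-fA-F]{4})    and is followed by a "\uXXXX" escape.
GAP_BEFORE_ESCAPE = re.compile(r"(?<=\S)(?<!\\u[0-9a-fA-F]{4})(?=\\u[0-9a-fA-F]{4})")


def space_before_escapes(text: str) -> str:
    # Nothing is consumed, so sub() simply inserts a space at each matched position
    return GAP_BEFORE_ESCAPE.sub(" ", text)


print(space_before_escapes("Whoaaa\\ud83d\\udc4f"))
# Whoaaa \ud83d\udc4f
print(space_before_escapes("hi\\ud83d\\udc4f and bye\\ud83d\\udc4f"))
# hi \ud83d\udc4f and bye \ud83d\udc4f
```

Because the negative lookbehind has a fixed width of six characters, Python's re module accepts it; a string that already starts with an escape or has a space before it is left unchanged.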

Kushan Gunasekera

I'm guessing this expression might do it:

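(\s|^)([^\\]+)(?=\\u|\\\\u)

Test with re.sub:

import re

# Group 1: whitespace (or start of string) before the word; captured so re.sub writes it back
# Group 2: run of non-backslash characters right before a "\u" or "\\u" escape
regex = r"(\s|^)([^\\]+)(?=\\u|\\\\u)"
test_str = "blabla whoaaa\\\\ud83d\\\\udc4f blabla blabla whoaaa\\\\ud83d\\\\udc4f\\\\ud83d\\\\udc4f blabla\\\\ud83d blabla\\\\ud83d blabla\\\\ud83d "
# Restore the whitespace and the word, then add the separating space
subst = r"\1\2 "

print(re.sub(regex, subst, test_str))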

Output

blabla whoaaa \\ud83d\\udc4f blabla blabla whoaaa \\ud83d\\udc4f\\ud83d\\udc4f blabla \\ud83d blabla \\ud83d blabla \\ud83d

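For a breakdown of the expression, here is a sketch of the same kind of pattern written with re.VERBOSE so each part carries its own comment. It captures the leading whitespace in a group so that re.sub writes it back instead of dropping it, and the sample input uses single literal backslashes as in the question:

```python
import re

# Each part of the pattern commented via re.VERBOSE
# (whitespace and "#" comments inside the pattern are ignored).
pattern = re.compile(
    r"""
    (\s|^)          # group 1: whitespace before the word, or the start of the string
    ([^\\]+)        # group 2: a run of characters that contains no backslash
    (?=\\u|\\\\u)   # lookahead: a "\u" or "\\u" escape follows (not consumed)
    """,
    re.VERBOSE,
)

# Write back group 1 and group 2, then the separating space
print(pattern.sub(r"\1\2 ", "blabla whoaaa\\ud83d\\udc4f blabla"))
# blabla whoaaa \ud83d\udc4f blabla
```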

Emma