
How can I remove words from a string if they contain \u in Python?

Example:

string ="\uf064thickness cfoutside\uf0d7\uf03a\uf03d TC2019 45TRCMat"

The final output should be like this.

"TC2019 45TRCMat"

That is, every word that contains \u should be removed.

Dishin H Goyani
johnson
  • Aside from the above? What data type is that final output? Is it always the last 2 words? What if those final strings contain those literals instead of unicode? [ask] – Sayse Dec 16 '19 at 07:40
  • I'm really new to Python. I tried to use a regex, but was not able to get the output above. – johnson Dec 16 '19 at 07:42
  • There is no `"\u"` in the string. There is `"\uf064"` for example, but that is the representation for only one unicode character. You can check it with `len("\uf064")`. – Matthias Dec 16 '19 at 07:44
  • You can read more about `split()` and `replace()`; those should be covered in every beginner tutorial or in the Python documentation. – Linh Nguyen Dec 16 '19 at 07:44
  • No, it is not always the last 2 words. It's a string that contains a huge set of words (some with the \u sign and some without it). I just want to extract the words that don't contain the \u sign. – johnson Dec 16 '19 at 07:45
  • Removing the \u sign from the string is also fine. I tried to use `string.replace('\u', " ")`, but it gives me an error: SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape – johnson Dec 16 '19 at 07:47
  • You can use `string.split()[-2:]` (output is a list) or `' '.join(string.split()[-2:])` (output is a string) if what you want is always the last two elements. – Shijith Dec 16 '19 at 07:49
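
The point made in the comments, that `"\uf064"` is a single character and the string contains no literal backslash or `u` sign, can be checked with a minimal sketch like the one below. The variable names `first_char` and `has_backslash` are only illustrative.

```python
string = "\uf064thickness cfoutside\uf0d7\uf03a\uf03d TC2019 45TRCMat"

# "\uf064" is an escape sequence for ONE character, not a backslash plus text.
first_char = string[0]
print(len("\uf064"))

# There is no literal backslash in the string, so there is nothing
# for replace() to find even if '\u' were a valid literal.
has_backslash = "\\" in string
print(has_backslash)

# The code point lies outside the ASCII range (0-127).
print(hex(ord(first_char)))
```

This is also why `string.replace('\u', " ")` fails before it even runs: `'\u'` on its own is an incomplete escape sequence in Python source code.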

1 Answer


Rather than trying to remove the Unicode characters, go the other way and keep only the words that consist entirely of ASCII characters:

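string = "\uf064thickness cfoutside\uf0d7\uf03a\uf03d TC2019 45TRCMat"

def is_ascii(s):
    # True if every character has a code point in the ASCII range (0-127)
    return all(ord(c) < 128 for c in s)

# Print only the space-separated words that are pure ASCII
for s in string.split(" "):
    if is_ascii(s):
        print(s)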

Reference: How to check if a string in Python is in ASCII?
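
The loop above prints each matching word on its own line. If a single string such as "TC2019 45TRCMat" is wanted, as in the question, the same check could be combined with `str.join`. This is a minimal sketch, not part of the original answer; the helper name `keep_ascii_words` is hypothetical.

```python
def is_ascii(s):
    # True if every character has a code point in the ASCII range (0-127)
    return all(ord(c) < 128 for c in s)

def keep_ascii_words(text):
    # Hypothetical helper: split on whitespace, drop any word containing
    # a non-ASCII character, and join the remaining words with one space.
    return " ".join(word for word in text.split() if is_ascii(word))

string = "\uf064thickness cfoutside\uf0d7\uf03a\uf03d TC2019 45TRCMat"
result = keep_ascii_words(string)
print(result)  # TC2019 45TRCMat
```

On Python 3.7 and later, the built-in `str.isascii()` method could replace the `is_ascii` helper.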

Boendal