
How can I get rid of the u prefixes in the output?

Regex:

Tregex1 = "1?\W*([2-9][0-8][0-9])\W*([2-9][0-9]{2})\W*([0-9]{4})(\se?x?t?(\d*))?"

Code:

for a in re.findall(Tregex1,text_value,re.IGNORECASE):
        print a

Output:

(u'877', u'638', u'7848', u'\n', u'')
(u'650', u'627', u'1000', u'\n', u'')
(u'650', u'627', u'1001', u'\nE', u'')
(u'312', u'273', u'4100', u'', u'')

I tried the following, after reading several similar questions:

a.encode('ascii', 'ignore')
a.encode('utf-8')
",".join(a)

But none of them work.

Expected Output:

877-638-7848
650-627-1000
650-627-1001
312-273-4100

I am using Python 2.7

Also, can someone explain why the fourth element is sometimes \n, sometimes \nE, and sometimes empty?

prashantitis

3 Answers


Try this:

for a in re.findall(Tregex1,text_value,re.IGNORECASE):
    print '-'.join(a[:3])

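The u prefix just tells you that each value is a unicode string. It shows up because printing a tuple prints the repr of every element.

The (..., ...) is the representation of the tuple itself.

'-'.join(...) connects the strings in ... with a - between them.

a[:3] means "only the first three elements of a", which drops the two trailing groups.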

(For a good explanation of slice notation in Python, see https://stackoverflow.com/a/509295/327293)
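As a minimal sketch of the join-and-slice idea, the tuples below are copied from the question's output; the names matches and formatted are only illustrative. The print() call with a single argument behaves the same on Python 2.7 and Python 3.

```python
# Tuples copied from the question's output; re.findall returns one tuple per match.
matches = [
    (u'877', u'638', u'7848', u'\n', u''),
    (u'650', u'627', u'1000', u'\n', u''),
    (u'650', u'627', u'1001', u'\nE', u''),
    (u'312', u'273', u'4100', u'', u''),
]

# Keep only the first three groups of each match and join them with '-'.
formatted = ['-'.join(a[:3]) for a in matches]

for number in formatted:
    # Printing a string (not the tuple) shows its value, so no u prefix appears.
    print(number)
```

Printing the joined string rather than the tuple is what makes the u prefix disappear; the values themselves are unchanged.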

phogl

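The u is not your problem: it only appears because you are printing a tuple, which shows the repr of each element. To format the results in a specific way, use the string formatting functions on the groups you want, for example:

print '-'.join(a[:3])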
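The string-formatting route mentioned here could also be written with str.format. This is only a sketch: the tuple is copied from the question's output, and the names area, exchange and line are illustrative, not from the original code.

```python
# One match tuple copied from the question's output.
a = (u'877', u'638', u'7848', u'\n', u'')

# Unpack the three groups that make up the phone number; the trailing
# extension groups (a[3] and a[4]) are ignored.
area, exchange, line = a[:3]

# Explicit positional fields work on Python 2.7 and Python 3.
formatted = u'{0}-{1}-{2}'.format(area, exchange, line)
print(formatted)
```

Compared with join, format makes the layout explicit, which helps if the separator or grouping ever needs to change, for example to (877) 638-7848.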
Daniel Roseman

The u just means the value is a unicode string; you can re-encode it if you need to. The following works, and also skips the blank and whitespace-only values:

a = (u'877', u'638', u'7848', u'\n', u'')
print "-".join([x.strip() for x in a if x.strip() != u""])

877-638-7848
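The stray \n and \nE values that strip() cleans up come from the fourth group of the question's regex, (\se?x?t?(\d*))?. The \s matches the newline after a number, and with re.IGNORECASE the optional e matches an E that starts the next line. When the number is at the very end of the text the group does not take part in the match, and re.findall reports it as an empty string. The sketch below assumes a made-up sample text, since the original text_value is not shown.

```python
import re

# The question's pattern, written as a raw string so the backslashes reach re unchanged.
Tregex1 = r"1?\W*([2-9][0-8][0-9])\W*([2-9][0-9]{2})\W*([0-9]{4})(\se?x?t?(\d*))?"

# Hypothetical sample text; the real text_value is not shown in the question.
text_value = "Call 877-638-7848\nor 650-627-1001\nEmail us, or fax 312-273-4100"

results = re.findall(Tregex1, text_value, re.IGNORECASE)

for groups in results:
    # repr() makes the newline in the fourth group visible.
    print(repr(groups[3]))
```

With this sample, the first number is followed by a newline and then "or", so only \s matches; the second is followed by a newline and "Email", so \s and e both match; the third ends the text, so the optional group is skipped.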

Dan