
Here are the details:

The input:

 \"button\" \"button\" href=\"#\"   data-id=\"11111111111\"  \"button\" \"button\" href=\"#\"   data-id=\"222222222222\"
     \"button\" \"button\" href=\"#\"  

The output I'd like:

11111111111
222222222222

My first attempt, which works well:

text = 'data-id=\"11111111111 \" data-id=\"222222222222\" '
c = re.findall('data-id=\\"(.*?)\\"', text)

My second attempt, which doesn't work; it shows nothing:

with open("E:/test.txt","r") as f:
    text = f.readline()

c = re.findall('data-id=\\"(.*?)\\"', text)

Why doesn't my second snippet work? Please help me fix it. I'd really appreciate it. Thank you :)

NGuyen

2 Answers


You can do:

re.findall(r'"([^\\]+)\\"', s)
  • "([^\\]+) matches a ", then the captured group holds the desired portion, i.e. the substring up to the next \; the trailing \\" ensures that this portion is followed by a literal backslash and a double quote.

Example:

In [34]: s
Out[34]: 'randomtext data-id=\\"11111111111\\" randomtext data-id=\\"222222222222\\"'

In [35]: re.findall(r'"([^\\]+)\\"', s)
Out[35]: ['11111111111', '222222222222']

Answer to the edited question:

Use \d+ to match digits:

re.findall(r'"(\d+)\\"', s)

To match based on the data-id attribute instead:

re.findall(r'data-id=\\"([^\\]+)\\"', s)

Example:

In [45]: s
Out[45]: '\\"button\\" \\"button\\" href=\\"#\\"   data-id=\\"11111111111\\"  \\"button\\" \\"button\\" href=\\"#\\"   data-id=\\"222222222222\\" \\"button\\" \\"button\\" href=\\"#\\"'

In [50]: re.findall(r'"(\d+)\\"', s)
Out[50]: ['11111111111', '222222222222']

In [46]: re.findall(r'data-id=\\"([^\\]+)\\"', s)
Out[46]: ['11111111111', '222222222222']
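
The data-id pattern above can also be applied to the whole file at once instead of a single line. This is only a minimal sketch: it assumes the file stores a literal backslash before each quote, as the question's input suggests. The helper name extract_ids and the temporary file standing in for E:/test.txt are made up for illustration.

```python
import os
import re
import tempfile

# Sample content mirroring the question's input (assumption: each quote is
# preceded by a literal backslash, and the text spans two lines).
sample = (
    r' \"button\" \"button\" href=\"#\"   data-id=\"11111111111\"  '
    r'\"button\" \"button\" href=\"#\"   data-id=\"222222222222\"' "\n"
    r'     \"button\" \"button\" href=\"#\"  ' "\n"
)


def extract_ids(path):
    """Hypothetical helper: return every data-id value found in the file."""
    with open(path, "r") as f:
        text = f.read()  # read the whole file, not just the first line
    # In a raw string, \\" matches a literal backslash followed by a quote.
    return re.findall(r'data-id=\\"([^\\]+)\\"', text)


# Write the sample to a temporary file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

ids = extract_ids(path)
os.remove(path)
print(ids)  # ['11111111111', '222222222222']
```

Reading with f.read() means it does not matter which line an ID sits on.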
heemayl
  • Hello, your code worked great, but my input is more complicated. I just updated the question. Sorry about that. – NGuyen Aug 21 '16 at 04:25

Please try this. (I put two lines in the str_txt.txt file.)

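The main change to your second snippet is the 'r' prefix on the regex, which makes the pattern a raw string. I also read every line with readlines() instead of only the first line with readline(). For more on the 'r' prefix, see the raw string notation section of the Python re documentation.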

import re

with open("str_txt.txt", "r") as f:
    lines = f.readlines()

for line in lines:
    # The raw string lets \\" match a literal backslash followed by a double quote.
    c = re.findall(r'data-id=\\"(.*?)\\"', line)
    print(c)

Output:

C:\Users\dinesh_pundkar\Desktop>python demo.Py
['11111111111', '222222222222']
['1111113434111', '222222222222']
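
To see why the 'r' prefix matters here, the following sketch compares the two patterns on two hypothetical strings: one with plain quotes (like the hard-coded text in the question's first snippet) and one with a backslash before each quote (which is what the file appears to contain). The variable names are illustrative only.

```python
import re

# Plain quotes, as produced by the hard-coded string in the first snippet.
plain = 'data-id="11111111111" data-id="222222222222"'
# A literal backslash before each quote, as the file seems to contain.
escaped = r'data-id=\"11111111111\" data-id=\"222222222222\"'

# Without the r prefix, Python collapses \\ to \ before the regex engine
# sees the pattern, so the engine gets \" which matches a bare quote.
non_raw = 'data-id=\\"(.*?)\\"'
# With the r prefix the engine gets \\" which matches backslash + quote.
raw = r'data-id=\\"(.*?)\\"'

non_raw_on_plain = re.findall(non_raw, plain)
non_raw_on_escaped = re.findall(non_raw, escaped)
raw_on_escaped = re.findall(raw, escaped)

print(non_raw_on_plain)    # ['11111111111', '222222222222']
print(non_raw_on_escaped)  # []
print(raw_on_escaped)      # ['11111111111', '222222222222']
```

Under that assumption, the original pattern fails on the file because it expects a quote directly after data-id=, while the file has a backslash there.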
Dinesh Pundkar