0

I am very new to regex and Python and I am struggling with the following. I have one specific string from a file:

<td align="left">                                   
<(this, '/hdm/SingleDeviceMgmt/getDevice.do?deviceID)>                                                                 
<ahref="editDevice.do?deviceID=100089">
<do?deviceID/iopp>
GSE5677789
</a>
</td>
<input type="text" name="serialNumber" id="serialNumber" 
class="input_field" value="GSE5677789"  title="Enter Number. ">
<ahref="editDevice.do?deviceID=100089">

I need to fetch the deviceID which is equal to 100089 from the string.

The python code that I wrote is:

import re
with open('json_conversion.txt') as f:
    for line in f:
        if "GSE5677789" and 'deviceID' in line:
            s=re.search(r'^deviceID=//.*\.',line)
            print s

But all I am getting is None.

Can anyone please help?

Addy
  • Why are you using `^deviceID`? `^` means "beginning of line", but `deviceID` occurs in the middle of the line. – John Gordon Feb 21 '18 at 18:35
  • [H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S̨̥̫͎̭ͯ̿̔̀ͅ](https://stackoverflow.com/a/1732454/7505395) do not parse html with regex :) – Patrick Artner Feb 21 '18 at 18:36

3 Answers

1
import re
your_string = """
<td align="left">                                   

<ahref="editDevice.do?deviceID=100089">
GSE5677789
</a>
</td>
<input type="text" name="serialNumber" id="serialNumber" 
class="input_field" value="GSE5677789"  title="Enter Number. ">
<ahref="editDevice.do?deviceID=100089">
"""
m = re.search('deviceID=([0-9]*)', your_string).group(1)
Nathan
  • Thanks for the comment but I am getting an error saying NoneType doesn't have any group() attribute. @Nathan – Addy Feb 21 '18 at 18:59
  • @Addy That's strange. For me this works in both python 2.* and 3.* Have you tried literally copy pasting this? – Nathan Feb 21 '18 at 19:10
  • Yes, that's what I did. :( @Nathan – Addy Feb 22 '18 at 17:46
  • Hi @Nathan, I have modified the string a bit, could you please take a look? The actual response is huge and contains a lot of tags. Thanks. – Addy Feb 22 '18 at 19:05
  • I think it should still work. It will probably get a lot slower though, so perhaps it would be better to follow Ajax1234's solution, as that's specifically tailored for HTML. @Addy – Nathan Feb 23 '18 at 06:21
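
On the NoneType error mentioned in the comments: `re.search` returns `None` when the pattern is not found, so calling `.group(1)` directly fails on any text that lacks a match. A minimal sketch of a guarded lookup, written for Python 3; the helper name `find_device_id` and the sample strings are made up for illustration:

```python
import re

# Hypothetical sample line; in practice this would come from the file or response.
line = '<a href="editDevice.do?deviceID=100089">'


def find_device_id(text):
    """Return the digits after 'deviceID=' or None if there is no match."""
    match = re.search(r'deviceID=(\d+)', text)
    # re.search returns None when nothing matches, so check before calling group().
    return match.group(1) if match else None


device_id = find_device_id(line)
no_match = find_device_id('<td align="left">')
print(device_id)
print(no_match)
```

If the function returns `None` for the real input, the text being searched probably does not contain `deviceID=` followed by digits, which is worth checking before changing the pattern.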
1

Firstly, I can't work out what the if "G1A115051301136" is trying to achieve.
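
To illustrate why that `if` is suspicious: Python reads the condition in the question as `"GSE5677789" and ('deviceID' in line)`, and a non-empty string literal is always truthy, so only the `deviceID` test has any effect. A small sketch, assuming Python 3 and a made-up sample line:

```python
# Hypothetical line that contains deviceID but not the serial number.
line = '<a href="editDevice.do?deviceID=100089">'

# Parsed as: "GSE5677789" and ('deviceID' in line).
# The string literal is always truthy, so this only tests for 'deviceID'.
original_check = bool("GSE5677789" and 'deviceID' in line)

# Testing both substrings explicitly gives a different answer here.
explicit_check = "GSE5677789" in line and 'deviceID' in line

print(original_check)
print(explicit_check)
```

Note that in the sample from the question the serial number and the deviceID sit on different lines, so an explicit two-substring test applied line by line would probably never succeed; searching the whole text may be more appropriate.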

Secondly, your regex is incorrect. Try 'deviceID=(\d+)' instead.

The ^ in a regex marks the beginning of a line, so your pattern will only match a phrase at the start of a line. The parentheses in my answer form a capture group, which allows easy extraction of the number from the returned match.
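
A short sketch of both points, assuming Python 3 and a sample line made up for illustration: the anchored pattern finds nothing because `deviceID` is not at the start of the line, while the unanchored pattern matches and exposes the digits through the capture group.

```python
import re

# Hypothetical sample line with deviceID in the middle.
line = '<a href="editDevice.do?deviceID=100089">'

# With ^ the match must begin at the start of the line, so this finds nothing.
anchored = re.search(r'^deviceID=(\d+)', line)

# Without ^ the pattern may match anywhere in the line.
unanchored = re.search(r'deviceID=(\d+)', line)

whole_match = unanchored.group(0)  # everything the pattern matched
device_id = unanchored.group(1)    # only the part inside the parentheses

print(anchored)
print(whole_match)
print(device_id)
```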

I personally test all regex I write first using an online tool such as this.

Jon
0

For scraping HTML, it is best to use a parsing library such as BeautifulSoup to be as precise as possible:

from bs4 import BeautifulSoup as soup
import re
s = """
<td align="left">                                   
<ahref="editDevice.do?deviceID=100089">
GSE5677789
</a>
</td>
<input type="text" name="serialNumber" id="serialNumber" 
class="input_field" value="GSE5677789"  title="Enter Number. ">
<a href="editDevice.do?deviceID=100089">
"""
data = soup(s, 'lxml')
final_data = re.findall(r'(?<==)\d+', data.find('a')['href'])[0]

Output:

'100089'
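
If installing BeautifulSoup is not an option, the same idea can be sketched with the standard library alone and no regex at all. This is only an illustration, assuming Python 3 and markup where the links are written as `<a href=...>` with a space (the `<ahref=...>` form in the sample would not be recognised as a link); the class name `HrefCollector` is made up:

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.hrefs.append(value)


# Hypothetical cleaned-up sample based on the question.
html = '''
<td align="left">
<a href="editDevice.do?deviceID=100089">
GSE5677789
</a>
</td>
'''

collector = HrefCollector()
collector.feed(html)

# Read deviceID from the query string of each link instead of using a regex.
device_ids = []
for href in collector.hrefs:
    query = parse_qs(urlparse(href).query)
    device_ids.extend(query.get('deviceID', []))

print(device_ids)
```

Parsing the query string keeps working if the link later gains extra parameters, where a pattern such as "digits after an equals sign" could pick up the wrong value.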
Ajax1234