Implementing Regular expressions in Python

Question

I have code like this:

 <td class="check ABCD" rowspan="2"><center><div class="checkbox {{#if checked}}select{{else}}deselect{{/if}}" id="{{id}}" {{data "tool"}

<td class="check" rowspan="2"><center><div class="checkbox {{#if checked}}select{{else}}deselect{{/if}}" id="{{id}}" {{data "tool"}}>

I want to extract only the class and id values from the code above. I have very little experience with regular expressions in Python.

How can I extract only the class and id values (the text between the double quotes) using a regular expression? Or is there a better way to do this? If so, please help me find it :)

Thanks in advance.

@idjaw Is there any other way to extract it? Other than Regex? — Karthik Hegde, Mar 21 '16 at 06:07
I don't know if this is part of a much bigger chunk of data. But this definitely looks like HTML, and if you are trying to parse through that, you should use something like [BeautifulSoup](https://pypi.python.org/pypi/beautifulsoup4) — idjaw, Mar 21 '16 at 06:09
http://stackoverflow.com/questions/2612548/extracting-an-attribute-value-with-beautifulsoup this should help you — ashishmohite, Mar 21 '16 at 06:10
You can convert the dom element into BeautifulStoneSoup object and then get the attribute values — ashishmohite, Mar 21 '16 at 06:11
@idjaw Basically this is the difference of two similar .stache files. I want to eliminate the unwanted data from this diff. Anyhow, I will look into BeautifulSoap. Thanks :) — Karthik Hegde, Mar 21 '16 at 06:12
Obligatory: [You cannot parse XHTML with regular expressions](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) — Blckknght, Mar 21 '16 at 06:13
@NEO-xx Thanks for the help. I will definitely look into it. I didn't know about BeautifulSoap ! — Karthik Hegde, Mar 21 '16 at 06:14
@KarthikHegde: It is **Soup** not **Soap** (some programmers might mix both, though...) — Jan, Mar 21 '16 at 06:50
@idjaw How do I parse using BeautifulSoup if those part of the code is string? I mean if it is stored in .txt file? — Karthik Hegde, Mar 21 '16 at 09:49

score 2 · Answer 1 · edited May 23 '17 at 10:28

Since you asked for a Regex solution in Python, you'll get one:

import re

# Capture the first class value and the id value on each line.
# Plain r'' string: the Python 2 ur'' prefix is a syntax error in Python 3.
p = re.compile(r'^.+?class="([^"]+)".+id="([^"]+)".+?$', re.MULTILINE)
test_str = "<td class=\"check ABCD\" rowspan=\"2\"><center><div class=\"checkbox {{#if checked}}select{{else}}deselect{{/if}}\" id=\"{{id}}\" {{data \"tool\"}\n<td class=\"check\" rowspan=\"2\"><center><div class=\"checkbox {{#if checked}}select{{else}}deselect{{/if}}\" id=\"{{id}}\" {{data \"tool\"}}>"

# One (class, id) tuple per matching line.
print(p.findall(test_str))

See live example over here: https://regex101.com/r/cG8dC5/1
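
The pattern above captures only the first class value on each line. If every class and id value is wanted, a simpler per-attribute pattern may be enough. The following is a minimal sketch, not part of the original answer; the names `ATTR_RE` and `extract_attrs` are made up for illustration, and it assumes attribute values are always wrapped in double quotes.

```python
import re

# Hypothetical pattern: match class="..." or id="...", but not names that
# merely end in "class" or "id" (for example data-id).
ATTR_RE = re.compile(r'(?<![\w-])(class|id)="([^"]*)"')

def extract_attrs(text):
    """Return (attribute_name, value) tuples in order of appearance."""
    return ATTR_RE.findall(text)

sample = (
    '<td class="check ABCD" rowspan="2"><center>'
    '<div class="checkbox {{#if checked}}select{{else}}deselect{{/if}}" '
    'id="{{id}}" {{data "tool"}}>'
)

for name, value in extract_attrs(sample):
    print(name, "=", value)
```

Because the pattern does not depend on line structure, it can be applied to a whole file's contents read into a single string.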

Nevertheless, as some other users have already noted, regex isn't ideal for parsing (X)HTML. It's better to have a look at BeautifulSoup: https://pypi.python.org/pypi/beautifulsoup4
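
To give a rough idea of what a parser-based approach looks like, here is a sketch that uses the standard library's `html.parser` as a stand-in for BeautifulSoup, so that it runs without installing anything. The class `ClassIdCollector`, the function `collect_class_and_id`, and the sample markup are all assumptions made for illustration. The sample is simplified, already-rendered HTML; template fragments such as `{{data "tool"}}` are not valid HTML, so a parser may behave differently on the raw .stache text.

```python
from html.parser import HTMLParser

class ClassIdCollector(HTMLParser):
    """Hypothetical helper: records class and id attributes of each start tag."""

    def __init__(self):
        super().__init__()
        self.found = []  # list of (tag, attribute_name, value)

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs supplied by HTMLParser.
        for name, value in attrs:
            if name in ("class", "id"):
                self.found.append((tag, name, value))

def collect_class_and_id(markup):
    """Parse a string of markup and return the collected attributes."""
    parser = ClassIdCollector()
    parser.feed(markup)
    parser.close()
    return parser.found

# Simplified sample (an assumption, not the original template text).
html_text = (
    '<td class="check ABCD" rowspan="2">'
    '<div class="checkbox select" id="row-1"></div></td>'
)
print(collect_class_and_id(html_text))
```

If the markup is stored in a text file, reading the file into a string and passing that string to `collect_class_and_id` works the same way.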

answered Mar 21 '16 at 06:22

netblognet

When you advise someone not to parse HTML with regex and post a solution nevertheless, isn't this somewhat lurking for rep ;) ? – Jan Mar 21 '16 at 06:25
Nope. It's the answer to the question, with a helpful piece of advice. If you ask me, "How do I make fire with a lens?", I'll answer: hold it between the sun and some straw. Nevertheless, it's not the best way; better to use a lighter. So I think it's a valid answer to his question, and as long as I told him that there are better ways, this isn't lurking. – netblognet Mar 21 '16 at 06:31
I like the fire analogy (+1 for that), however I'd have bought him a lighter :) – Jan Mar 21 '16 at 06:34
