Python: help displaying a regular expression result

Question

I am doing simple regular expressions in Python.

I am trying re.split, but I get things like ['\r\n', '\r\n'] instead of the answer. Can someone please tell me how to display the actual text?

I tried this statement:

t_html = re.split(r"<[a-zA-Z0-9\s\w\W]*>[a-zA-Z0-9\s\w\W]*</[a-zA-Z0-9\s\w\W]*>", s)
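Here is a minimal reproduction of the symptom. It assumes a hypothetical s with line breaks around the markup, because the real s was not shown. Since [\w\W] already covers every character, each bracketed class in this pattern effectively means "any character". The greedy * then lets a single match run across several elements. re.split discards whatever the pattern matched, so only the line breaks are left.

```python
import re

# The pattern from the question. [\w\W] already matches every character,
# so each bracketed class is effectively "any character", and * is greedy.
PATTERN = r"<[a-zA-Z0-9\s\w\W]*>[a-zA-Z0-9\s\w\W]*</[a-zA-Z0-9\s\w\W]*>"

s = "\r\n<b>hello</b>\r\n"  # hypothetical input; the real s was not shown
pieces = re.split(PATTERN, s)
print(pieces)  # ['\r\n', '\r\n']: split drops the text the pattern matched

# Greediness: two adjacent elements are swallowed as one single match.
whole = re.findall(PATTERN, "<b>hello</b><i>asfasdf</i>")
print(whole)  # ['<b>hello</b><i>asfasdf</i>']
```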

Thanks

I am trying to get all the HTML tags and their contents. For example, if I had "helloasfasdf", it would split it up into hello and asfasdf. — Lilz, Dec 02 '09 at 23:43
Don't use regex to parse HTML. Use Beautiful Soup: www.crummy.com/software/BeautifulSoup — John La Rooy, Dec 02 '09 at 23:44
Consider what happens with real HTML where the tags are nested:
some stuff
more stuff
still more stuff — John La Rooy, Dec 02 '09 at 23:47
gnibbler is right. Use Beautiful Soup to parse HTML. Do not, repeat, do not attempt to use regular expressions to parse HTML. — steveha, Dec 03 '09 at 00:49
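The following sketch makes the comments' point concrete with hypothetical inputs. The markup in the comments above was lost, so these strings are only stand-ins. A lazy regex with a backreference to the tag name handles flat markup. Once tags nest, however, it pairs the outer <div> with the innermost </div>. Beautiful Soup is a third-party package, so the parsing half uses the standard library's html.parser.HTMLParser instead to keep the example self-contained. TextByTag is a hypothetical helper and is not part of either library.

```python
import re
from html.parser import HTMLParser

# Lazy match up to the closing tag with the same name (\1 backreference).
TAG_RE = re.compile(r"<(\w+)[^>]*>(.*?)</\1>")

flat = "<b>hello</b><i>asfasdf</i>"  # hypothetical flat markup
nested = "<div>some stuff<div>more stuff<div>still more stuff</div></div></div>"  # hypothetical nesting

flat_matches = TAG_RE.findall(flat)
print(flat_matches)  # [('b', 'hello'), ('i', 'asfasdf')]

nested_matches = TAG_RE.findall(nested)
print(nested_matches)  # outer <div> paired with the innermost </div>


class TextByTag(HTMLParser):
    """Record each run of text with the path of tags open around it."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray end tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if data.strip():
            self.items.append(("/".join(self.stack), data.strip()))


parser = TextByTag()
parser.feed(nested)
parser.close()
print(parser.items)
# [('div', 'some stuff'), ('div/div', 'more stuff'), ('div/div/div', 'still more stuff')]
```

The parser keeps an explicit stack of open tags. That stack is exactly what a single regular expression cannot track once nesting depth is unbounded.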

score 0 · Answer 1 · answered Dec 02 '09 at 23:39

re.split, by its very nature, splits on the pattern and discards the text it matched. If you want the matched strings returned as well, wrap the pattern in a capturing group, i.e. put the parentheses inside the pattern string: re.split("(" + R + ")", string), where R is your expression. The captured text is then interleaved with the other pieces in the resulting list. If you just want all non-overlapping matches, use re.findall, which returns them as a list. See here for more details and options.
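Here is a sketch of the three calls this answer describes. It assumes a hypothetical input and a simple hypothetical tag pattern R, and neither comes from the question. The parentheses that keep the separators must go inside the pattern string. Wrapping the Python expression R in parentheses changes nothing.

```python
import re

s = "\r\n<b>hello</b>\r\n<i>asfasdf</i>\r\n"  # hypothetical input
R = r"<\w+>[^<]*</\w+>"                      # hypothetical tag pattern, for illustration only

dropped = re.split(R, s)            # matched text is discarded
kept = re.split("(" + R + ")", s)   # the capturing group's text is returned too
found = re.findall(R, s)            # just the non-overlapping matches

print(dropped)  # ['\r\n', '\r\n', '\r\n']
print(kept)     # ['\r\n', '<b>hello</b>', '\r\n', '<i>asfasdf</i>', '\r\n']
print(found)    # ['<b>hello</b>', '<i>asfasdf</i>']
```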

score 0 · Accepted Answer · edited Jun 20 '20 at 09:12


If you want to use a regex to parse HTML, see here.

edited Jun 20 '20 at 09:12 by Community · answered Dec 03 '09 at 03:23 by Matt Anderson