match all occurrence of regular

Question

I want to extract all occurrences of a pattern in Python. Here is what i have done

import re

string="Any information <p>sent to the server as clear text</p>, may be stolen and used later for <p>identity theft</p> or user impersonation. In addition, several privacy regulations state that sensitive information such as user<p> credentials will always be sent encrypted </p> to the web site."

regex='<p>.*</p>' # obviously it matches starting <p> to the last </p>

if re.findall(regex, String):
    print(re.findall(regex, string))
else:
    print('no match found')

I want to extract all the occurance of paragraph tags. I mean the output should be a list which looks like this

['<p>sent to the server as clear text</p>', '<p>identity theft</p>', '<p> credentials will always be sent encrypted </p>']

I've found few similar questions but not serving the purpose Find all occurrences of a substring in Python

Finding multiple occurrences of a string within a string in Python

The first failure source is often the regex, you can check it here https://regex101.com/ — Anderas, Mar 30 '18 at 06:19
Do not use `re.findall` twice. Use `res = re.findall(...)` and then display the message you want after checking `res` length. — Wiktor Stribiżew, Mar 30 '18 at 06:20
Got the answer here https://stackoverflow.com/questions/22444/my-regex-is-matching-too-much-how-do-i-make-it-stop Making .* non-greedy did the trick.. Thanks @WiktorStribiżew — Navneet, Mar 30 '18 at 06:27

score 0 · Accepted Answer · answered Mar 30 '18 at 06:23

0

change your regex like this :

regex=r"<p>.*?</p>"

It gives o/p like :

['<p>sent to the server as clear text</p>', '<p>identity theft</p>', 
 '<p> credentials will always be sent encrypted </p>']

answered Mar 30 '18 at 06:23

Vikas Periyadath

3,088
1
21
33

this is the exact regex. regex='
.*?<\/p>'
– Navneet Mar 30 '18 at 06:39

match all occurrence of regular

1 Answers1