
I want to extract all occurrences of a pattern in Python. Here is what I have tried:

import re

string="Any information <p>sent to the server as clear text</p>, may be stolen and used later for <p>identity theft</p> or user impersonation. In addition, several privacy regulations state that sensitive information such as user<p> credentials will always be sent encrypted </p> to the web site."

regex=r'<p>.*</p>' # greedy: matches from the first <p> all the way to the last </p>

if re.findall(regex, string):
    print(re.findall(regex, string))
else:
    print('no match found')
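The behaviour described in the comment in the code can be seen on a short hypothetical string (not the question's text): with the greedy .* the whole run from the first <p> to the last </p> comes back as a single match instead of one match per paragraph.

```python
import re

# Hypothetical toy input with two paragraphs.
text = "a <p>one</p> b <p>two</p> c"

# Greedy .* consumes as much as possible, then backtracks only as far as
# the LAST "</p>", so both paragraphs are swallowed into one match.
greedy = re.findall(r"<p>.*</p>", text)
print(greedy)  # ['<p>one</p> b <p>two</p>']
```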

I want to extract every paragraph tag, so the output should be a list like this:

['<p>sent to the server as clear text</p>', '<p>identity theft</p>', '<p> credentials will always be sent encrypted </p>']

I've found a few similar questions, but they don't solve my problem:

Find all occurrences of a substring in Python

Finding multiple occurrences of a string within a string in Python

Cœur
Navneet
  • The regex is often the first thing to suspect; you can test it here https://regex101.com/ – Anderas Mar 30 '18 at 06:19
  • Do not use `re.findall` twice. Use `res = re.findall(...)` and then display the message you want after checking `res` length. – Wiktor Stribiżew Mar 30 '18 at 06:20
  • Got the answer here https://stackoverflow.com/questions/22444/my-regex-is-matching-too-much-how-do-i-make-it-stop Making .* non-greedy did the trick. Thanks @WiktorStribiżew – Navneet Mar 30 '18 at 06:27
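A minimal sketch combining the two suggestions from the comments: call re.findall once and keep the result, and make .* non-greedy. The sample string is a hypothetical stand-in for the question's text.

```python
import re

# Hypothetical sample input; substitute the real string from the question.
text = "x <p>first</p> y <p>second</p> z"

# Run findall once and reuse the list instead of calling it twice.
# .*? is non-greedy, so each match stops at the nearest "</p>".
matches = re.findall(r"<p>.*?</p>", text)

if matches:  # an empty list is falsy, equivalent to checking len(matches) > 0
    print(matches)  # ['<p>first</p>', '<p>second</p>']
else:
    print("no match found")
```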

1 Answer


Change your regex to use the non-greedy quantifier .*? like this:

regex=r"<p>.*?</p>"

This gives the following output:

['<p>sent to the server as clear text</p>', '<p>identity theft</p>', 
 '<p> credentials will always be sent encrypted </p>']
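The pattern above works because the question's string has no line breaks. As a sketch under the assumption that real input may contain paragraphs spanning several lines, re.DOTALL lets . match newlines, and a capturing group makes re.findall return only the text between the tags. The input below is hypothetical.

```python
import re

# Hypothetical input: one paragraph spans a line break, one does not.
text = "intro <p>line one\nline two</p> middle <p> padded </p> end"

# Without re.DOTALL, "." does not match "\n", so the first paragraph is missed.
without_dotall = re.findall(r"<p>.*?</p>", text)
print(without_dotall)  # ['<p> padded </p>']

# With re.DOTALL, "." also matches newlines, so both paragraphs are found.
whole_tags = re.findall(r"<p>.*?</p>", text, flags=re.DOTALL)
print(whole_tags)  # ['<p>line one\nline two</p>', '<p> padded </p>']

# With one capturing group, findall returns just the group's contents;
# strip() removes the surrounding whitespace.
inner_text = [s.strip() for s in re.findall(r"<p>(.*?)</p>", text, flags=re.DOTALL)]
print(inner_text)  # ['line one\nline two', 'padded']
```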
Vikas Periyadath