I want to extract `tamar tamar,0529589055` from this text, and I have to do that multiple times.

```html
<h3 class="name">tamar tamar</h3>
<ul class="list-inline">
    <li>gender:female</li>
    <li>age:20</li>
<li class="phone" data="0529589055">phone:  0529589055</li>
<li class="email" data="tamar0529589055@gmail.com">email: tamar89055@gmail.com</li>
```
  • This question is clearly improvable. Please post a clearer and more detailed question so you also get a good and detailed answer – yatu Feb 03 '19 at 18:24
  • This looks like HTML and not just plain old text. [`HTML parser`](https://docs.python.org/3/library/html.parser.html#module-html.parser) would be the way to go. There is a very enlightening [post](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags#1732454) on the topic around here. :) – Ondrej K. Feb 03 '19 at 18:24
  • Possible duplicate of [Extracting data from HTML with Python](https://stackoverflow.com/questions/17126686/extracting-data-from-html-with-python) – Avishay Cohen Feb 03 '19 at 18:26
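
Following the `HTML parser` suggestion in the comments, here is a minimal sketch using only the standard library's `html.parser`. The class name `ContactParser` is hypothetical, and it assumes the markup looks like the snippet above: each name is in an `<h3 class="name">`, and the phone number sits in the `data` attribute of a later `<li class="phone">` (an element with several classes, e.g. `class="phone x"`, would not match).

```python
from html.parser import HTMLParser


class ContactParser(HTMLParser):
    """Hypothetical parser collecting (name, phone) pairs.

    Assumes <h3 class="name"> holds the name and a following
    <li class="phone" data="..."> holds the phone number.
    """

    def __init__(self):
        super().__init__()
        self.records = []       # list of (name, phone) tuples
        self._in_name = False   # True while inside <h3 class="name">
        self._name = None       # name text waiting for its phone number

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attrs arrives as a list of (key, value) pairs
        if tag == "h3" and attrs.get("class") == "name":
            self._in_name = True
            self._name = ""
        elif tag == "li" and attrs.get("class") == "phone" and self._name is not None:
            # The bare number is stored in the "data" attribute
            self.records.append((self._name.strip(), attrs.get("data")))
            self._name = None

    def handle_endtag(self, tag):
        if tag == "h3":
            self._in_name = False

    def handle_data(self, data):
        if self._in_name:
            self._name += data


html = '''<h3 class="name">tamar tamar</h3>
<ul class="list-inline">
    <li>gender:female</li>
    <li>age:20</li>
<li class="phone" data="0529589055">phone:  0529589055</li>
<li class="email" data="tamar0529589055@gmail.com">email: tamar89055@gmail.com</li>
</ul>'''

parser = ContactParser()
parser.feed(html)
for name, phone in parser.records:
    print(f"{name},{phone}")  # tamar tamar,0529589055
```

Because `records` keeps growing as more `<h3>`/`<li class="phone">` pairs are fed in, the same parser handles a page with many entries.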

2 Answers

Did you think about trying regex? For example, a simple `(\w+ \w+)</h3>` will extract the name, at least for the example above. For the number, something like `(0\d+)</li>`, off the top of my head.
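
A minimal sketch of those two patterns with Python's `re.findall`, assuming the HTML is already in a string and that names and phone numbers appear in the same order (pairing them with `zip` is an assumption, not something regex guarantees). This is fragile: it breaks as soon as a name has more or fewer than two words, or another `<li>` happens to end in a `0`-prefixed digit run.

```python
import re

html = '''<h3 class="name">tamar tamar</h3>
<ul class="list-inline">
    <li>gender:female</li>
    <li>age:20</li>
<li class="phone" data="0529589055">phone:  0529589055</li>
<li class="email" data="tamar0529589055@gmail.com">email: tamar89055@gmail.com</li>
</ul>'''

# Two words right before </h3>
names = re.findall(r'(\w+ \w+)</h3>', html)
# A digit run starting with 0 right before </li>
phones = re.findall(r'(0\d+)</li>', html)

# Pair them in document order (assumes one phone per name)
for name, phone in zip(names, phones):
    print(f"{name},{phone}")  # tamar tamar,0529589055
```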

An online regex site that I find easy to use: https://pythex.org

And the Python regex docs: https://docs.python.org/2/library/re.html

Avishay Cohen

BeautifulSoup is what you are looking for:

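```python
from bs4 import BeautifulSoup

a = '''<h3 class="name">tamar tamar</h3>
<ul class="list-inline">
    <li>gender:female</li>
    <li>age:20</li>
<li class="phone" data="0529589055">phone:  0529589055</li>
<li class="email" data="tamar0529589055@gmail.com">email: tamar89055@gmail.com</li>
'''
# Name the parser explicitly to avoid bs4's "no parser specified" warning
soup = BeautifulSoup(a, "html.parser")
name = soup.find('h3', {"class": "name"}).text
# .text would give "phone:  0529589055"; the "data" attribute holds just the number
phone = soup.find('li', {"class": "phone"})['data']
print(f"{name},{phone}")  # tamar tamar,0529589055
```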
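
Since you have to do this multiple times, a sketch of the same idea looping over every `<h3 class="name">` with `find_all`, then using `find_next` to grab the first `<li class="phone">` after each heading. It assumes each phone item follows its own name heading; the second record (`dana levi`, `0500000000`) is hypothetical placeholder data added only to show the loop.

```python
from bs4 import BeautifulSoup

# The question's record plus one hypothetical record in the same layout
html = '''
<h3 class="name">tamar tamar</h3>
<ul class="list-inline">
    <li>gender:female</li>
    <li>age:20</li>
    <li class="phone" data="0529589055">phone:  0529589055</li>
</ul>
<h3 class="name">dana levi</h3>
<ul class="list-inline">
    <li>gender:female</li>
    <li>age:25</li>
    <li class="phone" data="0500000000">phone:  0500000000</li>
</ul>
'''

soup = BeautifulSoup(html, "html.parser")
results = []
for h3 in soup.find_all("h3", class_="name"):
    # First phone <li> that appears after this heading in the document
    phone_li = h3.find_next("li", class_="phone")
    if phone_li is None:
        continue  # heading without a phone entry: skip it
    results.append(f"{h3.get_text(strip=True)},{phone_li['data']}")

print("\n".join(results))
```

Reading the `data` attribute rather than the `<li>` text avoids having to strip the `phone:` prefix.
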
mad_