Loop and display substring in python

Question

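I want to extract and display several substrings from a string.

Raw string: <td><strong></strong></td><td><strong></strong></td><td><strong></strong></td><td><strong></strong></td><td><strong>Mar08</strong></td><td><strong>Mar09</strong></td><td><strong>Mar10</strong></td><td><strong>Mar11</strong></td><td><strong>Mar12</strong></td><td><strong>Mar13</strong></td></tr>

Expected result (the substrings):

Mar08 Mar09 Mar10 Mar11 Mar12 Mar13

I've tried this code: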

def parseyear(list):
    sfind = "<strong>"
    efind = "</strong>"
    i = 0
    while i < len(list):
        s =  list.find(sfind,i,len(list))
        e = list.find(efind,s,len(list))
        v = list[s+len(sfind):e]
        i =  i + s
        print v

But it doesn't give the expected result.
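The loop above goes wrong in two places. First, `i = i + s` adds the absolute position of the match to the cursor instead of moving the cursor just past the closing tag, so matches get repeated or skipped. Second, the `-1` that `find` returns when nothing is left is never checked. Below is a minimal corrected sketch of the same `find`-based idea in Python 3. It renames the parameter to `text` so it no longer shadows the built-in `list`, turns the hard-coded tags into default arguments, and returns a list instead of printing. Those changes and the shortened sample string are illustrative assumptions, not part of the original question.

```python
def parseyear(text, sfind="<strong>", efind="</strong>"):
    """Return every non-empty substring found between sfind and efind."""
    results = []
    i = 0
    while True:
        s = text.find(sfind, i)
        if s == -1:              # no more opening tags
            break
        s += len(sfind)          # first character after the opening tag
        e = text.find(efind, s)
        if e == -1:              # opening tag without a closing tag
            break
        if e > s:                # skip empty <strong></strong> pairs
            results.append(text[s:e])
        i = e + len(efind)       # resume searching after the closing tag
    return results


# Shortened, assumed sample in the shape of the question's raw string
raw = ("<td><strong></strong></td><td><strong>Mar08</strong></td>"
       "<td><strong>Mar09</strong></td></tr>")
print(" ".join(parseyear(raw)))  # Mar08 Mar09
```

Returning a list keeps the search separate from the display, so the caller can join the values on one line as in the expected result.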

This looks like HTML. Consider using an HTML parser? – Chris Martin Sep 07 '15 at 06:07
I don't see any difference between input and output – Ahsanul Haque Sep 07 '15 at 06:07
@AhsanulHaque please find the edited version. – jOSe Sep 07 '15 at 06:08
Oops, was just trying to adjust formatting a little. Sorry! – Chris Martin Sep 07 '15 at 06:09
@ChrisMartin Thank you, no problem – jOSe Sep 07 '15 at 06:10
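Following the suggestion to use an HTML parser, here is a minimal sketch with the standard library's `html.parser.HTMLParser` (Python 3). The class name `StrongTextParser` and its `values` attribute are hypothetical. The sketch also assumes the wanted text sits directly inside `<strong>` tags with no nested elements. Unlike a regex, the parser tokenizes the markup itself. Unlike an XML parser, it tolerates fragments such as the stray `</tr>` at the end of the question's string.

```python
from html.parser import HTMLParser


class StrongTextParser(HTMLParser):
    """Collect the text inside every <strong> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_strong = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == "strong":
            self.in_strong = True

    def handle_endtag(self, tag):
        if tag == "strong":
            self.in_strong = False

    def handle_data(self, data):
        # keep only non-blank text that appears inside <strong>...</strong>
        if self.in_strong and data.strip():
            self.values.append(data.strip())


# Shortened, assumed sample; note the unmatched </tr> at the end
raw = ("<td><strong></strong></td><td><strong>Mar08</strong></td>"
       "<td><strong>Mar09</strong></td><td><strong>Mar10</strong></td></tr>")
parser = StrongTextParser()
parser.feed(raw)
parser.close()
print(" ".join(parser.values))  # Mar08 Mar09 Mar10
```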

score 2 · Accepted Answer · Juan Diego Godoy Robles · 2015-09-07T06:54:24.640

Use a regex:

>>> import re
>>> for m in re.findall(r'<strong>([^<]+)</strong>', raw_string):
...     print(m)
...
Mar08
Mar09
Mar10
Mar11
Mar12
Mar13
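`raw_string` is not defined in the session above. The following is a self-contained Python 3 version. It assumes a shortened copy of the question's raw string and prints the matches on one line, as in the expected result:

```python
import re

# Assumed, shortened copy of the question's raw string
raw_string = ("<td><strong></strong></td><td><strong>Mar08</strong></td>"
              "<td><strong>Mar09</strong></td><td><strong>Mar10</strong></td></tr>")

# [^<]+ needs at least one non-'<' character, so empty <strong></strong> cells are skipped
values = re.findall(r"<strong>([^<]+)</strong>", raw_string)
print(" ".join(values))  # Mar08 Mar09 Mar10
```

With a single capturing group, `re.findall` returns just the captured text, so no extra slicing is needed.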

edited Sep 07 '15 at 06:54

answered Sep 07 '15 at 06:11


(see also: https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) – Chris Martin Sep 07 '15 at 06:13
When I tried the same on the following raw text, it doesn't work: 0.00 0.00 0.00 0.21 0.23 1.23 1.30 1.74 0.87 0.98 – jOSe Sep 07 '15 at 06:23

Now you have two problems. – Stefan van den Akker Sep 07 '15 at 06:33
Just refine the regex, @jOSe. See my edited answer. – Juan Diego Godoy Robles Sep 07 '15 at 06:54
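One common reason such a pattern fails on numeric cells is the character class. `\w` matches letters, digits and underscore but not `.`, so `(\w+)` cannot match a whole value such as `0.21`. By contrast, `[^<]+` accepts any run of characters up to the next `<`, including dots and newlines. The sketch below assumes the numeric cells are marked up like the `html2` sample in the second answer. `raw_numbers` is a shortened, hypothetical sample.

```python
import re

# Hypothetical sample shaped like the numeric cells (one value has a newline before it)
raw_numbers = ("<td><strong>0.00</strong></td><td><strong>0.21</strong></td>"
               "<td><strong>\n1.30</strong></td></tr>")

# \w+ stops at '.', and at '\n', so none of the values match
print(re.findall(r"<strong>(\w+)</strong>", raw_numbers))  # []

# [^<]+ accepts digits, dots and newlines; strip() drops the surrounding whitespace
values = [float(v.strip()) for v in re.findall(r"<strong>([^<]+)</strong>", raw_numbers)]
print(values)  # [0.0, 0.21, 1.3]
```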

score 1 · Answer 2 · answered Sep 07 '15 at 06:57

If you do not want to use regex:

def find_substrings(s, delim_start, delim_end):
    """Print every non-empty substring enclosed by delim_start and delim_end."""
    len_delim_start = len(delim_start)
    start = s.find(delim_start)
    while start != -1:
        # look for the closing delimiter after the end of the opening one
        end = s.find(delim_end, start + len_delim_start)
        if end == -1:
            break  # unmatched opening delimiter: stop instead of looping forever
        # strip() removes newlines that may sit just inside the tags
        substring = s[start + len_delim_start:end].strip()
        # print only if the substring is not empty
        if substring:
            print(substring)
        start = s.find(delim_start, end + len(delim_end))

html = """
<td><strong></strong></td><td><strong></strong></td><td><strong></strong></td><td><strong></strong>
</td><td><strong>Mar08</strong></td><td><strong>Mar09</strong></td><td><strong>Mar10</strong></td>
<td><strong>Mar11</strong></td><td><strong>Mar12</strong></td><td><strong>Mar13</strong></td></tr>
"""

html2 = """
<td><strong>0.00</strong></td><td><strong>0.00</strong></td><td><strong>0.00</strong></td><td>
<strong>0.21</strong></td><td><strong>0.23</strong></td><td><strong>1.23</strong></td><td><strong>
1.30</strong></td><td><strong>1.74</strong></td><td><strong>0.87</strong></td><td><strong>
0.98</strong></td></tr>
"""

find_substrings(html2, "<strong>", "</strong>")

# output:
# 0.00
# 0.00
# 0.00
# 0.21
# 0.23
# 1.23
# 1.30
# 1.74
# 0.87
# 0.98

score 0 · Answer 3 · answered Sep 07 '15 at 07:07

If the structure of the markup is known and it is well-formed XML, simply use an XML parser:

import xml.etree.ElementTree 
s = "<tr><td><strong></strong></td><td><strong></strong></td><td><strong></strong></td><td><strong></strong></td><td><strong>Mar08</strong></td><td><strong>Mar09</strong></td><td><strong>Mar10</strong></td><td><strong>Mar11</strong></td><td><strong>Mar12</strong></td><td><strong>Mar13</strong></td></tr>"
parsed_xml = xml.etree.ElementTree.fromstring(s)
values = [e.text for e in parsed_xml.findall("./td/strong") if e.text]
assert values == ['Mar08', 'Mar09', 'Mar10', 'Mar11', 'Mar12', 'Mar13']
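`ElementTree` only accepts well-formed XML with a single root element. The string in this answer starts with `<tr>`, so it has one. The question's raw string starts at `<td>` and ends with a stray `</tr>`, so passing it directly raises `ParseError`. The sketch below assumes the missing opening tag is `<tr>` and uses a shortened `fragment` (a hypothetical name):

```python
import xml.etree.ElementTree as ET

# Fragment shaped like the question's raw string: no opening <tr>, only a closing </tr>
fragment = ("<td><strong></strong></td><td><strong>Mar08</strong></td>"
            "<td><strong>Mar09</strong></td></tr>")

try:
    ET.fromstring(fragment)
except ET.ParseError as exc:
    # several top-level <td> elements plus a stray </tr> is not well-formed XML
    print("not well-formed:", exc)

# Assumption: the missing opening tag is <tr>, so prepending it restores a single root
row = ET.fromstring("<tr>" + fragment)
values = [e.text for e in row.findall("./td/strong") if e.text]
print(" ".join(values))  # Mar08 Mar09
```

For real-world pages that are not guaranteed to be well-formed, an HTML parser is the safer choice.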
