I'm defining a function that counts the number of times the XML tags <> and </> appear in a string. Although I'm able to 'match' the strings, I can't turn those matches into a numeric count. This is my function:

def tag_count(the_string):
    count = 0
    for element in the_string:
        if '<>' in the_string:
            count += 1
            return count
        elif '</>' in the_string:
            count += 1
            return count

The problem is that for strings that contain both <> and </>, the function should return the total number of matches, but it only ever returns 1 because of the elif condition. I tried inserting `and` in the 3rd line, but that gives me an error. How do I sum the number of times these tags are matched?

shiv_90
  • Move `return count` out of the for loop. – Sohaib Farooqi Dec 08 '17 at 15:57
  • Your `return` statement is within your loop. So it returns `1` as soon as it has found the first occurrence. Move `return` to the end, out of the loop to avoid this. – Mr. T Dec 08 '17 at 15:59
  • Note that real XML tags look like `<tag>...</tag>` or `<tag/>`, not `<>` or `</>`. You need a different way of matching to catch them. Also, you count an open-close pair as two tags. A reasonable implementation would likely also check that open and close tags match each other; this would require a stack. For practical purposes I'd use a real XML parser (one is included in Python's standard library) and count the non-text nodes in the tree recursively. – 9000 Dec 08 '17 at 16:05
  • Many thanks for all the responses! – shiv_90 Dec 08 '17 at 17:48
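As the comments point out, the early `return` is the bug. Moving it out of the loop is not quite enough on its own, though: each iteration tests `'<>' in the_string` against the whole string, so the count would become the string's length whenever either tag appears anywhere. A minimal sketch that keeps the asker's loop shape but checks for a tag starting at each index with `str.startswith(prefix, start)`:

```python
def tag_count(the_string):
    count = 0
    # Check whether a tag starts at index i, instead of testing the
    # whole string on every iteration.
    for i in range(len(the_string)):
        if the_string.startswith('<>', i):
            count += 1
        elif the_string.startswith('</>', i):
            count += 1
    # Return only after the loop has scanned the entire string.
    return count
```

Because no index can start both `<>` and `</>`, the `elif` never skips a match.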

1 Answer

You return from the function as soon as you encounter a tag, which is why the result is always 1. You can also use the str.count() method; because '<>' never occurs inside '</>', the two counts can simply be added:

def tag_count(source):
    return source.count('<>') + source.count('</>')

Example usage:

>>> tag_count('<> <> </> <> <>')
5
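For comparison, a sketch of the same count using a regular expression instead of str.count() (a swapped-in technique, not part of the original answer). The pattern `</?>` matches `<`, an optional `/`, then `>`, so a single `re.findall` call finds both tag forms:

```python
import re

def tag_count(source):
    # '</?>' matches '<>' or '</>'; findall returns non-overlapping matches.
    return len(re.findall(r'</?>', source))
```

This is handy if the set of tag forms grows, since the pattern can be extended instead of adding another count() call.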
Norrius
  • This is elegance. – shiv_90 Dec 08 '17 at 17:49
  • What if the string was something like `['', 'Foo!', ']`. Then how would the universal matching work? – shiv_90 Dec 09 '17 at 04:54
  • If you have a real XML document, your only sane option is to use an actual XML parser (you can find one in the standard-library `xml` module or the third-party `lxml` package). If your data is a subset of XML (where does the splitting come from, by the way?), there might be a simpler method, but there's no way to tell without knowing precisely what you're dealing with. – Norrius Dec 09 '17 at 11:11
  • For example, I have a simple list of strings `list = ['', 'World!', '', '', '']` and would like to sum up the number of starting and ending tags in it. – shiv_90 Dec 09 '17 at 13:11
  • In a simple case like this you might use list comprehension to filter out those that don't start with an opening bracket or don't end with a closing one: `len([item for item in items if item.startswith('<') and item.endswith('>')])` – Norrius Dec 09 '17 at 15:29
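For input that is real XML, both 9000 and Norrius recommend a parser. A minimal sketch of 9000's "count the nodes recursively" idea, assuming the input is a single well-formed XML document and using the standard-library `xml.etree.ElementTree` module; `count_elements` and `count_elements_in` are hypothetical helper names. It counts elements (each one is an open/close pair or a single self-closing tag), not individual tags:

```python
import xml.etree.ElementTree as ET

def count_elements(elem):
    # Count this element plus all of its descendants; iterating over an
    # Element yields its direct child elements (text is not included).
    return 1 + sum(count_elements(child) for child in elem)

def count_elements_in(xml_text):
    # ET.fromstring raises ET.ParseError if xml_text is not well-formed.
    return count_elements(ET.fromstring(xml_text))
```

For example, `count_elements_in('<greeting><name>World!</name></greeting>')` returns 2. Note that `<>` and `</>` on their own are not well-formed XML, so strings like the ones in the question raise `ET.ParseError` here.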