Find a substring in a string in Python using a regular expression

Question

I am trying to find a regex for the following string:

<Method name="aa" numberOfInstances="1" /><Method name="bb" numberOfInstances="4" />

I want the Method name and numberOfInstances values.

The output would be:

aa   1

bb   4

I achieved it with substring searches, but it's slow:

fun_indexs = [i for i in range(len(line)) if line.startswith('Method name=', i)]
count_indexs = [i for i in range(len(line)) if line.startswith('numberOfInstances=', i)]
slash_index = [i for i in range(len(line)) if line.startswith('/>', i)]

Any help appreciated.

fun_indexs= [ i for i in range( len( line ) ) if line.startswith( 'Method name=', i ) ] count_indexs= [ i for i in range( len( line) ) if line.startswith( 'numberOfInstances=', i ) ] slash_index= [ i for i in range( len( line) ) if line.startswith( '/>', i ) ] — Sumo, May 08 '23 at 05:05
why regex? this is html. why not use an appropriate parser (e.g. [`beautifulsoup4`](https://beautiful-soup-4.readthedocs.io/))? — hiro protagonist, May 08 '23 at 05:07
@Sumo Please add your code directly to the question using the *Edit* link. — InSync, May 08 '23 at 05:13
@Samwise thanks! wanted to link that question myself but could not find it... — hiro protagonist, May 08 '23 at 05:32

score 1 · Answer 1 · answered May 08 '23 at 05:06

Try this:

import re

s = '<Method name="aa" numberOfInstances="1" /><Method name="bb" numberOfInstances="4" />'

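# Capture the name (word characters) and the numberOfInstances (digits) values
pattern = r'name="(\w+)"\s+numberOfInstances="(\d+)"'
# With two groups, findall returns one (name, instances) tuple per match
matches = re.findall(pattern, s)

for name, instances in matches:
    print(name, instances)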
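
If you need the counts as integers and want to reuse the pattern, a minimal sketch along the same lines could use a compiled pattern with named groups. The names `METHOD_RE` and `extract_methods` are made up for illustration, and the sketch assumes that `name` always comes before `numberOfInstances` and that both values are double-quoted:

```python
import re

# Assumption: name precedes numberOfInstances and both values are double-quoted.
METHOD_RE = re.compile(
    r'<Method\s+name="(?P<name>[^"]*)"\s+numberOfInstances="(?P<count>\d+)"'
)

def extract_methods(text):
    """Return a list of (name, count) tuples, with count converted to int."""
    return [(m.group("name"), int(m.group("count")))
            for m in METHOD_RE.finditer(text)]

s = '<Method name="aa" numberOfInstances="1" /><Method name="bb" numberOfInstances="4" />'
for name, count in extract_methods(s):
    print(name, count)
```

Compared with scanning every index with `startswith`, this walks the string once, which is likely where the speed-up would come from.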

score 1 · Answer 2 · answered May 08 '23 at 05:22

Suppose you have this string:

s = '<Method name="aa" numberOfInstances="1" /><Method name="bb" numberOfInstances="4" />'

You can parse your string as XML:

import xml.etree.ElementTree as ET

tree = ET.fromstring(f'<data>{s}</data>')
data = [method.attrib for method in tree.findall('./Method')]

Output:

>>> data
[{'name': 'aa', 'numberOfInstances': '1'},
 {'name': 'bb', 'numberOfInstances': '4'}]

Alternatively, build the list explicitly and convert the count to an integer:

data = []
for method in tree.findall('.//Method'):
    data.append({'name': method.attrib['name'],
                 'instances': int(method.attrib['numberOfInstances'])})
print(data)

# Output
[{'name': 'aa', 'instances': 1}, {'name': 'bb', 'instances': 4}]
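
To print the values in the layout asked for in the question, the XML approach could be wrapped up roughly like this. The function name `method_counts` is hypothetical, the `<data>` wrapper is just an arbitrary root element, and falling back to `0` for a missing `numberOfInstances` attribute is my assumption, not something the question specifies:

```python
import xml.etree.ElementTree as ET

def method_counts(fragment):
    """Return (name, count) tuples for every Method element in the fragment."""
    # Wrap the fragment in a dummy root so several top-level elements parse.
    root = ET.fromstring(f"<data>{fragment}</data>")
    # Assumption: a missing numberOfInstances attribute counts as 0.
    return [(m.get("name"), int(m.get("numberOfInstances", "0")))
            for m in root.iter("Method")]

s = '<Method name="aa" numberOfInstances="1" /><Method name="bb" numberOfInstances="4" />'
for name, count in method_counts(s):
    print(f"{name}   {count}")
```

Unlike the regex, this does not depend on the order of the attributes, but it does require the input to be well-formed XML.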
