I am trying to find a regex for the following string:

<Method name="aa" numberOfInstances="1" /><Method name="bb" numberOfInstances="4" />

I want to extract the name and numberOfInstances of each Method.

The expected output would be:

aa   1

bb   4

I achieved this by searching for substring positions, but it's slow:

fun_indexs = [i for i in range(len(line)) if line.startswith('Method name=', i)]
count_indexs = [i for i in range(len(line)) if line.startswith('numberOfInstances=', i)]
slash_index = [i for i in range(len(line)) if line.startswith('/>', i)]

Any help appreciated.

Sakib Rahman
Sumo

2 Answers

Try this:

import re

s = '<Method name="aa" numberOfInstances="1" /><Method name="bb" numberOfInstances="4" />'

# regex pattern
pattern = r'name="(\w+)"\s+numberOfInstances="(\d+)"'
matches = re.findall(pattern, s)

for match in matches:
    name, instances = match
    print(name, instances)
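
If the same pattern is applied to many lines, one possible variation is to compile it once and use named groups. This is only a sketch, not part of the original answer: the names METHOD_RE and extract_methods are hypothetical, and it assumes name always comes directly before numberOfInstances inside a Method tag.

```python
import re

# Hypothetical names; assumes the two attributes always appear in this order.
METHOD_RE = re.compile(
    r'<Method\s+name="(?P<name>[^"]*)"\s+numberOfInstances="(?P<count>\d+)"'
)

def extract_methods(line):
    """Return a (name, count) pair for every Method tag found in the line."""
    return [(m.group("name"), int(m.group("count")))
            for m in METHOD_RE.finditer(line)]

s = '<Method name="aa" numberOfInstances="1" /><Method name="bb" numberOfInstances="4" />'
for name, count in extract_methods(s):
    print(name, count)
```

Matching the name with [^"]* instead of \w+ also accepts names containing dots or dashes, which may or may not be what you want.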
Sakib Rahman

Given this string:

s = '<Method name="aa" numberOfInstances="1" /><Method name="bb" numberOfInstances="4" />'

You can parse your string as XML:

import xml.etree.ElementTree as ET

tree = ET.fromstring(f'<data>{s}</data>')
data = [method.attrib for method in tree.findall('./Method')]

Output:

>>> data
[{'name': 'aa', 'numberOfInstances': '1'},
 {'name': 'bb', 'numberOfInstances': '4'}]

Alternatively, build the dictionaries explicitly and convert the count to an integer:

data = []
for method in tree.findall('.//Method'):
    data.append({'name': method.attrib['name'],
                 'instances': int(method.attrib['numberOfInstances'])})
print(data)

# Output
[{'name': 'aa', 'instances': 1}, {'name': 'bb', 'instances': 4}]
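
To get the two-column output asked for in the question, the parsed elements can also be printed directly. A minimal sketch, assuming every Method element carries both attributes (Element.get returns None for a missing one); the variable names root and rows are just illustrative:

```python
import xml.etree.ElementTree as ET

s = '<Method name="aa" numberOfInstances="1" /><Method name="bb" numberOfInstances="4" />'

# Wrap the fragment in a single root element so it is well-formed XML.
root = ET.fromstring(f'<data>{s}</data>')

# Build one "name count" row per Method element.
rows = [f"{m.get('name')} {m.get('numberOfInstances')}" for m in root.iter('Method')]
print('\n'.join(rows))
```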
Corralien