
I am converting some code to MicroPython and got stuck on a particular regular expression.

In Python, my code is:

import re

line = "0-1:24.2.1(180108205500W)(00001.290*m3)"
between_brackets = r'\(.*?\)'

brackettext  = re.findall(between_brackets, line) 
gas_date_str = read_date_time(brackettext[0])
gas_val      = read_gas(brackettext[1])

# read_date_time and read_gas each take a bracketed string
# and return a value that can be used later

MicroPython implements only a limited set of re functions, and re.findall is not one of them.

How do I achieve the same result with only the functions that are available?

Marc Wagner

2 Answers


You could repeatedly call re.search, discarding the part of the string up to the end of each match before searching again. The implementation here uses a generator function:

import re

def findall(pattern, string):
    while True:
        match = re.search(pattern, string)
        if not match:
            break
        yield match.group(0)
        string = string[match.end():]

>>> list(findall(r'\(.*?\)', "0-1:24.2.1(180108205500W)(00001.290*m3)"))
['(180108205500W)', '(00001.290*m3)']
Marc Wagner
user2390182
  • I like the compactness of the code and the use of yield. – Marc Wagner Oct 02 '18 at 08:56
  • It is worth mentioning that while match.end is in the core MicroPython library, its implementation varies from port to port. So, for example, on PyCom boards... there is no .end() (as of today, anyway). – Patrick Mar 27 '20 at 19:01
  • I was missing this one out while tinkering with [micropython regex](https://docs.micropython.org/en/latest/library/re.html) - so this helps a lot! +1 – con Jul 18 '22 at 15:06
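
Following up on the comment about ports that lack match.end(): a minimal sketch of one possible workaround, not a tested MicroPython recipe. It locates the matched text with str.find instead of match.end(). The name findall_no_end is made up for illustration, and the approach assumes the pattern contains no anchors such as ^ or $, so that the first occurrence of the matched text is where the regex matched.

```python
import re

def findall_no_end(pattern, string):
    # Hypothetical helper: like the generator above, but without match.end().
    # Assumption: the pattern has no anchors (^, $), so the first occurrence
    # of the matched text in the string is the position where it matched.
    results = []
    while True:
        match = re.search(pattern, string)
        if not match:
            break
        text = match.group(0)
        if not text:
            # An empty match would never consume input; stop to avoid looping forever.
            break
        results.append(text)
        # Skip past the matched text and search the remainder.
        string = string[string.find(text) + len(text):]
    return results

print(findall_no_end(r'\(.*?\)', "0-1:24.2.1(180108205500W)(00001.290*m3)"))
# ['(180108205500W)', '(00001.290*m3)']
```

This returns a list rather than a generator, purely to keep the sketch short.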

You can write a function using re.search() that returns a list of all matches:

import re  

def find_all(regex, text):
    match_list = []
    while True:
        match  = re.search(regex, text)
        if match:
            match_list.append(match.group(0))
            text = text[match.end():]
        else:
            return match_list

Also, note that your between_brackets regex does not handle nested brackets: the non-greedy match stops at the first closing bracket.

>>> re.findall(r'\(.*?\)', "(ac(ssc)xxz)")
['(ac(ssc)']
  • Thank you. Perhaps even more elegant would be to make the match statement the condition for the while loop and place the return statement outside of the loop. – Marc Wagner Oct 02 '18 at 08:00
  • Ah, never mind. You cannot do an assignment as part of a while statement. – Marc Wagner Oct 02 '18 at 08:53
  • "Also, note that your between_brackets regex will not take care of nested brackets:" - Thank you for pointing that out. I know that these situations will not arise with the data I am trying to parse. How should I handle these cases if they do matter? – Marc Wagner Oct 02 '18 at 08:57
  • Regexes are the wrong tool for handling recursive text. You can implement the solution described in the [accepted answer here](https://stackoverflow.com/questions/524548/regular-expression-to-detect-semi-colon-terminated-c-for-while-loops/524624#524624) if you need to do that. – manic.coder Oct 03 '18 at 05:38
  • Thank you for the link. Is regex right for my situation, or would you recommend a subroutine here too? – Marc Wagner Oct 03 '18 at 06:29
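
For the nested-bracket case raised in the comments, a plain scanning subroutine with a depth counter is one option. The sketch below is only an illustration and is not necessarily the approach in the linked answer. The name top_level_groups is hypothetical, and unbalanced brackets are simply ignored.

```python
def top_level_groups(text):
    # Hypothetical helper: collect each outermost "(...)" group,
    # keeping any nested brackets inside it.
    groups = []
    depth = 0      # how many brackets are currently open
    start = None   # index of the "(" that opened the current outermost group
    for i, ch in enumerate(text):
        if ch == '(':
            if depth == 0:
                start = i
            depth += 1
        elif ch == ')' and depth > 0:
            depth -= 1
            if depth == 0:
                groups.append(text[start:i + 1])
    return groups

print(top_level_groups("(ac(ssc)xxz)"))
# ['(ac(ssc)xxz)']
print(top_level_groups("0-1:24.2.1(180108205500W)(00001.290*m3)"))
# ['(180108205500W)', '(00001.290*m3)']
```

For input without nesting, such as the meter line in the question, this gives the same result as the regex-based versions above.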