2

Is there a Python class equivalent to Ruby's StringScanner class? I could hack something together, but I don't want to reinvent the wheel if this already exists.

Ian P

7 Answers

10

Interestingly there's an undocumented Scanner class in the re module:

import re

def s_ident(scanner, token): return token
def s_operator(scanner, token): return "op%s" % token
def s_float(scanner, token): return float(token)
def s_int(scanner, token): return int(token)

scanner = re.Scanner([
    (r"[a-zA-Z_]\w*", s_ident),
    (r"\d+\.\d*", s_float),
    (r"\d+", s_int),
    (r"=|\+|-|\*|/", s_operator),
    (r"\s+", None),
    ])

print(scanner.scan("sum = 3*foo + 312.50 + bar"))

Following the discussion it looks like it was left in as experimental code/a starting point for others.
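As a small sketch of what `scan` hands back: it returns a pair of the collected tokens and whatever part of the input could not be matched. Since `re.Scanner` is undocumented, treat this behaviour as an observation about the current implementation rather than a stable API; the patterns and input below are made up for illustration.

```python
import re

# re.Scanner is undocumented; this relies on how it currently behaves.
number_scanner = re.Scanner([
    (r"\d+", lambda scanner, token: int(token)),  # convert digits to int
    (r"\s+", None),                               # None means "skip this match"
])

# Scanning stops at the first position where no pattern matches.
tokens, remainder = number_scanner.scan("12 34 abc 56")
print(tokens)     # tokens collected before the scan stopped
print(remainder)  # unmatched tail of the input
```

Checking that the remainder is empty is a simple way to detect input the lexicon does not cover.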

SimonJ
4

There is nothing exactly like Ruby's StringScanner in Python. It is of course easy to put something together:

import re

class Scanner(object):
    def __init__(self, s):
        self.s = s
        self.offset = 0
    def eos(self):
        return self.offset == len(self.s)
    def scan(self, pattern, flags=0):
        if isinstance(pattern, str):  # Python 3; Python 2 used basestring
            pattern = re.compile(pattern, flags)
        match = pattern.match(self.s, self.offset)
        if match is not None:
            self.offset = match.end()
            return match.group(0)
        return None

Here is an example of using it interactively:

>>> s = Scanner("Hello there!")
>>> s.scan(r"\w+") 
'Hello'
>>> s.scan(r"\s+") 
' '
>>> s.scan(r"\w+")
'there'
>>> s.eos()
False
>>> s.scan(r".*")
'!'
>>> s.eos()
True

However, for the work I do, I tend to just write those regular expressions in one go and use groups to extract the needed fields. For something more complicated, I would write a one-off tokenizer or look to pyparsing or PLY to tokenize for me. I don't see myself using something like StringScanner.
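To illustrate the "one regular expression with groups" approach, here is a minimal sketch. The line format (`name = number`) and the names `ASSIGNMENT` and `fields` are assumptions chosen for the example, not something from a real project.

```python
import re

# Hypothetical format: an identifier, an equals sign, then a number.
ASSIGNMENT = re.compile(
    r"(?P<name>[A-Za-z_]\w*)"      # identifier
    r"\s*=\s*"                     # equals sign with optional whitespace
    r"(?P<value>\d+(?:\.\d+)?)"    # integer or decimal number
)

match = ASSIGNMENT.match("total = 312.50")
# groupdict() maps each named group to the text it captured.
fields = match.groupdict() if match else {}
print(fields)
```

Named groups keep the extraction readable without tracking an offset by hand, which is the bookkeeping a StringScanner-style class would otherwise do.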

Andrew Dalke
1

This looks like a variant of re.split(pattern, string).

http://docs.python.org/library/re.html

http://docs.python.org/library/re.html#re.split
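For comparison, a short sketch of what `re.split` gives you. Unlike a scanner it consumes the whole string in one call, and it only keeps the separators if the pattern contains a capturing group. The sample text is arbitrary.

```python
import re

text = "Hello there!"

# Without a capturing group, the separators are discarded.
words = re.split(r"\s+", text)

# With a capturing group, the separators are kept in the result.
words_and_gaps = re.split(r"(\s+)", text)

print(words)
print(words_and_gaps)
```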

S.Lott
1

https://pypi.python.org/pypi/scanner/

This seems to be a better-maintained and more feature-complete solution, but it uses Oniguruma directly.

lu_zero
0

Maybe look into the built-in tokenize module. It looks like you can pass a string into it using the StringIO module.
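A rough sketch of that idea in Python 3, where `StringIO` lives in the `io` module. Note that `tokenize` is designed for Python source code, so this only works when the input happens to look like valid Python; the sample string and the `SKIP` set are assumptions for the example.

```python
import io
import tokenize

source = "sum = 3*foo + 312.50"

# Token types that only mark line or file boundaries.
SKIP = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER}

# generate_tokens() expects a readline callable that returns str lines.
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type not in SKIP
]

for kind, text in tokens:
    print(kind, repr(text))
```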

sheats
0

Today there is a project by Mark Watkinson that implements StringScanner in Python:

http://asgaard.co.uk/p/Python-StringScanner

https://github.com/markwatkinson/python-string-scanner

http://code.google.com/p/python-string-scanner/

Evgeny
-1

Are you looking for regular expressions in Python? Check this link from the official docs:

http://docs.python.org/library/re.html

Vitaly