
This is written in Python:

import re
s='1 89059809102/30589533 IronMan 30 Santa Ana Massage table / IronMan 30 Santa Ana Massage table'
pattern='\s(\d{11})/(\d{8})'
re.match(pattern,s)

It returns None.

I tried removing the brackets:

pattern='\s\d{11}/\d{8}' 

It still returns None.

My questions are:

  1. Why does re.match not find anything?
  2. What difference do the brackets in the pattern make?
Spontifixus
bing
  • @nhahtdh: `\s` and `\d` have no meaning in normal python strings, so in this specific case there is no difference and the backslashes do not require escaping. – Martijn Pieters Feb 18 '13 at 10:11
  • @MartijnPieters: You are right, but a raw string is still quite useful for avoiding confusion. Different languages treat a ``\`` followed by a character that does not form an escape sequence differently. – nhahtdh Feb 18 '13 at 10:14
  • @nhahtdh: I quite agree; using `r''` raw strings for regular expressions is certainly a great idea and is the best practice. Just in this case the OP is lucky and there is no difference. :-) – Martijn Pieters Feb 18 '13 at 10:16
  • @nhahtdh: Almost all languages that use ``\`` one-character escape sequences follow the ISO C standard though (see the [third column on the control codes table](http://en.wikipedia.org/wiki/ASCII#ASCII_control_characters)), so *generally* speaking you can assume there are at most 9 such escape codes in any language that supports these. In fact, I know of no programming language that supports such escape codes and supports more than those 9 (Python itself supports 8 of them; `\e` is not seen as often). – Martijn Pieters Feb 18 '13 at 10:23
  • @nhahtdh: Last but not least: of all the regular expression anchors and character classes that could be confused, *only* `\b` has a meaning as both a character escape (backspace) and a regular expression anchor (word boundary). :-) – Martijn Pieters Feb 18 '13 at 10:24
  • @MartijnPieters: No, I am referring to how a ``\`` followed by a non-escape sequence is treated **in a string literal**. In JS (and C, IIRC), the ``\`` evaporates. Java gives a compile error. Python is one that preserves the ``\`` if an escape sequence is not formed. – nhahtdh Feb 18 '13 at 10:28
  • @nhahtdh: ah, yes, I see what you mean there. Sorry for having misread that. – Martijn Pieters Feb 18 '13 at 10:29
  • Thank you guys for giving me recommendations! – bing Feb 19 '13 at 00:13
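
The `\b` case mentioned in the comments above is the one place where a raw string changes the result. A minimal sketch of that difference, assuming nothing beyond the standard `re` module (the names `backspace`, `boundary`, and `text` are illustrative, not from the original post):

```python
import re

# In a normal string '\b' is a single backspace character;
# in a raw string r'\b' is a backslash followed by 'b',
# which the regex engine reads as a word boundary.
backspace = '\b'
boundary = r'\b'
print(len(backspace), len(boundary))  # 1 2

text = 'a cat sat'
no_hit = re.search('\bcat\b', text)   # looks for literal backspace characters
hit = re.search(r'\bcat\b', text)     # looks for 'cat' as a whole word
print(no_hit)                         # None
print(hit.group())                    # cat

# Doubling the backslash is equivalent to using a raw string.
same = '\\d{11}' == r'\d{11}'
print(same)                           # True
```
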

1 Answer


re.match only "matches" at the beginning of the string, and your string starts with an extra 1 before the whitespace, so the pattern cannot match there.

Use re.search instead, which will "search" anywhere within the string. In your case it finds a match:

>>> re.search(pattern,s).groups()
('89059809102', '30589533')
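
For comparison, here is a small self-contained sketch that puts both calls side by side. It uses raw strings for the patterns, and the variable names (`anchored`, `found`, `prefixed`) as well as the third, prefixed pattern are illustrative assumptions rather than part of the original question:

```python
import re

s = ('1 89059809102/30589533 IronMan 30 Santa Ana Massage table / '
     'IronMan 30 Santa Ana Massage table')
pattern = r'\s(\d{11})/(\d{8})'

# re.match only tries position 0, where the string has '1', not whitespace.
anchored = re.match(pattern, s)

# re.search scans forward and finds the match starting at the space.
found = re.search(pattern, s)

# Alternative: keep re.match, but let the pattern consume the leading digits.
prefixed = re.match(r'\d+\s(\d{11})/(\d{8})', s)

print(anchored)           # None
print(found.groups())     # ('89059809102', '30589533')
print(found.start())      # 1
print(prefixed.groups())  # ('89059809102', '30589533')
```
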

If you remove the brackets in the pattern, it will still return a valid match object (_sre.SRE_Match), but with empty groups:

>>> re.search('\s\d{11}/\d{8}',s).groups()
()
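
To make the effect of the brackets concrete, the following sketch compares the two patterns on a shortened copy of the string. The names `with_groups` and `without_groups` are illustrative; the point is that the overall match, `group(0)`, is identical, and only the captured sub-groups differ:

```python
import re

s = '1 89059809102/30589533 IronMan 30 Santa Ana Massage table'

with_groups = re.search(r'\s(\d{11})/(\d{8})', s)
without_groups = re.search(r'\s\d{11}/\d{8}', s)

# group(0) is the whole matched text in both cases (leading space included).
print(repr(with_groups.group(0)))     # ' 89059809102/30589533'
print(repr(without_groups.group(0)))  # ' 89059809102/30589533'

# Only the bracketed pattern captures the two numbers separately.
print(with_groups.group(1))     # 89059809102
print(with_groups.group(2))     # 30589533
print(without_groups.groups())  # ()
```
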
eumiro