Python RE to search digit along with decimal

Question

I am trying to pull the decimal values (100.00 and 200.00) out of a file using Python's regular expressions, but when I run the code it doesn't print anything. I am using Python 2.7.

1) My file is named "file100"; this is where I need to extract the values from.

# cat file100
Hi this doller 100.00
Hi this is doller 200.00

2) This is my Python code:

# cat count100.py
#!/usr/bin/python
import re
file = open('file100', 'r')
for digit in file.readlines():
        myre=re.match('\s\d*\.\d{2}', digit)
        if myre:
           print myre.group(1)

3) When I run this code, it prints nothing: no output and no error.

# python   count100.py

Side-note: `for digit in file.readlines():` is wasteful and delays processing (it slurps the whole file into memory before beginning iteration). `for digit in file:` iterates without slurping (so peak memory is based on the largest input line, not the size of the file). There is literally no use case for `file.readlines()`; in the rare case where you need a `list` of lines instead of iterating lines as you go, `list(file)` accomplishes the same result more generally/succinctly (it works on any non-infinite iterator, not just file-like objects with `.readlines()`). — ShadowRanger, Dec 21 '15 at 21:46
I don't know why you're using `group(1)`, you don't have a capture group in your regex. I believe I have concisely explained why you should only be using `group(0)` below. — Russia Must Remove Putin, Dec 21 '15 at 21:53

Juan Diego Godoy Robles · Accepted Answer · 2016-01-08T20:26:57.610

2

Use re.search instead:

import re
file = open('file.txt', 'r')
for digit in file.readlines():
    myre = re.search(r'\s\b(\d*\.\d{2})\b', digit)
    if myre:
        print(myre.group(1))

Results

100.00
200.00

From the documentation:

Scan through string looking for the first location where the regular expression pattern produces a match

If you decide to use a group, you also need parentheses:

(...) Matches whatever regular expression is inside the parentheses, and indicates the start and end of a group; the contents of a group can be retrieved after a match has been performed, and can be matched later in the string with the \number special sequence, described below. To match the literals '(' or ')', use \( or \), or enclose them inside a character class: [(] [)].
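To make the difference between group(0) and group(1) concrete, here is a small sketch using one line from the question's file. The variable names are illustrative only and do not come from the answer.

```python
import re

line = "Hi this doller 100.00"

# Parentheses create capture group 1 around the number itself.
m = re.search(r'\s(\d*\.\d{2})', line)
whole_match = m.group(0)   # everything the pattern matched, including the \s
first_group = m.group(1)   # only the text inside the parentheses

# Without parentheses there is no group 1, so asking for it raises IndexError.
no_group = re.search(r'\s\d*\.\d{2}', line)
try:
    no_group.group(1)
    group1_error = None
except IndexError as exc:
    group1_error = str(exc)

print(repr(whole_match), repr(first_group), group1_error)
```

Note that group(0) still contains the whitespace matched by \s, which is one reason to capture only the digits.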

re.match only succeeds:

If zero or more characters at the beginning of string match the regular expression pattern
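A short sketch of that difference, applied to the question's own pattern and one of its input lines (the variable names are just for illustration):

```python
import re

line = "Hi this doller 100.00"
pattern = r'\s\d*\.\d{2}'

# re.match anchors at index 0; the line starts with "H", not whitespace,
# so no match is possible and None is returned.
match_result = re.match(pattern, line)

# re.search tries every starting position and finds the number near the end.
search_result = re.search(pattern, line)

print(match_result)
print(repr(search_result.group(0)))
```

This is why the original script printed nothing: the `if myre:` test was never true, so the print line was never reached.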

Use the r prefix to write the regex as a raw string:

String literals may optionally be prefixed with a letter 'r' or 'R'; such strings are called raw strings and use different rules for interpreting backslash escape sequences.

...

Unless an 'r' or 'R' prefix is present, escape sequences in strings are interpreted according to rules similar to those used by Standard C
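As a rough illustration of what the raw-string prefix changes, the sketch below compares the two kinds of literal. The \b case is chosen because it is the one that silently goes wrong: without the prefix, Python turns it into a backspace character before the regex engine ever sees it.

```python
import re

# In a normal string "\n" is one newline character; in a raw string it is
# a backslash followed by the letter n.
normal = "\n"
raw = r"\n"
print(len(normal), len(raw))

line = "Hi this doller 100.00"

# Raw string: the regex engine receives backslash + b, a word boundary.
with_raw = re.search(r"\b\d+\.\d{2}\b", line)

# Non-raw "\b" is a backspace character, so this pattern looks for literal
# backspaces around the number and finds nothing.
without_raw = re.search("\b" + r"\d+\.\d{2}" + "\b", line)

print(with_raw.group(0), without_raw)
```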

edited Jan 08 '16 at 20:26

answered Dec 21 '15 at 20:47

Juan Diego Godoy Robles


It doesn't use the context manager, it materializes the entire file in memory, and it recompiles the regex every time in the loop. `digit` is also semantically wrong. – Russia Must Remove Putin Dec 21 '15 at 21:23
1

We're only solving the OP's main problem; this is not a code review tool. Of course it can be improved in many ways, but that's not the point here. Cheers – Juan Diego Godoy Robles Dec 21 '15 at 21:27
2

The question was why it didn't work, and the answer given by @klashxx explains why. If you want to improve the answer, you can post a separate answer; why downvote it? – helloV Dec 21 '15 at 21:27
@klashxx - This refined code works nicely, though I am still looking for the meaning of the 'r' just before the regex. Meanwhile I'll read about it on python.org. – Karn Kumar Dec 21 '15 at 21:32
1

`r` makes it a raw string, so your backslashes won't escape the characters they precede when Python parses the string. – Russia Must Remove Putin Dec 21 '15 at 21:37

Padraic Cunningham · Answer 2 · 2015-12-21T21:32:29.533

1

If they are always at the end of your lines just rsplit once and pull the last element:

with open('file100', 'r') as f:
    for line in f:
        print(line.rsplit(None, 1)[1])

Output:

100.00
200.00

rsplit(None,1) just means we split once from the end of the string on whitespace, then we pull the second element:

In [1]: s = "Hi this doller 100.00"

In [2]: s.rsplit(None,1)
Out[2]: ['Hi this doller', '100.00']

In [3]: s.rsplit(None,1)[1]
Out[3]: '100.00'

In [4]: s.rsplit(None,1)[0]
Out[4]: 'Hi this doller'
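One caveat with the split approach: a blank line or a single-word line has no element at index 1, so [1] raises IndexError. Below is a minimal sketch of a guard, where last_field is a hypothetical helper name and not something from the answer.

```python
def last_field(line):
    """Return the last whitespace-separated field of line, or None if the line is blank."""
    parts = line.rsplit(None, 1)
    # parts is [] for a blank line and has one element for a single word,
    # so index from the end instead of using [1].
    return parts[-1] if parts else None

print(last_field("Hi this doller 100.00\n"))
print(last_field("\n"))
```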

If you really need a regex use search:

import re

with open('file100', 'r') as f:
    for line in f:
        m = re.search(r"\b\d+\.\d{2}\b",line)
        if m:
            print(m.group())

edited Dec 21 '15 at 21:32

answered Dec 21 '15 at 20:50

Padraic Cunningham


You're assuming there's always a space before the digits. – Russia Must Remove Putin Dec 21 '15 at 21:29
@AaronHall, did you look at the OP's own pattern ? – Padraic Cunningham Dec 21 '15 at 21:29
@Padraic - it's working, but I would like to accomplish this with a regular expression. Can you explain this part of the code: `(None, 1)[1]`? – Karn Kumar Dec 21 '15 at 21:29
@pygo, I added an example that hopefully makes it clear – Padraic Cunningham Dec 21 '15 at 21:32
1

@Padraic - that's a pretty nice explanation. – Karn Kumar Dec 21 '15 at 21:46

Russia Must Remove Putin · Answer 3 · 2015-12-21T21:54:22.770

1

Your primary problem is that you're using re.match which requires a match starting from the beginning of the string, not re.search, which allows a match that can start at any point in the string. I'll break down my recommendations, though:

import re

No need to recompile on every loop (Python actually caches some regexes for you, but keep one in a reference to be safe). I'm using the VERBOSE flag to break the regex apart for you. Prefix the string with r so that Python does not treat the backslashes as escape characters when it reads the string:

regex = re.compile(r'''
  \s      # one whitespace character, though I think this is perhaps unnecessary
  \d*     # 0 or more digits
  \.      # a dot
  \d{2}   # 2 digits
  ''', re.VERBOSE)

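Use a context manager and open the file in universal-newlines mode, 'rU', so that you can read it line by line no matter what platform the file was created on. (This applies to Python 2; in Python 3, text mode handles newlines this way by default, and the 'U' flag was removed in 3.11.)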

with open('file100', 'rU') as file:

Don't use readlines, which loads the entire file into memory at once. Instead, use the file object as an iterator:

    for line in file:
        myre = regex.search(line) 
        if myre:
            print(myre.group(0)) # group(0) is the entire match; there are
                                 # no capture groups in your regex

My code prints:

100.00
200.00
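Put together, the pieces above might look like the sketch below. To keep it self-contained, io.StringIO stands in for the real file100, and the pattern is a variation rather than the exact one above: it swaps the \s for word boundaries so the matched text carries no leading space.

```python
import io
import re

# Compile once, outside the loop. This variant uses \b instead of \s
# so the match does not include the space before the number.
regex = re.compile(r'''
    \b      # word boundary
    \d+     # 1 or more digits
    \.      # a dot
    \d{2}   # 2 digits
    \b      # word boundary
    ''', re.VERBOSE)

# Stand-in for open('file100'); a real file object iterates the same way.
sample = io.StringIO(u"Hi this doller 100.00\nHi this is doller 200.00\n")

found = []
with sample as file:
    for line in file:          # one line at a time, no readlines()
        m = regex.search(line)
        if m:
            found.append(m.group(0))

print(found)
```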

edited Dec 21 '15 at 21:54

answered Dec 21 '15 at 20:58

Russia Must Remove Putin


Briefly and nicely explained as well. I am a newbie, so I'm still learning Python. – Karn Kumar Dec 21 '15 at 21:47
If you like it, you can now upvote it, and if it best answers your question, you can accept it as well. – Russia Must Remove Putin Dec 21 '15 at 21:49
@Aaron - Why are we using `re.compile`? Is it necessary, or can we drop it? What is it for? I would also like to know what the "context manager" you mentioned in your earlier comment is. Are you talking about memory context? – Karn Kumar Dec 21 '15 at 22:34
@pygo regular expression strings have to be compiled before they can be used in a search. If you don't use a pre-compiled regex, then the semantics are that you are recompiling the search on every loop. It is a best practice to separate and lift redundant code out of inner loops. It does not add complexity to your code; in fact it reduces it, in the context of a large program. – Russia Must Remove Putin Dec 21 '15 at 23:01
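For the context manager part of that question, a small sketch may help: `with` guarantees the file is closed when the block ends, even if an exception is raised inside it. The temporary file here is only scaffolding so the example runs anywhere; it is not part of the original code.

```python
import os
import tempfile

# Create a throwaway file so the example runs anywhere.
fd, path = tempfile.mkstemp()
os.close(fd)

with open(path, 'w') as f:
    f.write("Hi this doller 100.00\n")
closed_after_write = f.closed      # True: the with block closed the file

try:
    with open(path) as f:
        raise ValueError("something went wrong while reading")
except ValueError:
    pass
closed_after_error = f.closed      # still True, despite the exception

os.remove(path)
print(closed_after_write, closed_after_error)
```

So it has nothing to do with memory context: it is a way of pairing setup (opening) with guaranteed cleanup (closing).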

score -1 · Answer 4 · answered Dec 21 '15 at 21:01

There are a couple of problems here:

.match only looks for matches at the beginning of a string -- see search() vs. match().
You're not using capture groups, so myre.group(1) has nothing to return; after a successful match it would raise IndexError (no such group).

Here's an updated sample:

import re

file = """
Hi this doller 100.00
Hi this is doller 200.00
"""

for digit in file.splitlines():
    myre = re.search(r'\s\d*\.\d{2}', digit)
    if myre:
        print(myre.group(0))
