Python Regex - Find numbers with comma in string

Question

I might have a string:

"Foo Bar, FooTown, $100,00"

Now I need to split that string by a comma, but that would split it wrongly, as the $100,00 contains a comma too.

So I first want to look in the string, are there any numbers with a comma, and if so, replace the comma with a fullstop. so it looks like:

"Foo Bar, FooTown, $100.00"

EDIT:

It will always be small numbers, there won't be more than one , or ., but it might be $1 $10 $100 $1000

The string might or might not have spaces before the ,

This is a SMS message.

What happens if the period is already being used as a thousands separator, such as 1.000.000,00 (as seen in European locales)? — Bobulous, Aug 08 '12 at 18:00
Where do those strings come from? I'd first try to fix *that*, before trying to parse strings in a broken format. — Sven Marnach, Aug 08 '12 at 18:01
No that wont happen in my case. It will be small numbers like this — Harry, Aug 08 '12 at 18:01
Are all commas not appearing in numbers followed by a space? If so, you can splut at `", "`. — Sven Marnach, Aug 08 '12 at 18:02

score 8 · Answer 1 · answered Aug 08 '12 at 18:05

8

You can use

>>> re.sub(r"(\d),(\d)", r"\1.\2", "Foo Bar, FooTown, $100,00")
'Foo Bar, FooTown, $100.00'

answered Aug 08 '12 at 18:05

Sven Marnach

574,206
118
941
841

Savir · Answer 2 · 2012-08-08T18:39:38.890

You can also use negative lookaheads... those big forgottens in the super-powerful Python regular expression mechanisms...

You can make a regular expression to split by commas who are not preceded by a digit or followed by a digit.

#!/usr/bin/env python

import re
samples=[
    "Foo Bar, FooTown, $100,00",
    "$100,00, Foo Bar, FooTown",
    "Foo Bar, $100,00, FooTown",
    "$100,00, Foo Bar, FooTown,",
]

myRegex=re.compile(",(?!\d)|(?<!\d),")

for sample in samples:
    print "%s sample splitted: %s (%s items)" % (sample, myRegex.split(sample), len(myRegex.split(sample)))

Outputs:

Foo Bar, FooTown, $100,00 sample splitted: ['Foo Bar', ' FooTown', ' $100,00'] (3 items)
$100,00, Foo Bar, FooTown sample splitted: ['$100,00', ' Foo Bar', ' FooTown'] (3 items)
Foo Bar, $100,00, FooTown sample splitted: ['Foo Bar', ' $100,00', ' FooTown'] (3 items)
$100,00, Foo Bar, FooTown, sample splitted: ['$100,00', ' Foo Bar', ' FooTown', ''] (4 items)

I feel very sorry for the guys who developed the re module in Python... I've seen these kind of lookaheads very scarcely used.

score 1 · Answer 3 · answered Aug 08 '12 at 18:04

1

A RegEx replace of the pattern (\d),(\d) with \1.\2 will work. The \d matches any digit, and the parentheses around it means that the number will be remembered and \1 will match the first one and \2 will match the second one.

answered Aug 08 '12 at 18:04

KRyan

7,308
2
40
68

1

Wrong language :) It's `\1` in Python, not `$1`, and you will have to use raw strings. – Sven Marnach Aug 08 '12 at 18:05
@SvenMarnach: Ah, yeah, I know RegEx, not Python (well, not particularly well). Fixing, though your answer's always gonna be better. – KRyan Aug 08 '12 at 18:06
This answer explains what his happening, so the answers complement each other nicely. :) – Sven Marnach Aug 08 '12 at 18:09

Steven Rumbalski · Answer 4 · 2012-08-08T18:52:34.637

1

Rather than fixing your data, why not fix your split?

>>> import re
>>> s = "Foo Bar, FooTown, $100,00"
>>> re.split(r'(?<!\d),|,(?!\d)', s)
['Foo Bar', ' FooTown', ' $100,00']

This uses negative lookahead and lookbehind assertions to make sure that the comma is not surrounded by digits.

edit: Changed the regular expression from r'(?<!\d),(?!\d)' to r'(?<!\d),|,(?!\d)' to properly handle strings like "$100,00, Foo Bar, FooTown". Thanks to BorrajaX for pointing out my error in the comments.

edited Aug 08 '12 at 18:52

answered Aug 08 '12 at 18:17

Steven Rumbalski

44,786
9
89
119

I also thought that, but I think that it won't split correctly strings like "$100,00, Foo Bar, FooTown", where the comma to split with is right after a digit, is that right? – Savir Aug 08 '12 at 18:32
@Borrajax: Excellent point. I changed the regular expression to deal with that. – Steven Rumbalski Aug 08 '12 at 18:53
Me too. I had a simmilar idea to yours **:-D** I like to see that someone is using the lookaheads feature... I haven't seen it used much... But people don't seem to like my anwer **:_(** – Savir Aug 08 '12 at 19:02

Python Regex - Find numbers with comma in string

EDIT:

4 Answers4

Linked