1

I might have a string:

"Foo Bar, FooTown, $100,00" 

Now I need to split that string by a comma, but that would split it wrongly, as the $100,00 contains a comma too.

So I first want to look in the string, are there any numbers with a comma, and if so, replace the comma with a fullstop. so it looks like:

"Foo Bar, FooTown, $100.00"

EDIT:

It will always be small numbers, there won't be more than one , or ., but it might be $1 $10 $100 $1000

The string might or might not have spaces before the ,

This is a SMS message.

Santosh Kumar
  • 26,475
  • 20
  • 67
  • 118
Harry
  • 13,091
  • 29
  • 107
  • 167

4 Answers4

8

You can use

>>> re.sub(r"(\d),(\d)", r"\1.\2", "Foo Bar, FooTown, $100,00")
'Foo Bar, FooTown, $100.00'
Sven Marnach
  • 574,206
  • 118
  • 941
  • 841
2

You can also use negative lookaheads... those big forgottens in the super-powerful Python regular expression mechanisms...

You can make a regular expression to split by commas who are not preceded by a digit or followed by a digit.

#!/usr/bin/env python

import re
samples=[
    "Foo Bar, FooTown, $100,00",
    "$100,00, Foo Bar, FooTown",
    "Foo Bar, $100,00, FooTown",
    "$100,00, Foo Bar, FooTown,",
]

myRegex=re.compile(",(?!\d)|(?<!\d),")

for sample in samples:
    print "%s sample splitted: %s (%s items)" % (sample, myRegex.split(sample), len(myRegex.split(sample)))

Outputs:

Foo Bar, FooTown, $100,00 sample splitted: ['Foo Bar', ' FooTown', ' $100,00'] (3 items)
$100,00, Foo Bar, FooTown sample splitted: ['$100,00', ' Foo Bar', ' FooTown'] (3 items)
Foo Bar, $100,00, FooTown sample splitted: ['Foo Bar', ' $100,00', ' FooTown'] (3 items)
$100,00, Foo Bar, FooTown, sample splitted: ['$100,00', ' Foo Bar', ' FooTown', ''] (4 items)

I feel very sorry for the guys who developed the re module in Python... I've seen these kind of lookaheads very scarcely used.

Savir
  • 17,568
  • 15
  • 82
  • 136
1

A RegEx replace of the pattern (\d),(\d) with \1.\2 will work. The \d matches any digit, and the parentheses around it means that the number will be remembered and \1 will match the first one and \2 will match the second one.

KRyan
  • 7,308
  • 2
  • 40
  • 68
1

Rather than fixing your data, why not fix your split?

>>> import re
>>> s = "Foo Bar, FooTown, $100,00"
>>> re.split(r'(?<!\d),|,(?!\d)', s)
['Foo Bar', ' FooTown', ' $100,00']

This uses negative lookahead and lookbehind assertions to make sure that the comma is not surrounded by digits.

edit: Changed the regular expression from r'(?<!\d),(?!\d)' to r'(?<!\d),|,(?!\d)' to properly handle strings like "$100,00, Foo Bar, FooTown". Thanks to BorrajaX for pointing out my error in the comments.

Steven Rumbalski
  • 44,786
  • 9
  • 89
  • 119
  • I also thought that, but I think that it won't split correctly strings like "$100,00, Foo Bar, FooTown", where the comma to split with is right after a digit, is that right? – Savir Aug 08 '12 at 18:32
  • @Borrajax: Excellent point. I changed the regular expression to deal with that. – Steven Rumbalski Aug 08 '12 at 18:53
  • Me too. I had a simmilar idea to yours **:-D** I like to see that someone is using the lookaheads feature... I haven't seen it used much... But people don't seem to like my anwer **:_(** – Savir Aug 08 '12 at 19:02