Regex to remove commas before a number in python

Question

I'm working with a comma-delimited file. One field, the address, has the form x,y,z, so each part of the address ends up in its own column. The address is immediately followed by member_no, a one-digit number such as 2. So the columns are Col1 (address) and Col2 (one-digit number).

text = '52A, XYZ Street, ABC District, 2'

I basically want to remove all commas before that number from the address field.

The output should be like

'52A XYZ Street ABC District, 2'

I tried

re.sub(r',', ' ', text)

but it's replacing all instances of commas.

Not 100% sure regex is the way to go, unless you can be sure that no street names start with a number (e.g. 5th Avenue). — CodeSurgeon, May 15 '18 at 12:00
All parts of the address are attached to an alphabet like 52A. There are no standalone numbers in the address space. — Rohit Girdhar, May 15 '18 at 12:05
Do you want to say you only want to remove commas before the first "standalone" number? Like `re.sub(r'^(.*?)(,\s*\d+\b)', lambda x: "{}{}".format(x.group(1).replace(',', ''), x.group(2)), s)`? — Wiktor Stribiżew, May 15 '18 at 12:08
@WiktorStribiżew Thanks for the response but unfortunately it's not working in my case. I'm still getting the commas before the standalone number. — Rohit Girdhar, May 15 '18 at 12:12
I am not quite sure about the `,` before that first comma. Is it always there, or can it be missing? Also, see https://ideone.com/4ai0dI — Wiktor Stribiżew, May 15 '18 at 12:12

heemayl · Accepted Answer · score 6 · 2018-05-15T12:26:31.290

Use a zero-width negative lookahead so that a comma is replaced only when it is not followed by {space(s)}{digit} at the end of the string:

,(?!\s+\d$)

Example:

In [227]: text = '52A, XYZ Street, ABC District, 2'

In [228]: re.sub(r',(?!\s+\d$)', '', text)
Out[228]: '52A XYZ Street ABC District, 2'
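To see which commas the lookahead lets through, here is a minimal sketch that compiles the pattern as a raw string and lists the match positions before substituting. The helper name `strip_address_commas` is made up for illustration and is not part of the answer.

```python
import re

# Raw string so that \s and \d reach the regex engine untouched.
# Matches a comma unless it is followed by whitespace and a single
# digit at the very end of the string.
PATTERN = re.compile(r',(?!\s+\d$)')

def strip_address_commas(text):
    """Hypothetical helper: drop every comma except the one before the trailing digit."""
    return PATTERN.sub('', text)

text = '52A, XYZ Street, ABC District, 2'
positions = [m.start() for m in PATTERN.finditer(text)]
print(positions)                   # [3, 15]
print(strip_address_commas(text))  # 52A XYZ Street ABC District, 2
```

The comma at index 29 is the only one followed by ` 2` and the end of the string, so it is the only one left in place.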

Edit:

If more commas follow the ,{space(s)}{digit} substring and you want to keep them all, add a negative lookbehind so that a comma is skipped when it is preceded by {space}{digit or [A-Z]}:

(?<!\s[\dA-Z]),(?!\s+\d,?)

Example:

In [229]: text = '52A, XYZ Street, ABC District, 2, M, Brown'

In [230]: re.sub(r'(?<!\s[\dA-Z]),(?!\s+\d,?)', '', text)
Out[230]: '52A XYZ Street ABC District, 2, M, Brown'

In [231]: text = '52A, XYZ Street, ABC District, 2'

In [232]: re.sub(r'(?<!\s[\dA-Z]),(?!\s+\d,?)', '', text)
Out[232]: '52A XYZ Street ABC District, 2'
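The lookbehind version relies on every field after the number being a single digit or uppercase letter; a longer field such as 'Male' would lose its comma. One alternative, adapted from Wiktor Stribiżew's comment on the question, strips commas only from the part before the first standalone number. This is a minimal sketch: `strip_before_number` is a hypothetical name, and it assumes the address itself contains no standalone number.

```python
import re

def strip_before_number(text):
    """Hypothetical helper: remove commas only before the first standalone number."""
    def fix(match):
        address, rest = match.group(1), match.group(2)
        return address.replace(',', '') + rest

    # Group 1: shortest prefix up to the first ", <digits>" that ends
    # at a word boundary. Group 2: that ", <digits>" itself.
    # Everything after the match is left untouched by re.sub.
    return re.sub(r'^(.*?)(,\s*\d+\b)', fix, text, count=1)

print(strip_before_number('52A, XYZ Street, ABC District, 2, Male, Brown'))
# 52A XYZ Street ABC District, 2, Male, Brown
```

Because `\d+\b` requires a word boundary after the digits, a part like `52A` or `5th` is not mistaken for the number.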

edited May 15 '18 at 12:26

answered May 15 '18 at 11:56


What if I have something else after the digit? Say, text = '52A, XYZ Street, ABC District, 2, M, Brown'. Appreciate your answer. – Rohit Girdhar May 15 '18 at 12:01
@RohitGirdhar What's your desired output from that? – heemayl May 15 '18 at 12:04
It should be of the form: 52A XYZ Street ABC District, 2, M, Brown, with the commas after the single-digit number kept intact and the ones before it removed. – Rohit Girdhar May 15 '18 at 12:07
Accepted your Answer. WiktorStribiżew also had a nice approach. – Rohit Girdhar May 15 '18 at 16:32

score 2 · Answer 2 · answered May 15 '18 at 11:56

If the string always ends with a single digit, you could use this. It can be adapted when several digits follow the last comma: increase the 3 by one for each extra digit.

text = '52A, XYZ Street, ABC District, 2'
text = text[:-3].replace(",", "") + text[-3:]
print(text)

The output is

52A XYZ Street ABC District, 2
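Instead of hard-coding the slice length 3, the length of the tail can be measured. A sketch, assuming the line always ends with a comma, optional whitespace, and one or more digits (the name `strip_with_tail` is illustrative only):

```python
import re

def strip_with_tail(text):
    """Illustrative helper: keep the final ', <digits>' and drop commas before it."""
    tail = re.search(r',\s*\d+$', text)
    if tail is None:
        # Assumption violated: no trailing number, so leave the text alone.
        return text
    cut = tail.start()
    return text[:cut].replace(',', '') + text[cut:]

print(strip_with_tail('52A, XYZ Street, ABC District, 12'))
# 52A XYZ Street ABC District, 12
```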

score 2 · Answer 3 · answered May 15 '18 at 11:57

No need for a regular expression. You can just look for the last occurrence of , and split the string there, as in:

text[:text.rfind(',')].replace(',', '') + text[text.rfind(','):]
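The same idea can be written with `str.rpartition`, which splits on the last comma in one call instead of calling `rfind` twice. A sketch with a hypothetical function name; like the expression above, it only fits the case where the number is the last field:

```python
def strip_before_last_comma(text):
    """Hypothetical helper: remove every comma except the last one."""
    head, sep, tail = text.rpartition(',')
    # With no comma at all, head and sep are both '' and tail is the
    # whole string, so the text comes back unchanged.
    return head.replace(',', '') + sep + tail

print(strip_before_last_comma('52A, XYZ Street, ABC District, 2'))
# 52A XYZ Street ABC District, 2
```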

answered May 15 '18 at 11:57

ksbg


score 1 · Answer 4 · answered Jun 21 '21 at 08:48

This one is aimed at currency amounts. It removes a comma only when it sits directly between digits, so commas followed by a space (as in dates, or lists like 1, 2, 3) are left alone.

import re

mystring="he has 1,00000,00 ruppees and lost 50,00,00,000,00,000,00 june 20, 1970 and 30/23/34 1, 2, 3"

print(re.sub(r'(?:(\d+?)),(\d+?)',r'\1\2',mystring))
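Because that pattern consumes the digit on each side of the comma, adjacent single-digit groups such as `1,2,3` are only partly cleaned (the result is `12,3`). A variant with lookarounds, offered as a sketch rather than as the answerer's method, matches only the comma itself. The function name is illustrative.

```python
import re

def remove_digit_commas(text):
    """Illustrative helper: drop commas that sit directly between two digits."""
    # Lookbehind and lookahead consume nothing, so a digit can serve as
    # the right neighbour of one comma and the left neighbour of the next.
    return re.sub(r'(?<=\d),(?=\d)', '', text)

print(remove_digit_commas('paid 1,2,3 on june 20, 1970'))
# paid 123 on june 20, 1970
```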

answered Jun 21 '21 at 08:48

mannem srinivas

