1

I'm working with a comma-delimited file. One of its fields, address, has the form x,y,z, which causes a problem: each part of the address ends up in its own column. The address is immediately followed by member_no, a one-digit number such as 2. So each row should have just two columns: Col1 (address) and Col2 (one-digit number).

text = '52A, XYZ Street, ABC District, 2'

I basically want to remove all commas before that number from the address field.

The output should be like

'52A XYZ Street ABC District, 2'

I tried

re.sub(r',', ' ', text)

but it's replacing all instances of commas.

llllllllll
Rohit Girdhar

4 Answers

6

Use a zero-width negative lookahead so that a comma is replaced only when it is not followed by {space(s)}{digit} at the end of the string:

,(?!\s+\d$)

Example:

In [227]: text = '52A, XYZ Street, ABC District, 2'

In [228]: re.sub(r',(?!\s+\d$)', '', text)
Out[228]: '52A XYZ Street ABC District, 2'
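
The lookahead above can be wrapped in a small helper and applied to every row of the file. This is only a sketch: the name strip_address_commas and the second sample row are made up for illustration, and it assumes every row ends in a comma, whitespace, and a single digit.

```python
import re

# Matches any comma that is NOT followed by whitespace plus one final digit.
ADDRESS_COMMA = re.compile(r',(?!\s+\d$)')

def strip_address_commas(line: str) -> str:
    """Hypothetical helper: drop every comma except the one before the trailing digit."""
    return ADDRESS_COMMA.sub('', line)

rows = [
    '52A, XYZ Street, ABC District, 2',
    '7, Hill Road, 5',  # invented sample row
]
cleaned = [strip_address_commas(row) for row in rows]
print(cleaned)
```

Compiling the pattern once avoids repeating the regex string for every row.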

Edit:

If there are more commas after the ,{space(s)}{digit} substring and you want to keep them all, add a negative lookbehind so that commas preceded by {space}{digit or [A-Z]} are left alone as well:

(?<!\s[\dA-Z]),(?!\s+\d,?)

Example:

In [229]: text = '52A, XYZ Street, ABC District, 2, M, Brown'

In [230]: re.sub(r'(?<!\s[\dA-Z]),(?!\s+\d,?)', '', text)
Out[230]: '52A XYZ Street ABC District, 2, M, Brown'

In [231]: text = '52A, XYZ Street, ABC District, 2'

In [232]: re.sub(r'(?<!\s[\dA-Z]),(?!\s+\d,?)', '', text)
Out[232]: '52A XYZ Street ABC District, 2'
heemayl
2

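If the string always ends in a single digit, you can use plain slicing. The last three characters (comma, space, digit) are kept as they are, and commas are stripped from everything before them. If the number after the last comma has more digits, increase the 3 accordingly.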

text = '52A, XYZ Street, ABC District, 2'
text = text[:-3].replace(",", "") + text[-3:]
print(text)

The output is

52A XYZ Street ABC District, 2
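
The adjustment for longer numbers could be sketched as follows. The function name strip_commas_keep_tail and its digits parameter are invented for this example, and it assumes the tail always has the exact form comma, one space, then the digits.

```python
def strip_commas_keep_tail(text: str, digits: int = 1) -> str:
    """Hypothetical helper: strip commas from everything except the fixed-width tail."""
    tail = digits + 2  # one comma + one space + the digits themselves
    return text[:-tail].replace(",", "") + text[-tail:]

print(strip_commas_keep_tail('52A, XYZ Street, ABC District, 2'))
print(strip_commas_keep_tail('52A, XYZ Street, ABC District, 12', digits=2))
```

The width has to be known in advance, so this only fits data where member_no has a fixed number of digits.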
Silviu
2

No need for a regular expression. You can find the last occurrence of , with rfind and strip commas only from the part before it, as in:

text[:text.rfind(',')].replace(',', '') + text[text.rfind(','):]
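
The same idea can be written with str.rpartition, which splits on the last comma in one call. This is a sketch rather than the answer's own code, and the function name strip_all_but_last_comma is made up for illustration.

```python
def strip_all_but_last_comma(text: str) -> str:
    """Hypothetical helper: remove every comma except the last one."""
    # rpartition returns (before, separator, after) around the LAST comma;
    # if there is no comma, it returns ('', '', text).
    head, sep, tail = text.rpartition(',')
    return head.replace(',', '') + sep + tail

print(strip_all_but_last_comma('52A, XYZ Street, ABC District, 2'))
```

Unlike fixed-width slicing, this does not depend on how many digits follow the last comma.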
ksbg
1

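This one is specifically for currency amounts. It removes only commas that sit directly between two digits, so commas in dates and other places (where the comma is followed by a space) are left alone.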

import re

mystring = "he has 1,00000,00 ruppees and lost 50,00,00,000,00,000,00 june 20, 1970 and 30/23/34 1, 2, 3"

# Drop a comma only when it has a digit directly on both sides.
print(re.sub(r'(?:(\d+?)),(\d+?)', r'\1\2', mystring))
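
A variant of the same idea uses lookarounds instead of capture groups. It is a sketch, not the answer's own pattern, and the name strip_digit_commas is invented here. Because the lookarounds consume no digits, it also handles runs of single-digit groups such as 1,2,3, where the capture-group version would, as far as I can tell, leave the second comma in place.

```python
import re

# A comma with a digit immediately before and immediately after it.
DIGIT_COMMA = re.compile(r'(?<=\d),(?=\d)')

def strip_digit_commas(s: str) -> str:
    """Hypothetical helper: remove commas that sit directly between two digits."""
    return DIGIT_COMMA.sub('', s)

print(strip_digit_commas('he has 1,00000,00 rupees'))
print(strip_digit_commas('june 20, 1970 and 1, 2, 3'))
```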
mannem srinivas