
So let's say I have the following strings:

stringX = ['187-49481,14',
'181-457216',
'196,61-04-22',
'1972-10-28',
'19,940-04-16',
'2017-08,8-29',
'2014-04-18']

Notice that I have two types of strings: the type 181-457216 and the type 1972-10-28 (a date). I'm modifying a CSV, and for some reason (I looked hard and couldn't find one), it sometimes, apparently at random, inserts a comma between digits in these strings.

So what I want to accomplish is to detect these commas with a regular expression and replace them with nothing (i.e. remove them).

For the first type of string, e.g. '187-14,412', I've been trying:

re.sub(r'\d+\-\d+(\,)\d+', '', stringX)

In this pattern the comma is group 1, but how can I tell re.sub to replace only group(1)?

I've also been trying a lookbehind and a lookahead, but I have trouble with the lookbehind:

(?<=\d+\-\d+)(\,)(?=\d+)
Err: lookbehind assertion is not fixed length at offset 0

I was wondering whether there is a better way to regex these strings, or a way to target group(1) in re.sub.
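Python's `re` module only accepts fixed-width look-behinds, which is why `(?<=\d+\-\d+)` is rejected. A minimal sketch, assuming every stray comma sits directly between two digits: a one-character look-behind `(?<=\d)` is fixed width, so the match consists of the comma alone and only the comma gets replaced. The second part answers the group(1) question a different way: capture what surrounds the comma and write it back with backreferences. The names `STRAY_COMMA`, `cleaned` and `first_type` are illustrative only.

```python
import re

stringX = ['187-49481,14',
           '181-457216',
           '196,61-04-22',
           '1972-10-28',
           '19,940-04-16',
           '2017-08,8-29',
           '2014-04-18']

# (?<=\d) and (?=\d) each look at exactly one character, so they are fixed width.
# Only the comma is consumed by the match, so only the comma is removed.
STRAY_COMMA = re.compile(r'(?<=\d),(?=\d)')

cleaned = [STRAY_COMMA.sub('', s) for s in stringX]
print(cleaned)
# ['187-4948114', '181-457216', '19661-04-22', '1972-10-28', '19940-04-16', '2017-088-29', '2014-04-18']

# Alternative closer to the original pattern: keep the parts around the comma
# as groups 1 and 2 and write them back without the comma.
first_type = re.sub(r'(\d+-\d+),(\d+)', r'\1\2', '187-14,412')
print(first_type)  # 187-14412
```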

Sahira Mena
    My advice is to look harder at why your code modifying CSV files is generating errors (perhaps post a question here). Trying to work around bugs (particularly when they are in your own code) is a mug's game. To remove commas, however, why not just convert them to empty strings. To do that you shouldn't even need a regular expression. – Cary Swoveland Apr 19 '20 at 22:43
  • It seems that the code which modifies your CSV file, besides inserting commas, also inserts an extra digit into the data entries: 187-49481,14 should be 187-494814, 196,61-04-22 should be 1961-04-22, and so on. – anfauglit Apr 20 '20 at 04:05

4 Answers


Solution

You could use a simple pythonic list-comprehension with str.replace().

[x.replace(',','') for x in stringX]

Output:

['187-4948114',
 '181-457216',
 '19661-04-22',
 '1972-10-28',
 '19940-04-16',
 '2017-088-29',
 '2014-04-18']

If you want to use regex, then this could be an alternative.

import re # regex library
re.sub(',','', '|'.join(stringX)).split('|')

Output:

['187-4948114',
 '181-457216',
 '19661-04-22',
 '1972-10-28',
 '19940-04-16',
 '2017-088-29',
 '2014-04-18']

Extracting Single-Dashed and Double-Dashed Values

You could extract the numbers with single and double dashes as follows using re.findall().

import re # regex library

text = [x.replace(',','') for x in stringX]
text = '\n'.join(text)
# raw strings, so '\d' is passed to the regex engine instead of being treated as a string escape
single_dash = re.findall(r'\d+-\d+', text)
double_dash = re.findall(r'\d+-\d+-\d+', text)
print(f'single dash: \n\n{single_dash}\n')
print(f'double dash: \n\n{double_dash}\n')

Output:

single dash: 

['187-4948114', '181-457216', '19661-04', '1972-10', '19940-04', '2017-088', '2014-04']

double dash: 

['19661-04-22', '1972-10-28', '19940-04-16', '2017-088-29', '2014-04-18']
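Because re.findall scans the whole text, the single-dash pattern also returns the leading part of every date ('1972-10', '2014-04', and so on in the output above). A minimal sketch, assuming each cleaned value should be classified as a whole rather than searched: `re.fullmatch` only succeeds when the pattern covers the entire string, so every value ends up in exactly one list. The per-element loop is an assumption, not part of the answer.

```python
import re

stringX = ['187-49481,14', '181-457216', '196,61-04-22', '1972-10-28',
           '19,940-04-16', '2017-08,8-29', '2014-04-18']

# Remove the stray commas first, as in the answer.
cleaned = [x.replace(',', '') for x in stringX]

# fullmatch is anchored at both ends, so '1972-10-28' cannot be
# counted as the single-dash value '1972-10'.
single_dash = [x for x in cleaned if re.fullmatch(r'\d+-\d+', x)]
double_dash = [x for x in cleaned if re.fullmatch(r'\d+-\d+-\d+', x)]

print(single_dash)  # ['187-4948114', '181-457216']
print(double_dash)  # ['19661-04-22', '1972-10-28', '19940-04-16', '2017-088-29', '2014-04-18']
```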
CypherX
  • @SahiraMena Perhaps you also need to extract single and double-dashed values, if so, take a look at the last part of this solution. – CypherX Apr 20 '20 at 01:03

You don't need a regex for this; you can just split the string at ','. If that yields a list with more than one element, chop the last character of the left part (index 0) and the first character of the right part (index 1). Then again, maybe you do need a regex, I'm not sure.

const p = '187-49481,14';
const regex = /\d,/;
console.log(p.replace(regex, ''));//result is 187-494814

This is done in JavaScript, but it should be just as easy in Python: match \d, and replace it with nothing. Easy peasy. I don't know Python that well, but something like this should probably do it:

import re
[re.sub(r'\d,', '', x) for x in stringX]
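Two details are worth checking when porting the JavaScript line. First, JavaScript's replace() with a regex that lacks the g flag replaces only the first match, while Python's re.sub replaces every match unless count is given. Second, `\d,` consumes the digit in front of the comma as well, so this removes one digit per comma, which happens to match anfauglit's reading of the data but differs from deleting the comma alone. A minimal sketch, applied per element because re.sub expects a string rather than a list; `fixed` is an illustrative name:

```python
import re

stringX = ['187-49481,14', '181-457216', '196,61-04-22', '1972-10-28',
           '19,940-04-16', '2017-08,8-29', '2014-04-18']

# r'\d,' matches a digit followed by a comma; both characters are removed.
# count=1 mirrors the JavaScript replace() call without the g flag.
fixed = [re.sub(r'\d,', '', x, count=1) for x in stringX]
print(fixed)
# ['187-494814', '181-457216', '1961-04-22', '1972-10-28', '1940-04-16', '2017-08-29', '2014-04-18']
```

Every string here has at most one comma, so count=1 gives the same result as the default on this data.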
Aleks
import re
[re.sub(r'\,', '', x) for x in stringX]

['187-4948114', '181-457216', '19661-04-22', '1972-10-28', '19940-04-16', '2017-088-29', '2014-04-18']
inverzeio

You can keep your regex approach by passing a lambda as the replacement in re.sub.

Change

re.sub(r'\d+\-\d+(\,)\d+', '', stringX)

To:

[re.sub(r'\d+\-\d+(\,)\d+', lambda m: m.group(0).replace(',', ''), x) for x in stringX]
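The pattern here only matches a comma that follows a digits-dash-digits run, so it repairs the first string type and '2017-08,8-29' but leaves a comma sitting in the year part of a date untouched. A minimal sketch that applies this answer's callable replacement to each element and lists which values still contain a comma; `result` and `still_dirty` are illustrative names:

```python
import re

stringX = ['187-49481,14', '181-457216', '196,61-04-22', '1972-10-28',
           '19,940-04-16', '2017-08,8-29', '2014-04-18']

pattern = re.compile(r'\d+-\d+(,)\d+')

# The callable receives each match object; its return value replaces
# the whole matched span, here the same text with the comma removed.
result = [pattern.sub(lambda m: m.group(0).replace(',', ''), x) for x in stringX]
print(result)
# ['187-4948114', '181-457216', '196,61-04-22', '1972-10-28', '19,940-04-16', '2017-088-29', '2014-04-18']

still_dirty = [x for x in result if ',' in x]
print(still_dirty)  # ['196,61-04-22', '19,940-04-16']
```

If the date-type strings matter too, a pattern that does not require the leading dash (for example the fixed-width look-around form `(?<=\d),(?=\d)`) covers both types.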
DarrylG
  • All answers were great, but this one did it, and I learned a nice feature of re.sub. I ended up doing re.sub(r'\d+\-\d+(\,)\d+', lambda m: re.sub(r'\,','',m.group(0)), stringX), i.e. a nested re.sub. – Sahira Mena Apr 20 '20 at 00:20