Need help extracting characters with Python regex

Question

I am trying to extract the letters out of a string, but I can't get it to work correctly. Any help would be very much appreciated.

String: 20200702_abcd_ef_aed_usd_cdee_hgd.csv
Expected: abcd_ef_aed_usd_cdee_hgd
Actual: _abcd_ef_aed_usd_cdee_hgd

Here is the code. How can I remove the leading _?

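import re
# Note: [A-Z_] is case-sensitive, so as written this returns just "_";
# the "Actual" output above corresponds to a lowercase class such as [a-z_]+
re.search('[A-Z_]+', "20200702_abcd_ef_aed_usd_cdee_hgd.csv").group()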
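To see where the leading _ comes from, it can help to inspect the match position. A minimal sketch, assuming the lowercase class [a-z_]+ that produces the "Actual" output shown above: re.search returns the leftmost match, and the first character that belongs to the class is the _ right after the date.

```python
import re

filename = "20200702_abcd_ef_aed_usd_cdee_hgd.csv"

# Assumed lowercase class, which yields the "Actual" result from the question
m = re.search(r"[a-z_]+", filename)

# re.search returns the leftmost match; the _ at index 8 is the first
# character in the class, so the match starts there
print(m.start(), m.group())
```

Because the underscore is allowed anywhere in the class, nothing stops the match from starting on it.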

@Skycc Your code absolutely does not work. It is just the way Jennifer Stone wrote it in the question. - Xu Qiushi, Jul 29 '20 at 05:53

score 1 · Answer 1 · answered Jul 29 '20 at 05:50


Try:

import re
re.search('[a-z][a-z_]+', "20200702_abcd_ef_aed_usd_cdee_hgd.csv").group()  # must start with a letter, so the leading _ is skipped
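A quick check of this pattern, as a sketch: [a-z] forces the match to begin with a letter, so the _ after the date can no longer be its first character. Note that because of the +, the match needs at least two characters, and [a-z] only accepts lowercase letters. The re.IGNORECASE variant below is an assumption for mixed-case names, not part of the answer.

```python
import re

filename = "20200702_abcd_ef_aed_usd_cdee_hgd.csv"

# First character must be a lowercase letter, the rest letters or underscores
lower_only = re.search(r"[a-z][a-z_]+", filename).group()

# Assumed variant for mixed-case names: same pattern, case-insensitive
mixed = re.search(r"[a-z][a-z_]+", "20200702_ABcd_EF.csv", re.IGNORECASE).group()

print(lower_only)  # abcd_ef_aed_usd_cdee_hgd
print(mixed)       # ABcd_EF
```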


Xu Qiushi


Siddharth · Answer 2 · 2020-07-30T05:28:08.393


import re
x = "20200702_abcd_ef_aed_usd_cdee_hgd.csv"
# First char must be a letter; the rest may be letters or underscores
y = re.search(r"[a-zA-Z][_A-Za-z]+", x)
print(y[0])  # abcd_ef_aed_usd_cdee_hgd

As you don't want an underscore at the start of the match, you have to require a letter first with [a-zA-Z], followed by the rest with [_a-zA-Z]+. In other words, the correction is to add [a-zA-Z] at the start of the pattern.
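Because re.search returns None when nothing matches, indexing the result directly (y[0] or .group()) raises a TypeError or AttributeError for names that contain no letters. A minimal sketch that wraps the answer's pattern in a small helper and guards against that case; extract_letters and PATTERN are hypothetical names, not part of the answer.

```python
import re

# Hypothetical helper; the pattern is the one from this answer:
# a letter first, then one or more letters or underscores
PATTERN = re.compile(r"[a-zA-Z][_A-Za-z]+")


def extract_letters(name):
    """Return the first match of PATTERN in name, or None if there is none."""
    m = PATTERN.search(name)
    return m[0] if m else None


print(extract_letters("20200702_abcd_ef_aed_usd_cdee_hgd.csv"))  # abcd_ef_aed_usd_cdee_hgd
print(extract_letters("20200702_ABC_def.csv"))                   # ABC_def
print(extract_letters("20200702"))                               # None
```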

edited Jul 30 '20 at 05:28

answered Jul 29 '20 at 05:53

Siddharth


score 0 · Answer 3 · answered Jul 29 '20 at 08:19

You do not actually need a regex here. You can get the file name without its extension, then split it on the first _ into two parts and take the last one:

import os
s = "20200702_abcd_ef_aed_usd_cdee_hgd.csv"
# Drop the extension, split once on the first _, keep the part after it
print(os.path.splitext(s)[0].split('_', 1)[-1])  # abcd_ef_aed_usd_cdee_hgd
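The same split-based idea can also be written with pathlib, which this answer does not use; Path.stem drops only the last suffix, just as os.path.splitext does. A minimal sketch, swapping pathlib in for os.path:

```python
from pathlib import Path

s = "20200702_abcd_ef_aed_usd_cdee_hgd.csv"

# Path(...).stem is the final path component without its last suffix
stem = Path(s).stem  # "20200702_abcd_ef_aed_usd_cdee_hgd"

# Split on the first _ only, then keep everything after it
result = stem.split("_", 1)[-1]
print(result)  # abcd_ef_aed_usd_cdee_hgd
```

Because of the [-1], a name with no _ at all comes back unchanged (minus its extension) instead of raising an error, in both this version and the os.path one.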

With a regex, you might also try a re.sub solution that strips the extension and makes sure that only leading digits followed by _ are removed from the start:

import re
re.sub(r'^\d+_|\.[^.]*$', '', s)  # 'abcd_ef_aed_usd_cdee_hgd'

Or, if no digit checking is necessary:

re.sub(r'^[^_]+_|\.[^.]*$', '', s)
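The two re.sub patterns only behave differently when the prefix is not all digits. A small sketch comparing them; the second file name, report_abcd_ef.csv, is a made-up example and the constant names are hypothetical.

```python
import re

DIGIT_PREFIX = r'^\d+_|\.[^.]*$'  # leading digits + _, or the extension
ANY_PREFIX = r'^[^_]+_|\.[^.]*$'  # anything up to the first _, or the extension

for s in ["20200702_abcd_ef_aed_usd_cdee_hgd.csv", "report_abcd_ef.csv"]:
    print(s)
    print("  digits only:", re.sub(DIGIT_PREFIX, '', s))
    print("  any prefix: ", re.sub(ANY_PREFIX, '', s))
```

For the dated name both give abcd_ef_aed_usd_cdee_hgd; for report_abcd_ef.csv the digit-checking pattern keeps report_abcd_ef, while the other one returns abcd_ef.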


Details

^[^_]+_ - start of string, one or more chars other than _, and then a _
| - or
\.[^.]*$ - a literal ., then zero or more chars other than ., and then the end of string.
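The pattern ^[^_]+_|\.[^.]*$ can also be written with re.VERBOSE, so the breakdown in the list above lives next to each part of the regex. A sketch; PATTERN is a hypothetical name, and the compiled regex is equivalent to the one-line version because VERBOSE ignores the whitespace and # comments outside character classes.

```python
import re

# Same regex as '^[^_]+_|\.[^.]*$', annotated with re.VERBOSE
PATTERN = re.compile(r"""
    ^[^_]+_      # start of string, one or more chars other than _, then a _
    |            # or
    \.[^.]*$     # a literal ., zero or more chars other than ., then end of string
""", re.VERBOSE)

s = "20200702_abcd_ef_aed_usd_cdee_hgd.csv"
print(PATTERN.sub('', s))  # abcd_ef_aed_usd_cdee_hgd
```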
