I am trying to extract the letters (and the underscores between them) from a string, but I can't get it to work correctly. Any help would be very much appreciated.

String: 20200702_abcd_ef_aed_usd_cdee_hgd.csv
Expected: abcd_ef_aed_usd_cdee_hgd
Actual: _abcd_ef_aed_usd_cdee_hgd

Here is my code. How can I remove the leading _?

import re
re.search('[a-z_]+', "20200702_abcd_ef_aed_usd_cdee_hgd.csv").group()
Wiktor Stribiżew

3 Answers

Try:

re.search('[a-z][a-z_]+', "20200702_abcd_ef_aed_usd_cdee_hgd.csv").group()
Xu Qiushi
import re
x = "20200702_abcd_ef_aed_usd_cdee_hgd.csv"
y = re.search(r"[a-zA-Z][_A-Za-z]+",x)
print(y[0])

Since you don't want an underscore at the start of the match, require a letter first with [a-zA-Z], and then match the rest with [_A-Za-z]+. In other words, the fix is to add [a-zA-Z] at the start of the pattern.
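To see why the leading letter class matters, here is a minimal sketch that runs both patterns on the file name from the question. The variable names are illustrative and not taken from the posts above.

```python
import re

name = "20200702_abcd_ef_aed_usd_cdee_hgd.csv"

# Letters and underscores only: the first match starts at the underscore
# that follows the digits, so the result keeps a leading "_".
with_underscore = re.search(r"[A-Za-z_]+", name).group()

# Require a letter first, then allow letters or underscores.
without_underscore = re.search(r"[A-Za-z][A-Za-z_]+", name).group()

print(with_underscore)     # _abcd_ef_aed_usd_cdee_hgd
print(without_underscore)  # abcd_ef_aed_usd_cdee_hgd
```

Note that the second pattern needs at least two characters to match; if a single-letter name should also be accepted, `*` in place of `+` might be a better fit.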

Siddharth

You do not actually have to rely on a regex here. You can strip the extension from the file name, split the remainder once on _, and take the last part:

import os
s = "20200702_abcd_ef_aed_usd_cdee_hgd.csv"
print( os.path.splitext(s)[0].split('_', 1)[-1] )
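As a rough sketch, the same idea can be wrapped in a small helper to check how it behaves when the name contains no underscore. The function name `strip_prefix` and the second file name are hypothetical, chosen only for illustration.

```python
import os

def strip_prefix(filename):
    """Drop the extension, then everything up to and including the first "_"."""
    stem, _ext = os.path.splitext(filename)
    # maxsplit=1 splits only on the first underscore; if there is none,
    # the list has a single item and [-1] returns the whole stem.
    return stem.split("_", 1)[-1]

print(strip_prefix("20200702_abcd_ef_aed_usd_cdee_hgd.csv"))  # abcd_ef_aed_usd_cdee_hgd
print(strip_prefix("report.csv"))                             # report
```

This approach removes any prefix before the first underscore, whether or not it consists of digits.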

With a regex, you could also use re.sub to remove only a leading run of digits followed by _, together with the extension:

re.sub(r'^\d+_|\.[^.]*$', '', s)

Or, if no digit checking is necessary:

re.sub(r'^[^_]+_|\.[^.]*$', '', s)

See a Python demo and the regex demo.

Details

  • ^[^_]+_ - start of string, one or more chars other than _, and then a _
  • | - or
  • \.[^.]*$ - a ., then any zero or more chars other than ., and then the end of string.
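The difference between the two re.sub patterns only shows up when the prefix is not made of digits. The following sketch compares them; the constant names and the second sample file name are assumptions made for illustration.

```python
import re

# Removes a digits-only prefix plus "_", and the extension.
DIGIT_PREFIX = re.compile(r'^\d+_|\.[^.]*$')
# Removes any prefix up to the first "_", and the extension.
ANY_PREFIX = re.compile(r'^[^_]+_|\.[^.]*$')

samples = ["20200702_abcd_ef_aed_usd_cdee_hgd.csv", "report_abcd_ef.csv"]

for s in samples:
    # re.sub replaces every non-overlapping match, so both alternatives
    # (prefix and extension) are removed in one call.
    print(DIGIT_PREFIX.sub('', s), ANY_PREFIX.sub('', s))

# abcd_ef_aed_usd_cdee_hgd abcd_ef_aed_usd_cdee_hgd
# report_abcd_ef abcd_ef
```

Which variant is preferable depends on whether a non-numeric prefix should be kept or dropped.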
Wiktor Stribiżew