
I have a string like this:

string = 'attachment; filename="This-is-my-file-2019-10-01.csv"'

I want to extract only the date, "2019-10-01", in that same format.

I used:

re.match('^[ 0-9]+$', string)

and

re.match(r'^([\s\d]+)$', string)

and

re.findall(r'\d', string)

Yet the first two don't match any digits at all, and I wonder why. The output of the last one is ['2', '0', '1', '9', '1', '0', '0', '1']. Is there any way to extract the date directly? Thank you!

Chen

3 Answers


The ^ and $ in the regexp match the beginning and end of the string, respectively, so when you try to match ^[ 0-9]+$, it will only match a string in which every character is a digit or space. Your string contains other characters, so nothing matches.
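To see the effect of the anchors, here is a small sketch using the question's first pattern. The second test string ('2019 10 01') is made up for illustration and is not from the question.

```python
import re

s = 'attachment; filename="This-is-my-file-2019-10-01.csv"'

# Anchored pattern: the whole string must consist of digits and spaces.
anchored = re.match(r'^[ 0-9]+$', s)

# The same pattern does match a (hypothetical) string made only of digits and spaces.
digits_only = re.match(r'^[ 0-9]+$', '2019 10 01')

print(anchored)              # None
print(digits_only.group(0))  # 2019 10 01
```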

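If you only want to match dates in that exact format, you can use [0-9]{4}-[0-9]{2}-[0-9]{2}. Pass it to re.search or re.findall rather than re.match, because re.match only tries to match at the start of the string.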
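As a minimal sketch, that pattern could be applied with re.search like this. The variable names s, m, and date_text are only illustrative.

```python
import re

s = 'attachment; filename="This-is-my-file-2019-10-01.csv"'

# re.search scans the whole string; re.match would only try position 0.
m = re.search(r'[0-9]{4}-[0-9]{2}-[0-9]{2}', s)

# Guard against the no-match case, where re.search returns None.
date_text = m.group(0) if m else None
print(date_text)  # 2019-10-01
```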

You could use \d instead of [0-9], but that will also match other Unicode digits, such as "๑".
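A quick way to check this in Python 3, where str patterns are Unicode-aware by default. The re.ASCII flag shown at the end is an extra option not mentioned above; it restricts \d to [0-9].

```python
import re

thai_one = '\u0e51'  # THAI DIGIT ONE, the character "๑"

# \d on a str pattern matches any Unicode decimal digit.
unicode_hit = re.fullmatch(r'\d', thai_one) is not None

# An explicit character class only accepts ASCII 0-9.
ascii_class_hit = re.fullmatch(r'[0-9]', thai_one) is not None

# With re.ASCII, \d behaves like [0-9].
ascii_flag_hit = re.fullmatch(r'\d', thai_one, flags=re.ASCII) is not None

print(unicode_hit, ascii_class_hit, ascii_flag_hit)  # True False False
```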

If you want to match other formats or want validation that the date is correct (such as rejecting "9999-99-99"), take a look at this answer.
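One possible way to add that validation, which is not necessarily what the linked answer does, is to let datetime.strptime check each candidate. The helper name extract_valid_date is hypothetical.

```python
import re
from datetime import datetime

def extract_valid_date(text):
    """Return the first YYYY-MM-DD substring that is a real calendar date, else None."""
    for candidate in re.findall(r'[0-9]{4}-[0-9]{2}-[0-9]{2}', text):
        try:
            # strptime raises ValueError for impossible dates such as 9999-99-99.
            return datetime.strptime(candidate, '%Y-%m-%d').date()
        except ValueError:
            continue
    return None

good = extract_valid_date('attachment; filename="This-is-my-file-2019-10-01.csv"')
bad = extract_valid_date('filename="report-9999-99-99.csv"')
print(good)  # 2019-10-01
print(bad)   # None
```

The regex narrows the search to date-shaped text, and strptime does the calendar check, so the pattern itself can stay simple.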

jirassimok

First, don't use 'string' as a variable name, since it is also the name of a Python standard library module.

st = 'attachment; filename="This-is-my-file-2019-10-01.csv"'
# Take the last three '-'-separated pieces and drop anything after a '.'
v_date = '-'.join(i.split('.')[0] for i in st.split('-')[-3:])
print(v_date)

or

import re

st = 'attachment; filename="This-is-my-file-2019-10-01.csv"' 
v_date = re.findall(r'(\d{4}-\d{2}-\d{2})', st)[0]
print(v_date)
  • The built-in string type in Python is str, not string. – Chris Doyle Oct 09 '19 at 18:18
  • If you do an "import string", what do you get? – Narcisse Doudieu Siewe Oct 09 '19 at 18:19
  • Yeah, but that's not a built-in. That's like saying don't call a variable turtle because there is a package called turtle. If you don't use that package, then it's not an issue. – Chris Doyle Oct 09 '19 at 18:22
  • Yeah, but you don't know if this package was used before... this can be just a small part of his code. – Narcisse Doudieu Siewe Oct 09 '19 at 18:25
  • Given he defined this variable in his code, I think it's fair to say he isn't using the string package. I get your point, because I see it all the time with people naming their list just list and then not understanding why they can't make new lists. But in this case it only matters if they imported the string library. – Chris Doyle Oct 09 '19 at 18:33

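Your regex in the first two only looks at the start of the string for digits or spaces (re.match always starts matching at the beginning), and your string starts with "attachment". In your last one you ask for all individual digits, so you get a list of single digits. You're always better off making a regex as specific as you can to match what you want.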

import re
string = 'attachment; filename="This-is-my-file-2019-10-01.csv"'
match = re.findall(r'(\d{4}-\d\d-\d\d)', string)
print(match[0])
Chris Doyle