3

I have the following code, which stores the names of all the CSV files in a specific folder in a list:

import pandas as pd
import re
import os

files = os.listdir('.')
filename=[filename for filename in files if filename.endswith('.csv')]

However, my folder contains two types of CSV files: one type ends with, for example, _20.csv (or maybe _18.csv, _01.csv); the other ends with _Raw.csv.

I only need the first type stored in my list. I know a regular expression may help with that, so I did some Google searching and came up with the following code, but it doesn't seem to work. Can anyone offer some advice?

filename = [re.search(r'^\d{2}.csv'),filename).group(0) for filename in files] 
Rowling

4 Answers

7

You need to remove ^ (it matches only at the start of the string), add $ at the end of the pattern (so that the match must be at the end of the string), and escape the dot (otherwise, . matches any character except a line break).

Note you must check if there is a match before accessing .group():

result = [f for f in files if re.search(r'_\d{2}\.csv$', f)] 

Details

  • _ - an underscore
  • \d{2} - 2 digits
  • \. - a literal dot
  • csv - csv text
  • $ - end of string.

See the regex demo.

Python demo:

import re
files = ["gfrt_32_20.csv", "wertf_18.csv", "12_01.csv", "ith_Raw.csv"]
result = [f for f in files if re.search(r'_\d{2}\.csv$', f)] 
print(result)
# => ['gfrt_32_20.csv', 'wertf_18.csv', '12_01.csv']
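
To illustrate why the match check matters: re.search returns None when nothing matches, so calling .group() on the result without a check raises AttributeError. The following is only a minimal sketch reusing the sample names above; the helper name suffix_of is hypothetical and not part of the original code.

```python
import re

files = ["gfrt_32_20.csv", "wertf_18.csv", "12_01.csv", "ith_Raw.csv"]
pattern = re.compile(r'_\d{2}\.csv$')

def suffix_of(name):
    """Return the matched suffix (e.g. '_20.csv'), or None if there is no match."""
    m = pattern.search(name)
    # Guard before calling .group(): m is None for non-matching names
    return m.group(0) if m else None

suffixes = [suffix_of(f) for f in files]
print(suffixes)
# => ['_20.csv', '_18.csv', '_01.csv', None]
```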
Wiktor Stribiżew
  • @Borisu There is no need to add details about the difference between `re.match` and `re.search` to my answer, as the OP's problem is not related to it. [Here](https://stackoverflow.com/questions/48355460/python-difference-between-re-matchpattern-v-s-re-search-pattern) is a good thread on that. – Wiktor Stribiżew Nov 22 '18 at 11:48
3

re.match would not work because it only matches at the beginning of the string. Use re.search instead. Everything else in the previous solution is fine.

import os
import re
files = os.listdir('.')
filenames = [f for f in files if re.search(r'(_\d+\.csv)$', f)]
print(filenames)
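
As a rough illustration of the difference, the sketch below runs both functions on one sample name. The file name is made up for the example, and the pattern here escapes the dot and anchors at the end of the string.

```python
import re

name = "wertf_18.csv"          # hypothetical sample file name
pattern = r'_\d+\.csv$'

anchored = re.match(pattern, name)   # only tries to match at position 0
scanned = re.search(pattern, name)   # scans the string for the first match

print(anchored)           # None
print(scanned.group(0))   # _18.csv
```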
AResem
1

Try using the re.match method:

import os
import re
files = os.listdir('.')
filenames = [f for f in files if re.match(r'(_\d+.csv)', f)]
print(filenames)
Rezvanov Maxim
1

You should put the regex operation in the if clause so as to filter out those you don't want.

You should also escape the . in the regex, since an unescaped dot has a special meaning in regex (it matches any character except a line terminator).

[filename for filename in files if re.search(r'\d{2}\.csv$', filename)]
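
A small sketch of what escaping the dot changes, using made-up file names chosen only to expose the difference:

```python
import re

# Hypothetical names: a real CSV, a name without a dot, and a backup file
names = ["data_20.csv", "data_20xcsv", "data_20.csv.bak"]

# Unescaped dot: '.' accepts any character, so 'data_20xcsv' slips through
unescaped = [n for n in names if re.search(r'\d{2}.csv$', n)]
# Escaped dot: only a literal '.' before 'csv' is accepted
escaped = [n for n in names if re.search(r'\d{2}\.csv$', n)]

print(unescaped)  # ['data_20.csv', 'data_20xcsv']
print(escaped)    # ['data_20.csv']
```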

If you want only the matched bit, you can take a simple substring (the match is always six characters long: two digits plus .csv):

[filename[-6:] for filename in files if re.search(r'\d{2}\.csv$', filename)]
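
The fixed slice relies on the suffix always having two digits. If the number of digits could vary (an assumption that goes beyond the question), a capture group is one possible alternative. This is only a sketch; the sample names and the variable suffix_numbers are illustrative.

```python
import re

files = ["gfrt_32_20.csv", "wertf_18.csv", "12_01.csv", "ith_Raw.csv"]
# \d+ instead of \d{2}: accepts any number of digits (assumption, see above)
pattern = re.compile(r'_(\d+)\.csv$')

suffix_numbers = {}
for name in files:
    m = pattern.search(name)
    if m:
        # group(1) is the digit run captured by the parentheses
        suffix_numbers[name] = m.group(1)

print(suffix_numbers)
# {'gfrt_32_20.csv': '20', 'wertf_18.csv': '18', '12_01.csv': '01'}
```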
Sweeper