
I'm not good with regular expressions, so I was hoping to get some feedback on this particular pattern.

I have a list of filenames obtained from the os library's listdir method. The filenames start with text like "D#######", so there may be D0000001.txt, D0000000.svg, D0000003.stl, etc. If the list is something like:

['D0000001.txt', 'D0000001.xlsx', 'D0000002.txt', 'D0000002.svg', 'D0000003.stl', 'D0000003.doc']

I want to find all strings in the list that begin with 'D0000002'.

Will the regex 'D0000002*\.[a-zA-Z]{3}' always ONLY return a match object for 'D0000002.txt' & 'D0000002.svg', and nothing else, or is it possible this pattern could match other values?

Note that there is no guarantee that the part of the filename preceding the extension will only contain the "D" + 7 digits. So it is possible for filenames like "D1234567_someMoreText_20230720.abc" to exist, and if the pattern is 'D1234567*\.[a-zA-Z]{3}', that file should result in a match.

BTW, the comparison logic iterates through the list of filenames, performing the re.search() on each string in the list. That iteration adds matching names to a "matching files" list for return.

Thanks!

Wiktor Stribiżew
MikeA
  • You just need `[f for f in filenames if f.startswith('D0000002')]` – Wiktor Stribiżew Jul 20 '23 at 14:17
  • Does this answer your question? [How to find the python list item that start with](https://stackoverflow.com/questions/44517191/how-to-find-the-python-list-item-that-start-with) or [Checking whether a string starts with XXXX](https://stackoverflow.com/questions/8802860/checking-whether-a-string-starts-with-xxxx)? – Wiktor Stribiżew Jul 20 '23 at 14:18
  • Or, if there MUST be `D` + any digit at the start: `[f for f in filenames if re.match(r'D\d', f)]`. If there must be 8 digits after `D` - `r'D\d{8}'` – Wiktor Stribiżew Jul 20 '23 at 14:19
  • I'm working with existing code that uses the os and re libraries and want to utilize these functions as opposed to creating new code. There is no guarantee that the filenames will always start with a D or only have 7 digits after the leading character. The filenames begin with an order number that is obtained from a SQL query, so the only thing that is known for sure is that the filename will start with whatever the order number is. I just need to find all files that start with the order number, add them to the return list, and ignore everything else. – MikeA Jul 20 '23 at 14:25
  • What is "order number"? Any letter and then 8 digits? `[f for f in filenames if re.match(r'[a-zA-Z]\d{8}(?!\d)', f)]` – Wiktor Stribiżew Jul 20 '23 at 14:31

1 Answer


In regex:

  • `*` matches the previous token zero or more times.
    For example, `a*` matches '', 'a', 'aa' and so on.
  • `.` matches any single character (except a newline, by default).
    For example, `.` matches 'a', 'b', 'c', '1', 'Z', etc.
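
To see what this means for the pattern in the question, here is a small sketch. The filenames are hypothetical and chosen only for illustration; the point is that `*` applies to the final `2` alone, so the pattern accepts zero or several `2` characters before the dot and rejects extra text after the order number.

```python
import re

# The pattern from the question: the `*` quantifies only the final `2`.
pattern = r'D0000002*\.[a-zA-Z]{3}'

# Hypothetical filenames, chosen for illustration only.
candidates = [
    'D0000002.txt',        # intended match
    'D000000.txt',         # zero `2` characters before the dot
    'D000000222.svg',      # several `2` characters before the dot
    'D0000002_extra.abc',  # extra text after the order number
]

matched = [name for name in candidates if re.search(pattern, name)]
print(matched)
```

The first three names match and the last one does not, which is the opposite of what the question needs for names with extra text.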

In your pattern, `2*` means "zero or more `2` characters", not "`D0000002` followed by anything", so it also matches names such as 'D000000.txt'. If you drop the `*` and write `D0000002\.[a-zA-Z]{3}`, that matches:

  • The literal string `D0000002.`
  • Any three letters (lowercase or uppercase)

However, with re.search this will also match filenames like hi_D0000002.txt_hello.
To prevent this, you can add `^` and `$` at the start and end of the regex, which anchor the match to the start and end of the string respectively.

In conclusion, `^D0000002\.[a-zA-Z]{3}$` should work.
It means that the entire filename is `D0000002.` followed by exactly three letters.
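
As a quick check of the anchored pattern, here is a minimal sketch with hypothetical filenames. Note that it only accepts the exact form described, so a name with extra text before the extension is rejected as well.

```python
import re

anchored = r'^D0000002\.[a-zA-Z]{3}$'

# Hypothetical filenames, chosen for illustration only.
names = [
    'D0000002.txt',           # exact form: order number, dot, three letters
    'hi_D0000002.txt_hello',  # text before and after
    'D0000002.jpeg',          # four-letter extension
    'D0000002_extra.abc',     # extra text before the extension
]

anchored_matches = [n for n in names if re.search(anchored, n)]
print(anchored_matches)
```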

P.S.

`re.match` only checks for a match at the beginning of the string, while `re.search` looks for a match anywhere in the string. To require that the entire string matches, use `re.fullmatch`.

So, you may want to write
re.fullmatch(r'D0000002\.[a-zA-Z]{3}', filename)
instead of
re.search(r'^D0000002\.[a-zA-Z]{3}$', filename)
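
A quick way to check how `re.search`, `re.match`, and `re.fullmatch` treat the same unanchored pattern is to run them side by side. The test strings below are made up for illustration.

```python
import re

pattern = r'D0000002\.[a-zA-Z]{3}'

# re.search scans the whole string for a match anywhere.
search_middle = bool(re.search(pattern, 'hi_D0000002.txt_hello'))

# re.match only tries the pattern at the start of the string,
# but it does not require the match to reach the end.
match_middle = bool(re.match(pattern, 'hi_D0000002.txt_hello'))
match_prefix = bool(re.match(pattern, 'D0000002.txt_hello'))

# re.fullmatch requires the pattern to cover the entire string.
full_prefix = bool(re.fullmatch(pattern, 'D0000002.txt_hello'))
full_exact = bool(re.fullmatch(pattern, 'D0000002.txt'))

print(search_middle, match_middle, match_prefix, full_prefix, full_exact)
```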

Cythonista
  • OP needs "*to find all files that start with the order number, add them to the return list, and ignore everything else*". `D0000002` is just one example of the order number. – Wiktor Stribiżew Jul 20 '23 at 14:34
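
Following up on that comment, here is a minimal sketch of one way to collect every file that starts with a given order number while still using the re module. The function name `files_for_order` and the sample list are hypothetical; the only assumption is that the order number arrives as a plain string.

```python
import re

def files_for_order(filenames, order_number):
    """Return the filenames that begin with the given order number."""
    # re.escape protects against regex metacharacters in the order number.
    prefix = re.compile(re.escape(order_number))
    # Pattern.match only tries the pattern at the start of each string.
    return [name for name in filenames if prefix.match(name)]

filenames = [
    'D0000001.txt', 'D0000001.xlsx', 'D0000002.txt', 'D0000002.svg',
    'D0000003.stl', 'D0000003.doc', 'D0000002_someMoreText_20230720.abc',
]
result = files_for_order(filenames, 'D0000002')
print(result)
```

This would also accept a name like 'D00000021.txt'. If that should be excluded, appending the `(?!\d)` lookahead suggested in the comments on the question is one option.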