
I have a directory with a large number of files. The file names are similar to the following: the(number)one(number), where (number) can be any number. There are also files with the name: the(number), where (number) can be any number. I was wondering how I can count the number of files with the additional "one(number)" at the end of their file name.

Let's say I have the list of file names. I was thinking of doing something like:

counter = 0
for n in names:  # names: the list of file names
    if n.startswith("the(number)one"):  # (number) should stand for any number
        counter += 1

Is there any way for it to accept any number in the (number) space when using startswith()?

Example: the34one5 the37one2 the444one3 the87one8 the34 the32

This should return 4.

Takkun

3 Answers


Use a regular expression matching 'one\d+' with the re module.

import re

count = 0
for n in names:  # names: the list of file names
    if re.search(r"one\d+", n):
        count += 1
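A runnable version of the loop above, applied to the example names from the question (the list literal `names` is only that sample data, not a real directory listing):

```python
import re

# Sample file names taken from the question's example.
names = ["the34one5", "the37one2", "the444one3", "the87one8", "the34", "the32"]

count = 0
for n in names:
    # re.search looks for "one" followed by at least one digit anywhere in n
    if re.search(r"one\d+", n):
        count += 1

print(count)  # prints 4
```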

To make the match stricter, you can anchor the whole pattern:

count = 0
for n in names:  # names: the list of file names
    if re.search(r"^the\d+one\d+$", n):
        count += 1

This also rejects names with non-digit characters between "the" and "one", and does not allow anything before "the" or after the final digits.
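One subtlety: in Python, `$` also matches just before a trailing newline, so `^the\d+one\d+$` accepts `"the1one2\n"`. `re.fullmatch()` (Python 3.4+) requires the entire string to match. A small sketch comparing the two on a few made-up names:

```python
import re

PATTERN = r"the\d+one\d+"

# Hypothetical names: two valid ones, one with a non-digit between "the" and "one",
# one with a trailing newline, and one without the "one(number)" suffix.
names = ["the34one5", "the444one3", "theXone5", "the1one2\n", "the34"]

# ^...$ with re.search: "$" may match right before a final "\n".
anchored = [n for n in names if re.search(r"^" + PATTERN + r"$", n)]
# re.fullmatch: the pattern must consume the whole string.
full = [n for n in names if re.fullmatch(PATTERN, n)]

print(anchored)  # ['the34one5', 'the444one3', 'the1one2\n']
print(full)      # ['the34one5', 'the444one3']
```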

You should start learning regular expressions now:

  • they let you do complex text analysis in a few keystrokes that would be hard to code by hand
  • they work almost the same from one language to another, making you more flexible
  • if you encounter code that uses them without knowing them, you will be puzzled, because their syntax is not something you can guess
  • the sooner you know them, the sooner you'll learn when NOT (hint) to use them, which is ultimately as important as knowing them.
Bite code
  • I think the regex should be "one\d+$", because the OP specified that he wants to match "one(number)" at the end of the file name; or maybe use a complete regex "the\d+one\d+" with `match()` instead of `search()`. – mouad Jun 10 '11 at 15:59
  • You are right, I added a second example with a more accurate matching. – Bite code Jun 10 '11 at 16:06
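To illustrate mouad's point about `match()` versus `search()`: `re.match()` only tries to match at the beginning of the string, while `re.search()` scans for a match anywhere. Neither one anchors at the end. A quick sketch with a few made-up names:

```python
import re

pattern = r"the\d+one\d+"

# re.search finds the pattern anywhere, so a leading prefix is tolerated...
print(bool(re.search(pattern, "xthe34one5")))     # True
# ...while re.match anchors at the start of the string.
print(bool(re.match(pattern, "xthe34one5")))      # False
print(bool(re.match(pattern, "the34one5")))       # True
# re.match does not anchor at the end: a trailing suffix still matches.
print(bool(re.match(pattern, "the34one5.txt")))   # True
```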

The easiest way to do this is probably glob.glob():

import glob
number = len(glob.glob("/path/to/files/the*one*"))

Note that * here will match any string, not just numbers.
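A sketch of that caveat using `fnmatch.filter()`, which applies the same shell-style patterns to a list of names. The name `theXone5` is invented to show that `*` also accepts non-digits; one way to keep only digit runs is to post-filter with a regular expression:

```python
import fnmatch
import re

# Sample names; "theXone5" is a made-up name with a non-digit between the parts.
names = ["the34one5", "the37one2", "theXone5", "the34", "the32"]

# Shell-style pattern: "*" matches any run of characters, digits or not.
loose = fnmatch.filter(names, "the*one*")

# Post-filter with a regex so only digit runs are accepted.
strict = [n for n in loose if re.fullmatch(r"the\d+one\d+", n)]

print(loose)   # ['the34one5', 'the37one2', 'theXone5']
print(strict)  # ['the34one5', 'the37one2']
```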

Sven Marnach
  • This is clever, but it will fail if any character that is not a digit is between "the" and "one". – Bite code Jun 10 '11 at 15:50
  • `glob.glob()` will match files in the current working directory, though. You probably mean `fnmatch.fnmatch()`. – Thomas Wouters Jun 10 '11 at 15:51
  • @Thomas: No, I don't mean `fnmatch.fnmatch()`, because `glob.glob()` is much easier to use here. Thanks for pointing out the issue with the directory! – Sven Marnach Jun 10 '11 at 15:53
  • @e-satis: I'm puzzled how your answer will be any different in case there is a non-digit character between `the` and `one`. – Sven Marnach Jun 10 '11 at 15:56

The same as a one-liner, which also answers the question more closely by matching the leading 'the':

import re
count = len([name for name in names if re.match(r'the\d+one', name)])  # names: the list of file names
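Since the question is about files in a directory, here is a sketch that applies the same kind of pattern to actual directory entries with `os.scandir()`, using `re.fullmatch()` rather than `re.match()` so trailing characters are rejected. The function name `count_numbered_files` is hypothetical, and the usage creates a throwaway directory holding the question's example names:

```python
import os
import re
import tempfile

PATTERN = re.compile(r"the\d+one\d+")


def count_numbered_files(directory):
    """Count regular files in `directory` whose whole name is the<digits>one<digits>."""
    with os.scandir(directory) as entries:
        return sum(
            1
            for entry in entries
            if entry.is_file() and PATTERN.fullmatch(entry.name)
        )


# Usage: build a temporary directory containing the question's example names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["the34one5", "the37one2", "the444one3", "the87one8", "the34", "the32"]:
        open(os.path.join(tmp, name), "w").close()
    result = count_numbered_files(tmp)

print(result)  # prints 4
```

Compiling the pattern once and using a generator inside `sum()` avoids building an intermediate list.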
badzil