2

I'm fairly new to python scripting and I want to verify file names in a directory and subdirectory. The verification should be case sensitive. I'm using python 2.6.5 OS: win7 and xp

I prompt for the following user input:

prompt = "year"
year = raw_input(prompt)
prompt = "number"
number = raw_input(prompt)

From here I want to search/verify that the following files and folders exist and their filename is correct.

folderstructure:

..\foobar_(number)_version1\music

Files in subfolder 'music'

(year)_foobar_(number)_isnice.txt
(year)_itis(number)hot_today.txt
(year)_anything_is(number)possible.txt
(year)_something_{idont_want_to_check_this_part}_(number)_canbe_anything.txt

Note that all text including underscores are always the same, and thus should always be right, except for the things between () or {}. I want to output the results to a txt file which reports if the filename is correct or not.

What is the most logical method to archieve this? I've read the lib documentation fnmatch(.fnmatchcase), RE and os(.path.isfile) and searched here for examples, but I just can't figure out where and how to start.

Can anyone point me in the right direction?

[edit] As soon as my script has the working basics I'll post my code for reference or to help others.

[edit2] my first non-hello world script

import os
import re

#output :
file_out = "H:\\output.txt"
f_out = open(file_out, 'w')

print "-------start-script----------"

#input
prompt = "enter 4 digit year: "
year = raw_input(prompt)
prompt = "enter 2 digit number: "
number = raw_input(prompt)

print "the chosen year is %s" % (year)
print "the chosen number is %s" % (number)

f_out.write ("start log!\n")
f_out.write ("------------------------------------------\n")
f_out.write ("the chosen year is %s\n" % (year))
f_out.write ("the chosen number is %s\n" % (number))

#part i'm working on

print "end script"
f_out.write ("------------------------------------------\n")
f_out.write ("end script\n")

#close file
f_out.close()
supersoaker2000
  • 333
  • 2
  • 9

3 Answers3

2

Take a look at the glob module - this will help you get a list of files in the current directory:

import glob

year = raw_input('Year: ')        # Example: Year: 2009
number = raw_input('Number: ')    # Example: Number: 12
filenames = glob.glob('{year}_*{number}*'.format(year=year, number=number))

Filenames will be anything in the current directory that meets the following criteria:

  1. Begins with 2009_
  2. Any number of characters until it matches 12
  3. Any number of characters following 12.

os.path.exists is a good way to check if the file exists, or os.path.isfile if you want to make sure that it's really a file and not a directory named like a file. For Python3, check these docs, and like the link ghostbust555 mentioned says, be careful of race conditions if you plan on doing anything besides verifying their existence.


Based on your comment, it looks like this is a job for regular expressions. The pseudo code for what you need to write looks something like this:

for filename in list of filenames:
    if filename is not valid:
        print "<filename> is not valid!"

Aside from the actual pattern, the actual python code could look like this:

import os
import re

pattern = 'Put your actual pattern here'

# For a different directory, change the . to whatever the directory should be
for filename in os.listdir('.'):
    if not re.match(pattern, filename):
        print("Bad filename: ", filename)
Community
  • 1
  • 1
Wayne Werner
  • 49,299
  • 29
  • 200
  • 290
  • Hi, thanks for your answer. But i want to make sure everything between year and number is always right. for example, in the file "(year)_foobar_(number)_isnice.txt" the parts "_foobar_" and "_isnice.txt" should be checked as well. So If I have a file like "(year)_foobar_(number)_isbad.txt" it should report that it's not correct (because it doesn't meet the required part "_isnice.txt" hopefully I explained it right, english is not my primary language. – supersoaker2000 Jul 25 '12 at 15:22
  • Hi, can you post an example of using any of the two raw_inputs (year, number) in the pattern string? I searched for examples, but couldn't find out or get anything working. Do I need the re.compile / re.group part for this? – supersoaker2000 Jul 26 '12 at 15:01
  • @Ruud, `re.compile` just makes the regex run faster. If you're doing this for a large number (>1000?) then you can experiment with using re.compile for the pattern. I'd run it normally first, if it seems slow you can try to optimize. `re.group` just shows different parts of the match - in this case you'll only care that your entire pattern matches. I just modified my first example to use `raw_input` to get the year/number. – Wayne Werner Jul 27 '12 at 13:11
  • So if i'm right you first have to convert the input to a string you can use, like in the part ".format(year=year, number=number))"? Thanks will try it out, pretty much beginning to understand it now, although I still need to learn a lot :) – supersoaker2000 Jul 27 '12 at 14:19
  • Yeah. A regex pattern is just a string, so if you want to add your own (possibly changing) values in there you need to concatenate or format your string. It depends on use case of course, but for values that rarely change, I'd use a "constant" (i.e. `YEAR=2009`, `NUMBER=13`) and just change it on each run. For values that will change multiple times (running against several year/number pairs at a time), I'd probably have an input file and read the values from that. But if you've got just a few values or you want to re-run trying different values, I'd go the interactive route. – Wayne Werner Jul 27 '12 at 14:39
  • Yes, a Dict (like in the comment below) or any other form of input file works better and is more friendlier if you need to change things. – supersoaker2000 Jul 27 '12 at 15:00
0

This is not meant to be a full answer, but an extension of @Wayne Werner's answer. I don't have enough reputation points yet to comment. ;0

Wayne's approach using format I think is pointing to what you should do because it's validating filename BEFORE the files are built instead of after. And it seems that's what you're doing and have control over?

  1. I would do as much validation at the user input level as possible.
  2. Validate the other parts from whereever you get them.
  3. Build a dictionary with the parts.
  4. Build your file_name.

For example, at the user input level, something like:

yourDict = dict() 

year_input = raw_input('What is the year'?)

if not year_input.isdigit():  
    year_input = raw_input('Only digits please in the format YYYY, example: 2012'):

yourDict[year] = year_input

Then continute to add key:values to yourDict by validating the other values by whatever criteria it is you have. (Use re module or other method's mentioned).

Then, as Wayne was doing, use .format() with a passed in dictionary to map to the correct parts.

format1 = "{year}{part1}{number}{part2}.txt".format(**yourDict)

The approach also allows you to quickly build new formats with the same parts, and you can pick and choose what keys in the dictionary you need or don't need for each format.

Hope that's helpful.

wintour
  • 2,295
  • 2
  • 14
  • 10
-1
import os.path

year = 2009
file1 = year + "_foobar_" + number + "_isnice.txt"

os.path.exists(file1)   
ghostbust555
  • 2,040
  • 16
  • 29
  • IOError can fly for a number of reasons besides the file not existing. – Kos Jul 25 '12 at 15:12
  • Though the caveat is that this will only work correctly if the person running the script has read access to the files. – Wayne Werner Jul 25 '12 at 15:13
  • @Kos that is true but according to this thread - http://stackoverflow.com/questions/82831/how-do-i-check-if-a-file-exists-using-python os.path.exists() can lead to a potential security vulnerabilities – ghostbust555 Jul 25 '12 at 15:17
  • Yeah, I've seen that and it' really misleading. This statement doesn't introduce any danger by itself. A security vulnerability comes where you check once and keep assuming that it still exists / doesn't exist later on. – Kos Jul 25 '12 at 15:19
  • That might be better mentioned as a comment as the OP doesn't specifically mention anything other than verification that the file exists and the name is correct. – Wayne Werner Jul 25 '12 at 15:21
  • Hi, I only want to see if the file exist and it's name is correct. There will be always read access to the files. – supersoaker2000 Jul 25 '12 at 15:29
  • OK I changed it to use os.path.exists – ghostbust555 Jul 25 '12 at 15:33