
I am trying to find files in directories where the file name used is sometimes only a part of the full file name.

So

check_meta=os.listdir(currentPath)

gives

['ANZMeta.xsl', 'Benefited_Areas', 'divisons', 'emergency', 'Error_LOG.txt', 'hex.dbf', 'hex.shp', 'hex.shp_BaseMetadata.xml', 'hex.shx', 'Maintenance_Areas', 'Rates.mxd', 'Regulated_Parking', 'schema.ini', 'Service_Areas', 'Shortcut to Local_Govt.lnk', 'TAB', 'TRC.rar', 'trc_boundary.dbf', 'trc_boundary.kml', 'trc_boundary.prj', 'trc_boundary.sbn', 'trc_boundary.sbx', 'trc_boundary.shp', 'trc_boundary.shp.ATGIS29.1772.3444.sr.lock', 'trc_boundary.shp.ATGIS30.2668.2356.sr.lock', 'trc_boundary.shp.xml', 'trc_boundary.shx', 'trc_boundary_Metadata.xml.auto', 'trc_boundary_Polygon.dbf', 'trc_boundary_Polygon.prj', 'trc_boundary_Polygon.sbn', 'trc_boundary_Polygon.sbx', 'trc_boundary_Polygon.shp', 'trc_boundary_Polygon.shp.ATGIS29.1772.3444.sr.lock', 'trc_boundary_Polygon.shx', 'trc_boundary_polygon.xml', 'Urbanlevy_bdy_region.dbf', 'Urbanlevy_bdy_region.prj', 'Urbanlevy_bdy_region.shp', 'Urbanlevy_bdy_region.shp.xml', 'Urbanlevy_bdy_region.shx', 'Urbanlevy_bdy_trc.dbf', 'Urbanlevy_bdy_trc. prj', 'Urbanlevy_bdy_trc.sbn', 'Urbanlevy_bdy_trc.sbx', 'Urbanlevy_bdy_trc.shp', 'Urbanlevy_bdy_trc.shp.xml', 'Urbanlevy_bdy_trc.shx']

I want to

existingXML=FileNm[:FileNm.find('.')]
if  existingXML+"*"+'.xml' in check_meta: # this is where the issue is
   print "exists"

So sometimes the XML file to use is Urbanlevy_bdy_trc.shp.xml and at other times it is Urbanlevy_bdy_trc.xml, whichever exists. (Note that I can't simply use an OR test for ".shp.xml", as the datasets have multiple file extensions such as tab, ecw, etc.) Also, the related XML file may sometimes be called Urbanlevy_bdy_trc_Metadata.shp.xml, so the key is just to search for the core file name "Urbanlevy_bdy_trc" with the extension .xml.

How can I specify this? The purpose of this is described in "Search and replace multiple lines in xml/text files using python".

FULL CODE

import os, xml, arcpy, shutil, datetime
from xml.etree import ElementTree as et 

path=os.getcwd()
RootDirectory=path
arcpy.env.workspace = path
Count=0

Generated_XMLs=RootDirectory+'\GeneratedXML_LOG.txt'
f = open(Generated_XMLs, 'a')
f.write("Log of Metadata Creation Process - Update: "+str(datetime.datetime.now())+"\n")
f.close()

for root, dirs, files in os.walk(RootDirectory, topdown=False):
    #print root, dirs
    for directory in dirs:
        currentPath=os.path.join(root,directory)
        os.chdir(currentPath)
        arcpy.env.workspace = currentPath
        print currentPath
#def Create_xml(currentPath):

        FileList = arcpy.ListFeatureClasses()
        zone="_Zone"

        for File in FileList:
            Count+=1
            FileDesc_obj = arcpy.Describe(File)
            FileNm=FileDesc_obj.file
            print FileNm

            check_meta=os.listdir(currentPath)
            existingXML=FileNm[:FileNm.find('.')]
            print "XML: "+existingXML
            print check_meta
            #if  existingXML+'.xml' in check_meta:
            if any(f.startswith(existingXML) and f.endswith('.xml') for f in check_meta):
                print "exists"
                newMetaFile=FileNm+"_2012Metadata.xml"
                shutil.copy2(FileNm+'.xml', newMetaFile)
            else:
                print "Does not exist"
                newMetaFile=FileNm+"_BaseMetadata.xml"
                shutil.copy2('L:\Data_Admin\QA\Metadata_python_toolset\Master_Metadata.xml', newMetaFile)
            tree=et.parse(newMetaFile)

            print "Processing: "+str(File)

            for node in tree.findall('.//title'):
                node.text = str(FileNm)
            for node in tree.findall('.//northbc'):
                node.text = str(FileDesc_obj.extent.YMax)
            for node in tree.findall('.//southbc'):
                node.text = str(FileDesc_obj.extent.YMin)
            for node in tree.findall('.//westbc'):
                node.text = str(FileDesc_obj.extent.XMin)
            for node in tree.findall('.//eastbc'):
                node.text = str(FileDesc_obj.extent.XMax)        
            for node in tree.findall('.//native/nondig/formname'):
                node.text = str(os.getcwd()+"\\"+File)
            for node in tree.findall('.//native/digform/formname'):
                node.text = str(FileDesc_obj.featureType)
            for node in tree.findall('.//avlform/nondig/formname'):
                node.text = str(FileDesc_obj.extension)
            for node in tree.findall('.//avlform/digform/formname'):
                node.text = str(float(os.path.getsize(File))/int(1024))+" KB"
            for node in tree.findall('.//theme'):
                node.text = str(FileDesc_obj.spatialReference.name +" ; EPSG: "+str(FileDesc_obj.spatialReference.factoryCode))
            print node.text
            projection_info=[]
            Zone=FileDesc_obj.spatialReference.name

            if "GCS" in str(FileDesc_obj.spatialReference.name):
                projection_info=[FileDesc_obj.spatialReference.GCSName, FileDesc_obj.spatialReference.angularUnitName, FileDesc_obj.spatialReference.datumName, FileDesc_obj.spatialReference.spheroidName]
                print "Geographic Coordinate system"
            else:
                projection_info=[FileDesc_obj.spatialReference.datumName, FileDesc_obj.spatialReference.spheroidName, FileDesc_obj.spatialReference.angularUnitName, Zone[Zone.rfind(zone)-3:]]
                print "Projected Coordinate system"
            x=0
            for node in tree.findall('.//spdom'):
                for node2 in node.findall('.//keyword'):
                    print node2.text
                    node2.text = str(projection_info[x])
                    print node2.text
                    x=x+1


            tree.write(newMetaFile)

            f = open(Generated_XMLs, 'a')
            f.write(str(Count)+": "+File+"; "+newMetaFile+"; "+currentPath+"\n")
            f.close()



    #        Create_xml(currentPath)

RESULT

GeorgeC
  • You should check out the `glob` module. Also, why not use `os.path.splitext()` to get the extension? – John La Rooy Jan 31 '12 at 06:03
  • @gnibbler you should write this into an answer; it'd be much better to take this approach than to try to make the original approach work. – Karl Knechtel Jan 31 '12 at 07:56
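
The `glob` and `os.path.splitext()` suggestion from the comments could look roughly like the sketch below. It assumes Python 3 (for `glob.escape` and the `print()` function), whereas the question's code is Python 2. The helper name `find_metadata_xml` and the temporary demo directory are made up for illustration and are not part of the original script.

```python
import glob
import os
import tempfile


def find_metadata_xml(directory, base_name):
    """Return sorted paths in `directory` whose names start with base_name and end in .xml."""
    # glob.escape guards against wildcard characters in the directory or base name.
    pattern = os.path.join(glob.escape(directory), glob.escape(base_name) + "*.xml")
    return sorted(glob.glob(pattern))


# Demonstration in a throwaway directory holding empty, made-up files.
demo_dir = tempfile.mkdtemp()
for name in ("Urbanlevy_bdy_trc.shp", "Urbanlevy_bdy_trc.shp.xml",
             "Urbanlevy_bdy_region.shp.xml", "trc_boundary_Metadata.xml.auto"):
    open(os.path.join(demo_dir, name), "w").close()

# os.path.splitext drops only the last extension:
# 'Urbanlevy_bdy_trc.shp' -> 'Urbanlevy_bdy_trc'
base = os.path.splitext("Urbanlevy_bdy_trc.shp")[0]
found = [os.path.basename(p) for p in find_metadata_xml(demo_dir, base)]
print(found)
```

Note that this is still a prefix match, so a base name such as trc_boundary would also pick up trc_boundary_polygon.xml; whether that is acceptable depends on how the datasets are named.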

3 Answers


Why not just use:

searchtext = "sometext"
# endswith() avoids false matches such as 'trc_boundary_Metadata.xml.auto'
matching = [f for f in os.listdir(currentPath) if f.startswith(searchtext) and f.endswith(".xml")]

If you want to check for different extensions, you can list them out.

exts = (".xml", ".tab", ".shp")
matching = [ f for f in os.listdir(currentPath) if f.startswith(searchtext) and os.path.splitext(f)[-1] in exts]

Of course you could figure out the regex to do the same thing as well.

monkut

Try the following:

if any(f.startswith(existingXML) and f.endswith('.xml') for f in check_meta):
   print "exists"

The any() built-in function takes an iterable as an argument and returns True if any of its elements is true. The argument that we pass is a generator expression that yields the value of f.startswith(existingXML) and f.endswith('.xml') for each file f in your list check_meta.
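
As a small self-contained illustration of that check, here is a sketch run against a shortened copy of the listing from the question. It uses Python 3 `print()` syntax, and the wrapper name `has_metadata_xml` is made up for this example.

```python
# A shortened, illustrative subset of the directory listing from the question.
check_meta = ['hex.shp', 'hex.shp_BaseMetadata.xml', 'Rates.mxd',
              'trc_boundary.shp.xml', 'trc_boundary_Metadata.xml.auto',
              'Urbanlevy_bdy_trc.shp', 'Urbanlevy_bdy_trc.shp.xml']


def has_metadata_xml(base_name, file_names):
    """True if some name starts with base_name and ends with '.xml'."""
    # any() stops at the first name for which the condition is true.
    return any(f.startswith(base_name) and f.endswith('.xml') for f in file_names)


print(has_metadata_xml('Urbanlevy_bdy_trc', check_meta))  # a .shp.xml file is present
print(has_metadata_xml('Rates', check_meta))              # only Rates.mxd is present
```

Because the test is on the end of the name, a file such as trc_boundary_Metadata.xml.auto does not count as a match.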

A regex solution might look something like this:

import re

# Raw string so that \. reaches the regex engine unchanged
regex = re.compile(re.escape(existingXML) + r'.*\.xml$')
if any(regex.match(f) for f in check_meta):
    print "exists"

If you need to know which entry actually matches, use a for loop instead:

for f in check_meta:
    if f.startswith(existingXML) and f.endswith('.xml'):
        print "exists, file name:", f
Andrew Clark
  • thanks...on the if any solution I get an IOError: [Errno 2] No such file or directory: u'Asc_Sewer_Catchments.xml' where >>> existingXML u'Asc_Sewer_Catchments' --- The file that exists there is Asc_Sewer_Catchments.shp.xml so – GeorgeC Jan 31 '12 at 05:02
  • @GeorgeC - That is odd, there shouldn't be anything there that would raise an IOError, could you edit your question to show the code that is causing the error and the full exception text? – Andrew Clark Jan 31 '12 at 05:05
  • The issue is not with your code but with how the next lines deal with what the xml will be called. I am trying to figure it out...can we get your 'if any' statement to spit out the name of the xml file that it finds? – GeorgeC Jan 31 '12 at 05:38
  • For that, you would need to use a for loop rather than 'if any', I have edited my answer with an example of how to do that. – Andrew Clark Jan 31 '12 at 05:57
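
If only the first matching name is needed, for example to hand it to the copy step instead of a name built by hand, one possible sketch uses `next()` with a default value. This is an alternative to the for loop in the answer, not something taken from it. The helper name `first_matching_xml` is hypothetical, the listing is reduced to the file names mentioned in the comments, and the syntax is Python 3.

```python
def first_matching_xml(base_name, file_names):
    """Return the first name that starts with base_name and ends with '.xml', else None."""
    return next((f for f in file_names
                 if f.startswith(base_name) and f.endswith('.xml')), None)


# File names taken from the comment thread above.
check_meta = ['Asc_Sewer_Catchments.shp', 'Asc_Sewer_Catchments.shp.xml']

source_xml = first_matching_xml('Asc_Sewer_Catchments', check_meta)
if source_xml is not None:
    # In the question's script, this name (rather than a hand-built one)
    # would be the source argument passed to shutil.copy2.
    print("exists, file name:", source_xml)
else:
    print("Does not exist")
```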
import fnmatch, posixpath

existingXML   = posixpath.splitext(FileNm)[0]   # [0] is the base name; [-1] would be the extension
matchingFiles = fnmatch.filter(check_meta, existingXML + "*" + ".xml")

if not matchingFiles:
    raise IOError("no matching XML files")
elif len(matchingFiles) > 1:
    print "more than one matching file, using first"
    matchingFile = matchingFiles[0]
else:   # only one was found, just use it
    matchingFile = matchingFiles[0]
kindall