I have a fairly general question I was hoping to get some help with. I put together a Python program that automates a state-level GIS workflow for every county. The program was created for research at school, not actual state work. I have two designs, shown below. The first is the updated version; it takes about 40 minutes to run. The second is the original work; it is not a well-structured design, but it runs the entire program in about five minutes. Can anybody give any insight into why there is such a large difference between the two? The updated version is still the one I want to keep, since it is much more reusable (it can fetch any dataset at the URL) and easier to understand, and 40 minutes to complete roughly a hundred workflows is still acceptable. This is a work in progress, and a couple of minor issues still need to be addressed in the code, but it is a pretty cool program.

Updated Design

import os, sys, urllib2, urllib, zipfile, arcpy
from arcpy import env

path = os.getcwd()

def pickData():
    myCount = 1
    path1 = 'path2URL'
    response = urllib2.urlopen(path1)
    print "Enter the name of the files you need"
    numZips = raw_input()
    numZips2 = numZips.split(",")
    myResponse(myCount, path1, response, numZips2)

def myResponse(myCount, path1, response, numZips2):
    myPath = os.getcwd()
    for each in response:
        eachNew = each.split("  ")
        eachCounty = eachNew[9].strip("\n").strip("\r")
        try:
            myCountyDir = os.mkdir(os.path.expanduser(myPath+ "\\counties" + "\\" + eachCounty))
        except:
            pass
        myRetrieveDir = myPath+"\\counties" + "\\" + eachCounty
        os.chdir(myRetrieveDir)
        myCount+=1
        response1 = urllib2.urlopen(path1 + eachNew[9])
        for all1 in response1:
            allNew = all1.split(",")
            allFinal = allNew[0].split(" ")
            allFinal1 = allFinal[len(allFinal)-1].strip(" ").strip("\n").strip("\r")
            numZipsIter = 0
            path8 = path1 + eachNew[9][0:len(eachNew[9])-2] +"/"+ allFinal1
            downZip = eachNew[9][0:len(eachNew[9])-2]+".zip"
            while(numZipsIter <len(numZips2)):
                if (numZips2[numZipsIter][0:3].strip(" ") == "NWI") and ("remap" not in allFinal1):
                    numZips2New = numZips2[numZipsIter].split("_")
                    if (numZips2New[0].strip(" ") in allFinal1 and numZips2New[1] != "remap" and numZips2New[2].strip(" ") in allFinal1) and (allFinal1[-3:]=="ZIP" or allFinal1[-3:]=="zip"):
                        urllib.urlretrieve (path8,  allFinal1)
                        zip1 = zipfile.ZipFile(myRetrieveDir +"\\" + allFinal1)
                        zip1.extractall(myRetrieveDir)
                #maybe just have numzips2 (raw input) as the values before the county number
                #numZips2[numZipsIter][0:-7].strip(" ") in allFinal1 or numZips2[numZipsIter][0:-7].strip(" ").lower() in allFinal1) and (allFinal1[-3:]=="ZIP" or allFinal1[-3:]=="zip"
                elif (numZips2[numZipsIter].strip(" ") in allFinal1 or numZips2[numZipsIter].strip(" ").lower() in allFinal1) and (allFinal1[-3:]=="ZIP" or allFinal1[-3:]=="zip"):
                    urllib.urlretrieve (path8,  allFinal1)
                    zip1 = zipfile.ZipFile(myRetrieveDir +"\\" + allFinal1)
                    zip1.extractall(myRetrieveDir)
                numZipsIter+=1



pickData()

#client picks shapefiles to add to map
#section for geoprocessing operations




# get the data frames



#add new data frame, title
#check spaces in ftp crawler



os.chdir(path)
env.workspace = path+ "\\symbology\\"
zp1 = os.listdir(path + "\\counties\\")

def myGeoprocessing(layer1, layer2):
    #the code in this function is used for geoprocessing operations
    #it returns whatever output is generated from the tools used in the map
    try:
        arcpy.Clip_analysis(path + "\\symbology\\Stream_order.shp", layer1, path + "\\counties\\" + layer2 + "\\Streams.shp")
    except:
        pass
    streams = arcpy.mapping.Layer(path + "\\counties\\" + layer2 + "\\Streams.shp")
    arcpy.ApplySymbologyFromLayer_management(streams, path+ '\\symbology\\streams.lyr')
    return streams

def makeMap():
    #original wetlands layers need to be entered as NWI_line or NWI_poly
    print "Enter the layer or layers you wish to include in the map"
    myInput = raw_input();
    counter1 = 1
    for each in zp1:
        print each
        print path
        zp2 = os.listdir(path + "\\counties\\" + each)
        for eachNew in zp2:
            #print eachNew
            if (eachNew[-4:] == ".shp") and ((myInput in eachNew[0:-7] or myInput.lower() in eachNew[0:-7])or((eachNew[8:12] == "poly" or eachNew[8:12]=='line') and eachNew[8:12] in myInput)):
                print eachNew[0:-7]
                theMap = arcpy.mapping.MapDocument(path +'\\map.mxd')
                df1 = arcpy.mapping.ListDataFrames(theMap,"*")[0]
                #this is where we add our layers
                layer1 = arcpy.mapping.Layer(path + "\\counties\\" + each + "\\" + eachNew)
                if(eachNew[7:11] == "poly" or eachNew[7:11] =="line"):
                    arcpy.ApplySymbologyFromLayer_management(layer1, path + '\\symbology\\' +myInput+'.lyr')
                else:
                    arcpy.ApplySymbologyFromLayer_management(layer1, path + '\\symbology\\' +eachNew[0:-7]+'.lyr')

                # Assign legend variable for map
                legend = arcpy.mapping.ListLayoutElements(theMap, "LEGEND_ELEMENT", "Legend")[0]
                # add wetland layer to map
                legend.autoAdd = True
                try:
                    arcpy.mapping.AddLayer(df1, layer1,"AUTO_ARRANGE")
                    #geoprocessing steps
                    streams = myGeoprocessing(layer1, each)
                    # more geoprocessing options, add the layers to map and assign if they should appear in legend
                    legend.autoAdd = True
                    arcpy.mapping.AddLayer(df1, streams,"TOP")

                    df1.extent = layer1.getExtent(True)

                    arcpy.mapping.ExportToJPEG(theMap, path + "\\counties\\" + each + "\\map.jpg")
                    # Save map document to path
                    theMap.saveACopy(path + "\\counties\\" + each + "\\map.mxd")
                    del theMap

                    print "done with map " + str(counter1)
                except:
                    print "issue with map or already exists"
                counter1+=1

makeMap() 

Original Design

import os, sys, urllib2, urllib, zipfile, arcpy
from arcpy import env

response = urllib2.urlopen('path2URL')
path1 = 'path2URL'
myCount = 1
for each in response:
    eachNew = each.split("  ")
    myCount+=1
    response1 = urllib2.urlopen(path1 + eachNew[9])
    for all1 in response1:
        #print all1
        allNew = all1.split(",")
        allFinal = allNew[0].split(" ")
        allFinal1 = allFinal[len(allFinal)-1].strip(" ")
        if allFinal1[-10:-2] == "poly.ZIP":
            response2 = urllib2.urlopen('path2URL')
            zipcontent= response2.readlines()
            path8 = 'path2URL'+ eachNew[9][0:len(eachNew[9])-2] +"/"+ allFinal1[0:len(allFinal1)-2]
            downZip = str(eachNew[9][0:len(eachNew[9])-2])+ ".zip"
            urllib.urlretrieve (path8,  downZip)






# Set the path to the directory where your zipped folders reside
zipfilepath = 'F:\Misc\presentation'
# Set the path to where you want the extracted data to reside
extractiondir = 'F:\Misc\presentation\counties'
# List all data in the main directory
zp1 = os.listdir(zipfilepath)
# Creates a loop which gives use each zipped folder automatically
# Concatinates zipped folder to original directory in variable done
for each in zp1:
    print each[-4:]
    if each[-4:] == ".zip":
        done = zipfilepath + "\\" + each
        zip1 = zipfile.ZipFile(done)
        extractiondir1 = extractiondir + "\\" + each[:-4]
        zip1.extractall(extractiondir1)



path = os.getcwd()
counter1 = 1

# get the data frames


# Create new layer for all files to be added to map document


env.workspace = "E:\\Misc\\presentation\\symbology\\"
zp1 = os.listdir(path + "\\counties\\")
for each in zp1:
    zp2 = os.listdir(path + "\\counties\\" + each)
    for eachNew in zp2:
        if eachNew[-4:] == ".shp":
            wetlandMap = arcpy.mapping.MapDocument('E:\\Misc\\presentation\\wetland.mxd')
            df1 = arcpy.mapping.ListDataFrames(wetlandMap,"*")[0]
            #print eachNew[-4:]
            wetland = arcpy.mapping.Layer(path + "\\counties\\" + each + "\\" + eachNew)
            #arcpy.Clip_analysis(path + "\\symbology\\Stream_order.shp", wetland, path + "\\counties\\" + each + "\\Streams.shp")
            streams = arcpy.mapping.Layer(path + "\\symbology\\Stream_order.shp")
            arcpy.ApplySymbologyFromLayer_management(wetland, path + '\\symbology\\wetland.lyr')
            arcpy.ApplySymbologyFromLayer_management(streams, path+ '\\symbology\\streams.lyr')
            # Assign legend variable for map
            legend = arcpy.mapping.ListLayoutElements(wetlandMap, "LEGEND_ELEMENT", "Legend")[0]
            # add the layers to map and assign if they should appear in legend
            legend.autoAdd = True
            arcpy.mapping.AddLayer(df1, streams,"TOP")
            legend.autoAdd = True
            arcpy.mapping.AddLayer(df1, wetland,"AUTO_ARRANGE")

            df1.extent = wetland.getExtent(True)
            # Export the map to a pdf
            arcpy.mapping.ExportToJPEG(wetlandMap, path + "\\counties\\" + each + "\\wetland.jpg")
            # Save map document to path
            wetlandMap.saveACopy(path + "\\counties\\" + each + "\\wetland.mxd")
            del wetlandMap

            print "done with map " + str(counter1)
            counter1+=1
1 Answer

Have a look at this guide:

Let me quote:

Function call overhead in Python is relatively high, especially compared with the execution speed of a builtin function. This strongly suggests that where appropriate, functions should handle data aggregates.

So effectively this suggests not factoring something out into a function that is going to be called hundreds of thousands of times; instead, let one function call handle the whole data aggregate.
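
To see that overhead in isolation, here is a minimal timing sketch (not taken from the question's code, and the exact numbers will vary by machine). It compares calling a tiny helper once per element against doing the same work inside a single call that handles the whole list:

import timeit

def strip_token(s):
    # trivial helper: one Python-level function call per element
    return s.strip()

def per_item(lines):
    # pays the call overhead once for every element
    return [strip_token(s) for s in lines]

def aggregate(lines):
    # same work, but the loop stays inside one function call
    return [s.strip() for s in lines]

lines = ["  some text  "] * 100000

print(timeit.timeit(lambda: per_item(lines), number=50))
print(timeit.timeit(lambda: aggregate(lines), number=50))

The per-item version is typically noticeably slower purely because of the extra function calls, which is the effect the quoted guide describes.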

In Python, functions are not inlined, and calling them is not cheap. If in doubt, use a profiler to find out how many times each function is called and how long it takes on average, and then optimize.
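
For example, assuming the updated design's pickData() is defined in the same script (as it is in the question), a quick profiling sketch like the one below would show where the 40 minutes actually go:

import cProfile, pstats

# run the slow entry point under the profiler and write the raw stats to a file
cProfile.run('pickData()', 'pickdata.prof')

# print the 15 entries with the largest cumulative time
stats = pstats.Stats('pickdata.prof')
stats.sort_stats('cumulative').print_stats(15)

Alternatively, the whole script can be profiled without modification from the command line with python -m cProfile -s cumulative yourscript.py. Either way, the output will usually make it obvious whether the time is going to function call overhead or to something else, such as the downloads and zip extraction in the question's code.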

You might also give PyPy a shot, as it has certain optimizations built in; reducing function call overhead in some cases seems to be one of them.
