In the code below I'm trying to get the 'proid' value and the 'uim' value for each row of a dataframe. I want to take the first and second digits of the 'proid' value and use them to create a new directory for each record. For example, for the first record it would create the directory '/stuff/_place/1/2', and for the second record it would create '/stuff/_place/2/2'. The problem I'm running into is that it's just creating directories 1 through 9, that is, '/stuff/_place/1' to '/stuff/_place/9', even though many of those numbers aren't present in the records in the dataframe. Does anyone see what the issue is and how I can accomplish my original goal?
The code worked correctly when I tested it on just the first record in the dataframe using .iloc[0], as in the commented-out code below. It started producing the extra directories when I switched to iterrows, following the example linked below.
How to iterate over rows in a DataFrame in Pandas?
Code:
import os

# iterrows through the whole data frame
sampleDf = testDf
for index, row in sampleDf.iterrows():
    pid = row['proid']   # sampleDf['proid'].iloc[0]
    ImgUrl = row['uim']  # sampleDf['uim'].iloc[0]
    # file path where images are stored
    basePath = '/stuff/_place/'
    # 1st digit
    dig1 = str(pid)[0]
    # 2nd digit
    dig2 = str(pid)[1]
    # check if the directory exists and make a new directory if it doesn't
    directory = basePath + dig1 + '/' + dig2
    if not os.path.exists(directory):
        os.makedirs(directory)
Data:
proid  uim
123    red
224    veg
456    fog
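To make the expected result concrete, here is a minimal sketch of the behavior I'm after, using only the standard library. It makes several assumptions that are not part of my real setup: a plain list of (proid, uim) tuples stands in for the dataframe, a temporary directory stands in for '/stuff/_place/', and `digit_dir` and `make_dirs` are hypothetical helper names I made up for this illustration.

```python
import os
import tempfile

# Stand-in for the dataframe rows shown above: (proid, uim) pairs (assumption).
records = [(123, "red"), (224, "veg"), (456, "fog")]


def digit_dir(base_path, pid):
    """Return base_path/<1st digit>/<2nd digit> for a proid (hypothetical helper)."""
    digits = str(pid)
    return os.path.join(base_path, digits[0], digits[1])


def make_dirs(base_path, rows):
    """Create one nested directory per record; return the paths relative to base_path."""
    created = []
    for pid, _uim in rows:
        directory = digit_dir(base_path, pid)
        # exist_ok=True replaces the separate os.path.exists check
        os.makedirs(directory, exist_ok=True)
        created.append(os.path.relpath(directory, base_path))
    return created


# A temporary directory is used instead of '/stuff/_place/' (assumption for this sketch).
with tempfile.TemporaryDirectory() as tmp:
    created = make_dirs(tmp, records)
    top_level = sorted(os.listdir(tmp))

print(created)    # one nested directory per record
print(top_level)  # only the first digits that actually occur in the data
```

With the three sample records, the only top-level directories I would expect are 1, 2 and 4, each containing a single subdirectory. Inside an iterrows loop, the same helper could presumably be called as `digit_dir(basePath, row['proid'])`.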