iterrows producing extra unwanted output from dataframe

Question

In the code below I'm trying to get the 'proid' value and the 'uim' value for each row of a dataframe. I'm trying to parse the first and second digits from the 'proid' value and use them to create a new directory for each record. For example, for the first record it would create the directory '/stuff/_place/1/2', and for the second record it would be '/stuff/_place/2/2'. The problem I'm running into is that it's just creating directories 1 through 9, that is '/stuff/_place/1' to '/stuff/_place/9', even though many of those numbers aren't present in the records in the dataframe. Does anyone see what the issue is and how I can accomplish my original goal?

The code worked correctly when I tested it on just the first record in the dataframe using .iloc[0], as in the commented-out code below. It started producing the extra directories when I switched to iterrows, following the example below:

How to iterate over rows in a DataFrame in Pandas?
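For reference, iterrows yields one (index, row) pair per row of the frame, where row is a Series, so inside the loop row['proid'] is the same value that df['proid'].iloc[i] gives for the i-th row. A minimal check on the sample data, assuming a freshly built frame named df (an illustrative name). It suggests the .iloc[0] test and the iterrows loop should agree, as long as both look at the same frame:

```python
import pandas as pd

# Sample data from the question, built fresh so no earlier state is involved.
df = pd.DataFrame({"proid": [123, 224, 456], "uim": ["red", "veg", "fog"]})

# iterrows yields (index label, row as a Series) pairs, one per row of df.
for position, (index, row) in enumerate(df.iterrows()):
    # The value seen in the loop matches what .iloc returns at that position.
    assert row["proid"] == df["proid"].iloc[position]
    print(index, row["proid"], row["uim"])
```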

Code:

import os

# iterrows through the whole data frame
sampleDf=testDf

for index, row in sampleDf.iterrows():

    pid=row['proid'] #sampleDf['proid'].iloc[0]

    ImgUrl=row['uim'] #sampleDf['uim'].iloc[0]

    # file path where images are stored
    basePath='/stuff/_place/'

    # 1st digit
    dig1=str(pid)[0]

    # 2nd digit
    dig2=str(pid)[1]

    # check whether the directory exists and create it if it doesn't
    directory=basePath+dig1+'/'+dig2

    if not os.path.exists(directory):
        os.makedirs(directory)


Data:

proid  uim
123    red
224    veg
456    fog
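A quick way to narrow this down is a dry run: compute the directory each row of the data above maps to, without creating anything, and compare that with what appeared on disk. This is a minimal sketch assuming the column names shown and a POSIX path separator; df and targets are illustrative names. If the real run produced directories outside this list, the frame being iterated is likely not the one shown here.

```python
import os

import pandas as pd

# Sample data from the question, rebuilt so the check does not depend
# on any earlier notebook state.
df = pd.DataFrame({"proid": [123, 224, 456], "uim": ["red", "veg", "fog"]})

base_path = "/stuff/_place/"

# Build the target path for each row without touching the filesystem.
pid_str = df["proid"].astype(str)
targets = [
    os.path.join(base_path, d1, d2)
    for d1, d2 in zip(pid_str.str[0], pid_str.str[1])
]

# On a POSIX system this prints:
# ['/stuff/_place/1/2', '/stuff/_place/2/2', '/stuff/_place/4/5']
print(sorted(set(targets)))
```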

What is the logic to split a 3-digit number into dig1 and dig2? — neo, Mar 27 '18 at 04:43
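The code takes the first and second characters of the number's decimal string, i.e. its hundreds and tens digits when proid has three digits. A small sketch comparing that with an integer-arithmetic equivalent; the helper names are hypothetical, and the arithmetic version assumes exactly three digits (100 to 999), while the string version works for any positive integer with at least two digits:

```python
def leading_digits(pid):
    """Return the first two decimal digits of a positive integer as strings."""
    s = str(pid)
    return s[0], s[1]


def leading_digits_arith(pid):
    """Same result via integer arithmetic; assumes pid has exactly three digits."""
    return str(pid // 100), str(pid // 10 % 10)


for pid in (123, 224, 456):
    # 123 -> ('1', '2'), 224 -> ('2', '2'), 456 -> ('4', '5')
    print(pid, leading_digits(pid), leading_digits_arith(pid))
```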

score 0 · Answer 1 · answered Mar 27 '18 at 11:46

What is the problem? I had to edit the code so that it runs, and it works without a problem. Next time, write the code so that it is possible to copy and paste it and then run it without needing to change anything.

The following code, adapted from yours:

import os
import pandas as pd

# iterrows through the whole data frame
sampleDf= pd.DataFrame([[123, 'red'], [224, 'veg'], [456, 'fog']],columns=['proid', 'uim'])

for index, row in sampleDf.iterrows():


    pid=row['proid'] #sampleDf['proid'].iloc[0]

    ImgUrl=row['uim'] #sampleDf['uim'].iloc[0]


    # file path where images stored
    basePath="/stuff/_place/"

    # 1st digit
    dig1=str(pid)[0]

    # 2nd digit
    dig2=str(pid)[1]

    # checking if directory exists and making new directory if it doesn't
    directory=basePath+dig1+'/'+dig2

    if not os.path.exists(directory):
        os.makedirs(directory)

It creates the following directories:

/stuff/_place/1/2
/stuff/_place/2/2
/stuff/_place/4/5

Thanks, yeah, it seems to have been a hiccup with my notebook. I think the test dataframe I was using got mixed up with an earlier dataframe, so it was using the whole index from the earlier dataframe instead of just a few records. — user3476463, Mar 27 '18 at 14:59
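Since the cause turned out to be stale notebook state, one way to guard against it is to build the DataFrame in the same cell as the loop. A minimal sketch along those lines, which also swaps iterrows for itertuples (generally faster, per the pandas docs) and the exists-check for os.makedirs(..., exist_ok=True). make_image_dirs is a hypothetical helper name, and a temporary directory stands in for /stuff/_place/ so the sketch is safe to run:

```python
import os
import tempfile

import pandas as pd


def make_image_dirs(df, base_path):
    """Create base_path/<1st digit>/<2nd digit> for each row's 'proid'.

    Returns the directories in row order (hypothetical helper).
    """
    created = []
    for row in df.itertuples(index=False):
        pid = str(row.proid)
        directory = os.path.join(base_path, pid[0], pid[1])
        # exist_ok=True replaces the separate os.path.exists() check.
        os.makedirs(directory, exist_ok=True)
        created.append(directory)
    return created


# Rebuild the sample frame in the same cell so stale state cannot leak in.
sample_df = pd.DataFrame({"proid": [123, 224, 456], "uim": ["red", "veg", "fog"]})

# The temporary directory is deleted again when the with-block ends.
with tempfile.TemporaryDirectory() as base:
    dirs = make_image_dirs(sample_df, base)
    top_level = sorted(os.listdir(base))
    print(top_level)  # ['1', '2', '4']
```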

