
I am looking at the code example below:

import pandas as pd

# Dummy data equivalent to what would be read from a CSV file
data = {'Name': ['A', 'B', 'C', 'D'],
        'Age': [1, 2, 3, 4]}

df = pd.DataFrame(data, columns=['Name', 'Age'])

for i in range(len(df)):
    print(df.loc[i, "Name"], df.loc[i, "Age"])

Am I right that this is an incorrect use of loc? It seems to rely on the index labels being the same as the positions, which happens to hold in this example, so it works here.

Is it better practice to use iloc when looping over range(len(df)), and loc when looping over df.index?

variable

Answer

In your case, loc and iloc behave the same way, but this is not always true. loc selects rows by index label, while iloc selects rows by integer position. Since you didn't specify an index when creating the DataFrame, pandas assigned the default labels 0 to n-1, so the labels match the positions. Once the index changes, the labels no longer match the positions, as in this example:

import pandas as pd

# Dummy data equivalent to what would be read from a CSV file
data = {'Name': ['A', 'B', 'C', 'D'],
        'Age': [1, 2, 3, 4]}

df = pd.DataFrame(data, columns=['Name', 'Age'])

# Try a different index: the labels no longer match the positions
df.index = [1, 3, 2, 0]

print("Print by index label")
for i in range(len(df)):
    # loc treats i as a label, so the rows come out as D, A, C, B
    print(df.loc[i, "Name"], df.loc[i, "Age"])

print("Print by index position")
for i in range(len(df)):
    # iloc treats i as a position, so the rows come out as A, B, C, D
    print(df['Name'].iloc[i], df["Age"].iloc[i])
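
To check the pairing suggested in the question (loc with df.index, iloc with range(len(df))), here is a minimal sketch using the same data but with string index labels. The labels 'w' to 'z' are hypothetical, chosen only so that labels and positions cannot be confused. df.columns.get_loc is used to turn column names into positions, because iloc expects positions for the columns as well.

```python
import pandas as pd

# Same dummy data, with hypothetical non-integer index labels
df = pd.DataFrame({'Name': ['A', 'B', 'C', 'D'], 'Age': [1, 2, 3, 4]},
                  index=['w', 'x', 'y', 'z'])

# Pairing 1: loop over the index labels and select with loc
by_label = [(df.loc[label, 'Name'], df.loc[label, 'Age']) for label in df.index]

# Pairing 2: loop over positions and select with iloc
# (iloc needs column positions too, so look them up by name)
name_pos = df.columns.get_loc('Name')
age_pos = df.columns.get_loc('Age')
by_position = [(df.iloc[i, name_pos], df.iloc[i, age_pos]) for i in range(len(df))]

# Mixing them up: range() yields positions, but loc looks them up as labels
try:
    df.loc[0, 'Name']
except KeyError:
    print("df.loc[0, 'Name'] raised KeyError: there is no row labelled 0")

print(by_label)
print(by_position)
```

Both pairings visit the same rows in the same order, while loc combined with range fails as soon as the labels are not 0 to n-1.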

If you just need to iterate over the rows in order, df.iterrows() is simpler. It yields each row's index label together with the row itself as a Series:

for i, row in df.iterrows():
    print(row['Name'], row['Age'])
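
Note that the first value iterrows yields is the index label, not the position. If you need both, a minimal sketch wraps iterrows in enumerate, reusing the shuffled index [1, 3, 2, 0] from the example above:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['A', 'B', 'C', 'D'], 'Age': [1, 2, 3, 4]})
df.index = [1, 3, 2, 0]  # same shuffled index as in the example above

rows = []
# enumerate supplies the position; iterrows supplies the label and the row (a Series)
for position, (label, row) in enumerate(df.iterrows()):
    rows.append((position, label, row['Name'], row['Age']))
    print(position, label, row['Name'], row['Age'])
```

iterrows walks the rows in their stored order (A, B, C, D), so the positions run 0 to 3 while the labels come out as 1, 3, 2, 0.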
SNygard