
While iterating through a nested for loop, I attempted to obtain a list of lists. Each list consisted of the data within each row of a dataframe. The reasoning behind this is not relevant; the behavior that occurred is what I am trying to understand. The initial code was as follows:

values = []
insert = []
for row in range(df_new_obs.shape[0]):
    print('row: ', row)
    print(insert)
    values.append(insert)
    insert = []

    for col in range(df_new_obs.shape[1]):
        insert.append(df_new_obs.iloc[row][col])

This code built the "list of lists" but left out the last row of the dataframe. The output began with an empty list, followed by the dataframe rows from the beginning:

row: 0
[]
row: 1
['0', '0.0', '0', '1', '6', '179', '30', '19', '1', '0', '0', '0.11', '0']
row: 2
['1', '0.0', '0', '0', '6', '361', '28', '27', '0', '1', '4', '0.81', '1']
etc ...

I read a few posts to learn how iteration works in Python and altered the code to pick up the last row of the dataframe. The following code does include it, but I cannot figure out why the last row of the dataframe shows up as the first element of the list. Can anyone explain why this last row was so elusive and why it ended up as the first element?

values = []
insert = []
for row in range(df_new_obs.shape[0] + 1):
    print('row number:', row)
    print(insert)
    values.append(insert)
    insert = []
    for col in range(df_new_obs.shape[1]):
        insert.append(df_new_obs.iloc[row - 1][col])

Output:

row number: 0
[]
row number: 1
['0', '0.0', '0', '0', '7', '179', '53', '25', '0', '1', '1', '0.1', '0']
row number: 2
['0', '0.0', '0', '1', '6', '179', '30', '19', '1', '0', '0', '0.11', '0']
row number: 3
['1', '0.0', '0', '0', '6', '361', '28', '27', '0', '1', '4', '0.81', '1']
etc...
John Ketterer
    You can do this, https://stackoverflow.com/a/28006809/4985099 in combination with enumerate to get row index. – sushanth Jun 29 '20 at 07:22

1 Answer


The order of insertion is wrong: at the top of each iteration you append the insert built during the previous iteration to your values, and only then build the new insert.

After the last insert has been filled, the loop ends, so it never gets added to your values:

values = []
insert = []
for row in range(df_new_obs.shape[0]):
    print('row: ', row)
    print(insert)
    values.append(insert)         # ADDS empty on 1st execution of loop
    insert = []

    for col in range(df_new_obs.shape[1]):
        insert.append(df_new_obs.iloc[row][col]) # add data; this insert is only appended on the next iteration

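To get the last row as well, you would need to append the most recently built insert once more after the loop (values will still start with the empty list from the first iteration):

# place after the    for row in range(df_new_obs.shape[0]):    loop has finished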
values.append(insert)
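As for the second attempt in the question, the likely reason the last row shows up first is that `row - 1` evaluates to `-1` on the first pass, and `iloc` treats negative positions as counting from the end, much like a plain Python list. A minimal sketch, assuming a made-up three-row frame (the name `df` and its columns and values are hypothetical, not the asker's data):

```python
import pandas as pd

# Hypothetical stand-in for df_new_obs
df = pd.DataFrame({"a": [10, 20, 30], "b": [11, 21, 31]})

# First pass of the modified loop: row == 0, so row - 1 == -1
first_pass = df.iloc[0 - 1].tolist()

# The actual last row, addressed by its non-negative position
last_row = df.iloc[df.shape[0] - 1].tolist()

print(first_pass)              # [30, 31]
print(first_pass == last_row)  # True
```

So in that version the first insert that gets built is the last row, and it is appended at the top of the second iteration, right after the empty list.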

It is probably better to avoid the confusion by appending each insert at the end of its own iteration:

values = []
for row in range(df_new_obs.shape[0]):
    print('row: ', row)
    insert = []  # start a fresh list for this row

    for col in range(df_new_obs.shape[1]):
        insert.append(df_new_obs.iloc[row, col])  # add data to insert (purely positional lookup)

    values.append(insert)  # add the finished row to values

... or use a more direct way to "export" the data from your dataframe, as described in "Pandas DataFrame to List of Lists".
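One such way, sketched here under the assumption that a plain nested list of the cell values is all that is needed, is to convert the underlying array with `tolist()`. The frame `df` below is a hypothetical stand-in for `df_new_obs`:

```python
import pandas as pd

# Hypothetical stand-in for df_new_obs
df = pd.DataFrame({"a": [10, 20, 30], "b": [11, 21, 31]})

# One inner list per dataframe row, in row order
values = df.values.tolist()
print(values)  # [[10, 11], [20, 21], [30, 31]]

# Same result built row by row, for comparison with the loop version
values_loop = [df.iloc[r].tolist() for r in range(df.shape[0])]
print(values == values_loop)  # True
```

This avoids the manual bookkeeping entirely, so there is no insert list to append at the wrong moment. With mixed column types the values may come back as generic Python objects, which is worth checking on real data.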

Patrick Artner