
I am trying to build a DataFrame based on another one. To build the second one, I loop over the first DataFrame, make some changes to the data, and insert it into the second one. I am using a namedtuple for my for loop.

This loop takes a long time to process 2 million rows of data. Is there a faster way to do this?

  • Does this answer your question? [How to iterate over rows in a DataFrame in Pandas?](https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas) – Mayank Porwal Apr 30 '20 at 09:10
  • Have you had a look at the iterrows function? https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iterrows.html#pandas-dataframe-iterrows – WK123 Apr 30 '20 at 09:10
  • Can you show some example code so we know how you need to work with your first dataframe? – kynnemall Apr 30 '20 at 09:10

2 Answers


Since pandas DataFrames are column-oriented, iterating over them row by row is not what they are optimized for. Still, this is the way I process each row of a pandas DataFrame:

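import pandas as pd

table = pd.DataFrame({'Name': ['John', 'Paul'], 'Age': [20, 21]})  # example data
# Iterating a DataFrame yields its column labels; zipping the columns gives one tuple per row
rows = zip(*(table.loc[:, each] for each in table))
for rowNum, record in enumerate(rows):
    # To process each record, put your code here; this example just prints it
    print("Row", rowNum, "records:", record)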
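Since the question mentions namedtuples: pandas also has the built-in DataFrame.itertuples method, which produces the same row tuples directly. A small sketch, with made-up example data, comparing the two:

```python
import pandas as pd

# Small example frame (hypothetical data)
table = pd.DataFrame({'Name': ['John', 'Paul', 'George'], 'Age': [20, 21, 19]})

# Row tuples built by zipping the columns, as in the loop above
zipped = list(zip(*(table.loc[:, each] for each in table)))

# Built-in equivalent: plain tuples, without the index
builtin = list(table.itertuples(index=False, name=None))

# Default itertuples() yields namedtuples whose fields are the column names
named = list(table.itertuples(index=False))

print(zipped == builtin)  # True
print(named[0].Name, named[0].Age)  # John 20
```

The namedtuple form lets you write record.Name instead of record[0], which is convenient when the frame has many columns.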

That said, I still suggest looking for built-in pandas methods that can process your first DataFrame; they are usually quicker and more effective than a loop you write yourself. Hope this helps.
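As a rough illustration of the "use pandas methods" suggestion: if each change depends only on values in the same row, the second DataFrame can often be built with whole-column operations instead of a loop. The transformations below (upper-casing the name, adding one to the age, an adult flag) are hypothetical stand-ins, since the question does not say what changes are needed:

```python
import pandas as pd

first = pd.DataFrame({'Name': ['John', 'Paul', 'George'], 'Age': [20, 21, 19]})

# Hypothetical per-row changes, each applied to a whole column at once
# (no Python-level loop over the rows)
second = pd.DataFrame({
    'Name': first['Name'].str.upper(),   # vectorized string method
    'AgeNextYear': first['Age'] + 1,     # arithmetic on the whole column
    'Adult': first['Age'] >= 21,         # boolean column from a comparison
})
print(second)
```

On millions of rows, column operations like these generally run far faster than any row-by-row loop, because the work happens inside pandas/NumPy rather than in Python.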

Sam Y

I'd recommend using the iterrows function that is built into pandas.

import pandas as pd

data = {'Name': ['John', 'Paul', 'George'], 'Age': [20, 21, 19]}
db = pd.DataFrame(data)
print(f"Dataframe:\n{db}\n")
# iterrows() yields (index label, row as a Series) pairs
for idx, row in db.iterrows():
    print(f"Row Index:{idx}")
    print(f"Column:\n{row}\n")

The output of the above:

Dataframe:
     Name  Age
0    John   20
1    Paul   21
2  George   19

Row Index:0
Column:
Name    John
Age       20
Name: 0, dtype: object

Row Index:1
Column:
Name    Paul
Age       21
Name: 1, dtype: object

Row Index:2
Column:
Name    George
Age         19
Name: 2, dtype: object
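If a loop really can't be avoided, a common pattern is to collect the results in a plain Python list and build the second DataFrame once at the end, rather than inserting rows into it one at a time. A minimal sketch using iterrows, where the per-row change is a hypothetical example. Note that iterrows returns each row as a Series, so it does not preserve dtypes across a row (hence the dtype: object in the output above).

```python
import pandas as pd

db = pd.DataFrame({'Name': ['John', 'Paul', 'George'], 'Age': [20, 21, 19]})

records = []  # accumulate plain dicts; much cheaper than growing a DataFrame row by row
for idx, row in db.iterrows():
    # Hypothetical change: upper-case the name and add one to the age
    records.append({'Name': row['Name'].upper(), 'AgeNextYear': row['Age'] + 1})

# Build the second DataFrame in a single call, keeping the original index
second = pd.DataFrame(records, index=db.index)
print(second)
```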

WK123