I have a pandas data frame gmat. The sample data looks like

YEAR  student score mail_id      phone            Loc
2012  abc     630   abc@xyz.com  1-800-000-000   pqr
2012  pqr     630   pqr@xyz.com  1-800-000-000   abc

I would like to iterate through this data frame and, inside a for loop, create a new dataframe from each row to use for calculations. Each iteration overwrites the previous dataframe with the current row. For example, my first dataframe in the loop will look like

YEAR  student score mail_id      phone            Loc
2012  abc     630   abc@xyz.com  1-800-000-000   pqr

and the second dataframe, after overwriting the first row, will look like

YEAR  student score mail_id      phone            Loc
2012  pqr     630   pqr@xyz.com  1-800-000-000   abc

So I tried the following code

for row in gmat.iterrows():
    
    df=pd.DataFrame(list(row))

But when I check, df is not populated properly: it shows only 2 columns. Can you please suggest how to do this?

I also tried Georgy's suggestion and used for index, row in gmat.iterrows(). Here row is a pd.Series, which I then convert with gmrow=pd.DataFrame(row). But the column headings of the original data come out as row labels. The data I get is: YEAR 2012 student abc score 630 mail_id abc@xyz.com phone 1-800-000-000 Loc pqr

Christina Hughes
  • Possible duplicate of [How to iterate over rows in a DataFrame in Pandas?](https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas) – Georgy Feb 01 '18 at 15:22
  • See the accepted answer above. It should be `for index, row in gmat.iterrows()`. In your case your `row` is a tuple of an integer index and a `pd.Series`. This is why you get those '2 columns'. Also, when you fix this, you won't need to convert `row` to `list`. – Georgy Feb 01 '18 at 15:26
  • @Georgy, please refer to my original post. I tried your suggestion, but the output format I'm getting is different from what I want – Christina Hughes Feb 02 '18 at 06:29
  • `gmrow=pd.DataFrame(row).T` will transpose it to the format you want – Georgy Feb 02 '18 at 10:17

1 Answer

You can slice your dataframe like this:

for index, row in gmat.iterrows():
    # slice the original frame (gmat) so x is a one-row dataframe
    x = gmat[index:index+1]
    print("print iterations:", x)

print is just an example. You can do your desired transformations with x
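As a minimal sketch of the slicing idea above: the frame below is rebuilt from the sample data in the question, and `single_row_frames` is a hypothetical name used only to collect the results for inspection. It uses `iloc` with a one-element list, which selects by position and returns a DataFrame rather than a Series, so it should not depend on the index labels being 0, 1, 2, ...

```python
import pandas as pd

# Sample frame rebuilt from the question's data (assumed dtypes).
gmat = pd.DataFrame({
    "YEAR": [2012, 2012],
    "student": ["abc", "pqr"],
    "score": [630, 630],
    "mail_id": ["abc@xyz.com", "pqr@xyz.com"],
    "phone": ["1-800-000-000", "1-800-000-000"],
    "Loc": ["pqr", "abc"],
})

single_row_frames = []  # hypothetical container, only for inspection

for i in range(len(gmat)):
    # A list selector ([[i]]) keeps the result a one-row DataFrame
    # with the original columns and dtypes.
    x = gmat.iloc[[i]]
    single_row_frames.append(x)
    print(x)
```

In a real loop you would probably just work with `x` directly instead of storing every frame.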

  • There is no need to use `iterrows` if you never use the `row`. Simple `for index in range(...)` would be enough – Georgy Feb 02 '18 at 10:19
  • Could you please elaborate on using range to iterate over a dataframe? As far as I know, iterrows returns the dataframe's rows with its original schema. The range function takes an int input and will possibly throw an "int object is not iterable" error when used on a dataframe. – Shrinivas Deshmukh Feb 03 '18 at 02:20
  • I don't really understand what is not clear. Yes, `iterrows` yields rows of a dataframe along with corresponding indices, but use of it is justified if you actually use those rows. In your case you are using only the indices. Either use `row` or don't use `iterrows` at all. So, for example, either `for _, row in df.iterrows(): print(pd.DataFrame(row).T)` or `for i in range(df.shape[0]): print(df[i:i+1])` – Georgy Feb 03 '18 at 18:34
  • Sure! Thanks for the explanation, sir! – Shrinivas Deshmukh Feb 04 '18 at 02:49
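For the row-based alternative mentioned in the comments, here is a small sketch that uses `Series.to_frame().T`, which should be equivalent to `pd.DataFrame(row).T`. The two-column frame and the name `row_frames` are illustrative assumptions, not part of the original post. One caveat: because `iterrows` packs each row into a single Series, mixed column types are typically upcast to `object`, so numeric columns may need converting back before calculations.

```python
import pandas as pd

# Reduced version of the question's sample data (assumed for illustration).
gmat = pd.DataFrame({
    "YEAR": [2012, 2012],
    "student": ["abc", "pqr"],
    "score": [630, 630],
})

row_frames = []  # hypothetical container, only for inspection

for _, row in gmat.iterrows():
    # row is a Series whose index holds the column names;
    # to_frame() makes it a single column, .T turns it back into a single row.
    gmrow = row.to_frame().T
    row_frames.append(gmrow)
    print(gmrow)
```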