I am trying to select one row of a dataframe, merge it with another dataframe, then select the next row and repeat the same operation. I believe I should be able to do this with DataFrame.iterrows.

import pandas as pd
import numpy as np

one = {'A' : pd.Series([3, 4, 2]), 'B' : pd.Series(['one', 'two', 'one'])}
two = {'A' : pd.Series([1, 2, 3, 4, 5, 6])}

df1 = pd.DataFrame(one, columns = ['A', 'B'])
df2 = pd.DataFrame(two)

for index, row in df1.iterrows():
    row1 = pd.DataFrame(row)
    print(row1)

    df3 = pd.merge(df2, row1, on = 'A', how = 'left')
    print (df3)

When I print(row1), I get:

     0
A    3
B  one
     1
A    4
B  two
     2
A    2
B  one

The merge fails with a KeyError, which makes sense to me given the structure shown by print(row1): there is no column named 'A' to merge on.
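The structure can be checked directly rather than read off the printout. The following is a minimal sketch that rebuilds df1 from the question and inspects only the first row; it is a diagnostic, not a fix.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [3, 4, 2], 'B': ['one', 'two', 'one']})

# iterrows yields (index label, Series); the Series index holds the column labels.
index, row = next(df1.iterrows())
row1 = pd.DataFrame(row)

print(row1.shape)           # (2, 1): two index entries, one column
print(list(row1.index))     # ['A', 'B'] ended up in the index
print(list(row1.columns))   # [0]: the only column is named after the row label
print('A' in row1.columns)  # False, so merging on 'A' cannot find the key
```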

The desired outcome of df3 is:

   A    B
0  1  NaN
1  2  NaN
2  3  one
3  4  NaN
4  5  NaN
5  6  NaN

   A    B
0  1  NaN
1  2  NaN
2  3  NaN
3  4  two
4  5  NaN
5  6  NaN

It appears to me that the column labels are now the index. I think I need to reset the index, so that 'A' and 'B' will be values that I can join on. Is there an efficient way to accomplish this?

kharn
  • ... I'll edit it out, but next time, please avoid unnecessary empty lines in source code. Damn near illegible. – Marcus Müller Feb 07 '16 at 14:32
  • jezrael, I want to select one row from the first dataframe, and merge it with the second dataframe. Then select the second row and merge it with the second dataframe. I know that I could merge the two dfs without iterrows, but the result would be to have all three rows from the first df merged on one output, which is not what I am trying to accomplish. – kharn Feb 07 '16 at 14:37
  • Possible duplicate of [cartesian product in pandas](http://stackoverflow.com/questions/13269890/cartesian-product-in-pandas) – Aprillion Feb 07 '16 at 14:45
  • You can try changing `row1 = pd.DataFrame(row)` to `row1 = pd.DataFrame(row).T`. – jezrael Feb 07 '16 at 14:57
  • thanks @jezrael, that worked perfectly. – kharn Feb 07 '16 at 15:02

1 Answer

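pd.DataFrame(row) builds a single-column frame with 'A' and 'B' in the index. Transposing it with .T turns it into a single-row frame with columns 'A' and 'B', which can then be merged on 'A'. Change: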

row1 = pd.DataFrame(row)

to

row1 = pd.DataFrame(row).T

With that change, print(df3) gives the following for the three rows of df1 in turn:

   A    B
0  1  NaN
1  2  NaN
2  3  one
3  4  NaN
4  5  NaN
5  6  NaN

   A    B
0  1  NaN
1  2  NaN
2  3  NaN
3  4  two
4  5  NaN
5  6  NaN

   A    B
0  1  NaN
1  2  one
2  3  NaN
3  4  NaN
4  5  NaN
5  6  NaN

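One possible alternative, sketched here and not part of the original answer: because a row of df1 mixes integers and strings, the Series that iterrows yields has object dtype, and so do the columns of pd.DataFrame(row).T. Selecting each row with df1.iloc[[i]] (note the double brackets) returns a one-row DataFrame directly and keeps 'A' as an integer column. The `results` list is only an illustrative way to hold the three merged frames.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [3, 4, 2], 'B': ['one', 'two', 'one']})
df2 = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6]})

results = []  # illustrative container: one merged frame per row of df1
for i in range(len(df1)):
    # Double brackets return a one-row DataFrame, so column dtypes are preserved.
    row1 = df1.iloc[[i]]
    results.append(pd.merge(df2, row1, on='A', how='left'))

for df3 in results:
    print(df3)
```

Each element of `results` should match the corresponding table printed by the transpose approach.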
jezrael