Concat/join/merge multiple dataframes based on row index (number) of each individual dataframes

Question

I want to read every nth row of a list of DataFrames and create a new DataFrames by appending all the Nth rows.

Let's say we have the following DataFrames:

>>> df1
   A    B    C    D
0 -0.8 -2.8 -0.3 -0.1
1 -0.1 -0.9  0.2 -0.7
2  0.7 -3.3 -1.1 -0.4 

>>> df2
   A    B    C    D
0  1.4 -0.7  1.5 -1.3
1  1.6  1.4  1.4  0.2
2 -1.4  0.2 -1.7  0.7 

>>> df3
   A    B    C    D
0  0.3 -0.5 -1.6 -0.8
1  0.2 -0.5 -1.1  1.6
2 -0.3  0.7 -1.0  1.0

I have used the following approach to get the desired df:

df = pd.DataFrame()

df_list = [df1, df2, df3]

for i in range(len(df1)):
    for x in df_list:
        df = df.append(x.loc[i], ignore_index = True)

Here's the result:

>>> df
   A    B    C    D
0 -0.8 -2.8 -0.3 -0.1
1  1.4 -0.7  1.5 -1.3
2  0.3 -0.5 -1.6 -0.8
3 -0.1 -0.9  0.2 -0.7
4  1.6  1.4  1.4  0.2
5  0.2 -0.5 -1.1  1.6
6  0.7 -3.3 -1.1 -0.4
7 -1.4  0.2 -1.7  0.7
8 -0.3  0.7 -1.0  1.0

I was just wondering if there is a pandas way of rewriting this code which would do the same thing (maybe by using .iterrows, pd.concat, pd.join, or pd.merge)?

Cheers

Update Simply appending one df after another is not what I am looking for here.

The code should do:

df.row1 = df1.row1
df.row2 = df2.row1
df.row3 = df3.row1
df.row4 = df1.row2
df.row5 = df2.row2
df.row6 = df3.row2
...

Possible [duplicate](https://stackoverflow.com/questions/36526282/append-multiple-pandas-data-frames-at-once) ,, there is already a answer there. — Karn Kumar, Oct 02 '18 at 14:27
editing the code after getting answer should be avoided as by the time there are already answers provided and time spend on that is vain , if ou have edition open up a new question! i see the question asked 13 hours ago and it has new edition to this on the top. — Karn Kumar, Oct 03 '18 at 03:44
@pygo The code didn't change at all. There are lots of irrelevant answers, so I thought I need to ask everybody else to read the question carefully and then add replies. Nothing has changed in the title nor in the code. Cheers — Manny, Oct 03 '18 at 04:32

jpp · Accepted Answer · 2018-10-02T23:50:14.997

For a single output dataframe, you can concatenate and sort by index:

res = pd.concat([df1, df2, df3]).sort_index().reset_index(drop=True)

     A    B    C    D
0 -0.8 -2.8 -0.3 -0.1
1  1.4 -0.7  1.5 -1.3
2  0.3 -0.5 -1.6 -0.8
3 -0.1 -0.9  0.2 -0.7
4  1.6  1.4  1.4  0.2
5  0.2 -0.5 -1.1  1.6
6  0.7 -3.3 -1.1 -0.4
7 -1.4  0.2 -1.7  0.7
8 -0.3  0.7 -1.0  1.0

For a dictionary of dataframes, You can concatenate and then group by index:

res = dict(tuple(pd.concat([df1, df2, df3]).groupby(level=0)))

With the dictionary defined as above, each value represents a row number. For example, res[0] will give the first row from each input dataframe.

Brilliant! This is exactly what I was looking for. – Manny Oct 03 '18 at 23:17 — Manny, Oct 03 '18 at 23:17

BENY · Answer 2 · 2018-10-02T14:11:53.593

3

There is pd.concat

df=pd.concat([df1,df2,df3]).reset_index(drop=True)

recommended by Jez

df=pd.concat([df1,df2,df3],ignore_index=True)

edited Oct 02 '18 at 14:11

answered Oct 02 '18 at 14:08

BENY

317,841
20
164
234

3

ignore_index=True ? – jezrael Oct 02 '18 at 14:09
2

The question here is not to append one DataFrame after another, but to take all the first rows first, then the second rows, then the thirds, and so on... – Manny Oct 02 '18 at 23:13

Karn Kumar · Answer 3 · 2018-10-02T14:46:17.140

try :

>>> df1 = pd.DataFrame({'A':['-0.8', '-0.1', '0.7'],
...                     'B':['-2.8', '-0.9', '-3.3'],
...                      'C':['-0.3', '0.2', '-1.1'],
...                      'D':['-0.1', '-0.7', '-0.4']})
>>>
>>> df2 = pd.DataFrame({'A':['1.4', '1.6', '-1.4'],
...                     'B':['-0.7', '1.4', '0.2'],
...                      'C':['1.5', '1.4', '-1.7'],
...                      'D':['-1.3', '0.2', '0.7']})
>>>

>>> df3 = pd.DataFrame({'A':['0.3', '0.2', '-0.3'],
...                     'B':['-0.5', '-0.5', '0.7'],
...                      'C':['-1.6', '-1.1', '-1.0'],
...                      'D':['-0.8', '1.6', '1.0']})

>>> df=pd.concat([df1,df2,df3],ignore_index=True)
>>> print(df)
      A     B     C     D
0  -0.8  -2.8  -0.3  -0.1
1  -0.1  -0.9   0.2  -0.7
2   0.7  -3.3  -1.1  -0.4
3   1.4  -0.7   1.5  -1.3
4   1.6   1.4   1.4   0.2
5  -1.4   0.2  -1.7   0.7
6   0.3  -0.5  -1.6  -0.8
7   0.2  -0.5  -1.1   1.6
8  -0.3   0.7  -1.0   1.0

OR

df=pd.concat([df1,df2,df3], axis=0, join='outer', ignore_index=True)

Note:

axis: whether we will concatenate along rows (0) or columns (1)
join: can be set to inner, outer, left, or right. by using outer its sort it's lexicographically
ignore_index: whether or not the original row labels from should be retained, by default False ,If True, do not use the index labels.

@Wen, there is no difference i see its just i was reproducing the OPS scenario to test and answer , so there should be no offence? is there any issues. — Karn Kumar, Oct 02 '18 at 14:23
This is not what is expected from the code. Your approaches will not produce **the expected result**. — Manny, Oct 02 '18 at 23:15
@Manny, pic the answer which suits you better! or by now you migh have got the hint to get the **expected result** as there are enough answers are provide best on the understanding. — Karn Kumar, Oct 03 '18 at 03:39

score 0 · Answer 4 · answered Oct 02 '18 at 14:13

You can concatenate them keeping their original indexes as a column this way:

df_total = pd.concat([df1.reset_index(), df2.reset_index(),
                      df3.reset_index()]) 

>> df_total
   index    A    B    C    D
0      0 -0.8 -2.8 -0.3 -0.1
1      1 -0.1 -0.9  0.2 -0.7
2      2  0.7 -3.3 -1.1 -0.4
0      0  1.4 -0.7  1.5 -1.3
1      1  1.6  1.4  1.4  0.2
2      2 -1.4  0.2 -1.7  0.7
0      0  0.3 -0.5 -1.6 -0.8
1      1  0.2 -0.5 -1.1  1.6
2      2 -0.3  0.7 -1.0  1.0

Then you can make a multiindex dataframe and order by index:

df_joined = df_total.reset_index(drop=True).reset_index()

>> df_joined
   level_0  index    A    B    C    D
0        0      0 -0.8 -2.8 -0.3 -0.1
1        1      1 -0.1 -0.9  0.2 -0.7
2        2      2  0.7 -3.3 -1.1 -0.4
3        3      0  1.4 -0.7  1.5 -1.3
4        4      1  1.6  1.4  1.4  0.2
5        5      2 -1.4  0.2 -1.7  0.7
6        6      0  0.3 -0.5 -1.6 -0.8
7        7      1  0.2 -0.5 -1.1  1.6
8        8      2 -0.3  0.7 -1.0  1.0

>> df_joined = df_joined.set_index(['index', 'level_0']).sort_index()

>> df_joined

                 A    B    C    D
index level_0                    
0     0       -0.8 -2.8 -0.3 -0.1
      3        1.4 -0.7  1.5 -1.3
      6        0.3 -0.5 -1.6 -0.8
1     1       -0.1 -0.9  0.2 -0.7
      4        1.6  1.4  1.4  0.2
      7        0.2 -0.5 -1.1  1.6
2     2        0.7 -3.3 -1.1 -0.4
      5       -1.4  0.2 -1.7  0.7
      8       -0.3  0.7 -1.0  1.0

You can put all this a dataframe just by doing:

>>  pd.DataFrame(df_joined.values, columns = df_joined.columns)

     A    B    C    D
0 -0.8 -2.8 -0.3 -0.1
1  1.4 -0.7  1.5 -1.3
2  0.3 -0.5 -1.6 -0.8
3 -0.1 -0.9  0.2 -0.7
4  1.6  1.4  1.4  0.2
5  0.2 -0.5 -1.1  1.6
6  0.7 -3.3 -1.1 -0.4
7 -1.4  0.2 -1.7  0.7
8 -0.3  0.7 -1.0  1.0

Concat/join/merge multiple dataframes based on row index (number) of each individual dataframes

4 Answers4