3

I want to read every nth row of a list of DataFrames and create a new DataFrames by appending all the Nth rows.

Let's say we have the following DataFrames:

>>> df1
   A    B    C    D
0 -0.8 -2.8 -0.3 -0.1
1 -0.1 -0.9  0.2 -0.7
2  0.7 -3.3 -1.1 -0.4 

>>> df2
   A    B    C    D
0  1.4 -0.7  1.5 -1.3
1  1.6  1.4  1.4  0.2
2 -1.4  0.2 -1.7  0.7 

>>> df3
   A    B    C    D
0  0.3 -0.5 -1.6 -0.8
1  0.2 -0.5 -1.1  1.6
2 -0.3  0.7 -1.0  1.0

I have used the following approach to get the desired df:

df = pd.DataFrame()

df_list = [df1, df2, df3]

for i in range(len(df1)):
    for x in df_list:
        df = df.append(x.loc[i], ignore_index = True)

Here's the result:

>>> df
   A    B    C    D
0 -0.8 -2.8 -0.3 -0.1
1  1.4 -0.7  1.5 -1.3
2  0.3 -0.5 -1.6 -0.8
3 -0.1 -0.9  0.2 -0.7
4  1.6  1.4  1.4  0.2
5  0.2 -0.5 -1.1  1.6
6  0.7 -3.3 -1.1 -0.4
7 -1.4  0.2 -1.7  0.7
8 -0.3  0.7 -1.0  1.0 

I was just wondering if there is a pandas way of rewriting this code which would do the same thing (maybe by using .iterrows, pd.concat, pd.join, or pd.merge)?

Cheers

Update Simply appending one df after another is not what I am looking for here.

The code should do:

df.row1 = df1.row1
df.row2 = df2.row1
df.row3 = df3.row1
df.row4 = df1.row2
df.row5 = df2.row2
df.row6 = df3.row2
...
asimo
  • 2,340
  • 11
  • 29
Manny
  • 679
  • 5
  • 12
  • 1
    Possible [duplicate](https://stackoverflow.com/questions/36526282/append-multiple-pandas-data-frames-at-once) ,, there is already a answer there. – Karn Kumar Oct 02 '18 at 14:27
  • 1
    Note **the order of the rows** in the result. – Manny Oct 02 '18 at 23:13
  • 1
    editing the code after getting answer should be avoided as by the time there are already answers provided and time spend on that is vain , if ou have edition open up a new question! i see the question asked 13 hours ago and it has new edition to this on the top. – Karn Kumar Oct 03 '18 at 03:44
  • 1
    @pygo The code didn't change at all. There are lots of irrelevant answers, so I thought I need to ask everybody else to read the question carefully and then add replies. Nothing has changed in the title nor in the code. Cheers – Manny Oct 03 '18 at 04:32

4 Answers4

5

For a single output dataframe, you can concatenate and sort by index:

res = pd.concat([df1, df2, df3]).sort_index().reset_index(drop=True)

     A    B    C    D
0 -0.8 -2.8 -0.3 -0.1
1  1.4 -0.7  1.5 -1.3
2  0.3 -0.5 -1.6 -0.8
3 -0.1 -0.9  0.2 -0.7
4  1.6  1.4  1.4  0.2
5  0.2 -0.5 -1.1  1.6
6  0.7 -3.3 -1.1 -0.4
7 -1.4  0.2 -1.7  0.7
8 -0.3  0.7 -1.0  1.0

For a dictionary of dataframes, You can concatenate and then group by index:

res = dict(tuple(pd.concat([df1, df2, df3]).groupby(level=0)))

With the dictionary defined as above, each value represents a row number. For example, res[0] will give the first row from each input dataframe.

jpp
  • 159,742
  • 34
  • 281
  • 339
3

There is pd.concat

df=pd.concat([df1,df2,df3]).reset_index(drop=True)

recommended by Jez

df=pd.concat([df1,df2,df3],ignore_index=True)
BENY
  • 317,841
  • 20
  • 164
  • 234
2

try :

>>> df1 = pd.DataFrame({'A':['-0.8', '-0.1', '0.7'],
...                     'B':['-2.8', '-0.9', '-3.3'],
...                      'C':['-0.3', '0.2', '-1.1'],
...                      'D':['-0.1', '-0.7', '-0.4']})
>>>
>>> df2 = pd.DataFrame({'A':['1.4', '1.6', '-1.4'],
...                     'B':['-0.7', '1.4', '0.2'],
...                      'C':['1.5', '1.4', '-1.7'],
...                      'D':['-1.3', '0.2', '0.7']})
>>>

>>> df3 = pd.DataFrame({'A':['0.3', '0.2', '-0.3'],
...                     'B':['-0.5', '-0.5', '0.7'],
...                      'C':['-1.6', '-1.1', '-1.0'],
...                      'D':['-0.8', '1.6', '1.0']})

>>> df=pd.concat([df1,df2,df3],ignore_index=True)
>>> print(df)
      A     B     C     D
0  -0.8  -2.8  -0.3  -0.1
1  -0.1  -0.9   0.2  -0.7
2   0.7  -3.3  -1.1  -0.4
3   1.4  -0.7   1.5  -1.3
4   1.6   1.4   1.4   0.2
5  -1.4   0.2  -1.7   0.7
6   0.3  -0.5  -1.6  -0.8
7   0.2  -0.5  -1.1   1.6
8  -0.3   0.7  -1.0   1.0

OR

df=pd.concat([df1,df2,df3], axis=0, join='outer', ignore_index=True)

Note:

axis: whether we will concatenate along rows (0) or columns (1)
join: can be set to inner, outer, left, or right. by using outer its sort it's lexicographically
ignore_index: whether or not the original row labels from should be retained, by default False ,If True, do not use the index labels.
Karn Kumar
  • 8,518
  • 3
  • 27
  • 53
  • 1
    what is the different between this and mine – BENY Oct 02 '18 at 14:21
  • @Wen, there is no difference i see its just i was reproducing the OPS scenario to test and answer , so there should be no offence? is there any issues. – Karn Kumar Oct 02 '18 at 14:23
  • 1
    This is not what is expected from the code. Your approaches will not produce **the expected result**. – Manny Oct 02 '18 at 23:15
  • @Manny, pic the answer which suits you better! or by now you migh have got the hint to get the **expected result** as there are enough answers are provide best on the understanding. – Karn Kumar Oct 03 '18 at 03:39
0

You can concatenate them keeping their original indexes as a column this way:

df_total = pd.concat([df1.reset_index(), df2.reset_index(),
                      df3.reset_index()]) 

>> df_total
   index    A    B    C    D
0      0 -0.8 -2.8 -0.3 -0.1
1      1 -0.1 -0.9  0.2 -0.7
2      2  0.7 -3.3 -1.1 -0.4
0      0  1.4 -0.7  1.5 -1.3
1      1  1.6  1.4  1.4  0.2
2      2 -1.4  0.2 -1.7  0.7
0      0  0.3 -0.5 -1.6 -0.8
1      1  0.2 -0.5 -1.1  1.6
2      2 -0.3  0.7 -1.0  1.0

Then you can make a multiindex dataframe and order by index:

df_joined = df_total.reset_index(drop=True).reset_index()

>> df_joined
   level_0  index    A    B    C    D
0        0      0 -0.8 -2.8 -0.3 -0.1
1        1      1 -0.1 -0.9  0.2 -0.7
2        2      2  0.7 -3.3 -1.1 -0.4
3        3      0  1.4 -0.7  1.5 -1.3
4        4      1  1.6  1.4  1.4  0.2
5        5      2 -1.4  0.2 -1.7  0.7
6        6      0  0.3 -0.5 -1.6 -0.8
7        7      1  0.2 -0.5 -1.1  1.6
8        8      2 -0.3  0.7 -1.0  1.0

>> df_joined = df_joined.set_index(['index', 'level_0']).sort_index()

>> df_joined

                 A    B    C    D
index level_0                    
0     0       -0.8 -2.8 -0.3 -0.1
      3        1.4 -0.7  1.5 -1.3
      6        0.3 -0.5 -1.6 -0.8
1     1       -0.1 -0.9  0.2 -0.7
      4        1.6  1.4  1.4  0.2
      7        0.2 -0.5 -1.1  1.6
2     2        0.7 -3.3 -1.1 -0.4
      5       -1.4  0.2 -1.7  0.7
      8       -0.3  0.7 -1.0  1.0

You can put all this a dataframe just by doing:

>>  pd.DataFrame(df_joined.values, columns = df_joined.columns)

     A    B    C    D
0 -0.8 -2.8 -0.3 -0.1
1  1.4 -0.7  1.5 -1.3
2  0.3 -0.5 -1.6 -0.8
3 -0.1 -0.9  0.2 -0.7
4  1.6  1.4  1.4  0.2
5  0.2 -0.5 -1.1  1.6
6  0.7 -3.3 -1.1 -0.4
7 -1.4  0.2 -1.7  0.7
8 -0.3  0.7 -1.0  1.0
Mabel Villalba
  • 2,538
  • 8
  • 19