
I am creating some DataFrames in a loop and saving each one as a CSV file. The DataFrames have the same columns but different lengths. I would like to concatenate these DataFrames into a single DataFrame that has all the columns, something like:

df1

         A  B  C
    0  0.0  1  2
    1  0.0  1  0
    2  1.2  1  1
    3  2.0  1  2

df2

         A  B  C
    0  0.0  1  2
    1  0.0  1  0
    2  0.2  1  2

df3

         A  B  C
    0  0.0  1  2
    1  0.0  1  0
    2  1.2  1  1
    3  2.0  1  4
    4  1.0  2  2
    5  2.3  3  0

i would like to get something like

df_big

         A  B  C    A  B  C    A  B  C
    0  0.0  1  2  0.0  1  2  0.0  1  2
    1  0.0  1  0  0.0  1  0  0.0  1  0
    2  1.2  1  1  0.2  1  2  1.2  1  1
    3  2.0  1  2                2.0  1  4
    4                           1.0  2  2
    5                           2.3  3  0

Is this something that can be done in pandas?

  • Er.. have you looked at [`concat`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.tools.merge.concat.html)? – EdChum Aug 27 '14 at 19:37

1 Answer


You could use `pd.concat` with `axis=1`:

df_big = pd.concat([df1, df2, df3], axis=1)

yields

     A   B   C    A   B   C    A  B  C
0  0.0   1   2  0.0   1   2  0.0  1  2
1  0.0   1   0  0.0   1   0  0.0  1  0
2  1.2   1   1  0.2   1   2  1.2  1  1
3  2.0   1   2  NaN NaN NaN  2.0  1  4
4  NaN NaN NaN  NaN NaN NaN  1.0  2  2
5  NaN NaN NaN  NaN NaN NaN  2.3  3  0
unutbu
  • It is in a loop, so technically I don't have df1, df2, ... but I can create and save them beforehand. Or is there a way to concatenate them on the fly? Thank you, that was fast. – user3225439 Aug 27 '14 at 19:41
  • Save all the DataFrames in a list and then call `pd.concat(list_of_dfs)` once to create `df_big`. – unutbu Aug 27 '14 at 19:42
  • Concatenating inside the loop is inefficient because each call to `pd.concat` allocates a new chunk of memory and copies the data from the pieces into the resultant DataFrame. It is the same reason why [string concatenation should be done with `str.join` rather than `s += ...` in a loop](http://stackoverflow.com/questions/1349311/python-string-join-is-faster-than-but-whats-wrong-here/1350289#1350289). – unutbu Aug 27 '14 at 19:46
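Putting the comments together, a minimal sketch of the recommended pattern (assuming a hypothetical loop that builds each frame; substitute whatever produces your DataFrames, e.g. `pd.read_csv`):

```python
import pandas as pd

# Collect each frame in a list inside the loop...
frames = []
for length in (4, 3, 6):  # stand-in for your real loop
    frames.append(pd.DataFrame({"A": range(length),
                                "B": range(length),
                                "C": range(length)}))

# ...then concatenate ONCE, outside the loop.
# axis=1 aligns on the index; shorter frames are padded with NaN.
df_big = pd.concat(frames, axis=1)
print(df_big.shape)  # (6, 9): rows from the longest frame, 3 columns per frame
```

Each frame's index determines the alignment, so the result has as many rows as the longest frame, with `NaN` filling the gaps for the shorter ones.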