
I have two dataframes:

df1 = 
    value
0     a
1     b
2     c

df2 =
    value
0     d
1     e

I need to concatenate them across index, but I have to preserve the index of the first dataframe and continue it in the second dataframe, like this:

result =
    value
0     a
1     b
2     c
3     d
4     e

My guess is that pd.concat([df1, df2], ignore_index=True) will do the job. However, I'm worried that for large dataframes the order of the rows may be changed and I'll end up with something like this (first two rows changed indices):

result =
    value
0     b
1     a
2     c
3     d
4     e

So my question is: does pd.concat with ignore_index=True preserve the row order within the dataframes being concatenated, or is there some randomness in the index assignment?

cs95
Alexandr Kapshuk
  • I don't think you need to worry about the order: `concat` basically stacks the dataframes you specify in their given order, and `ignore_index` resets the index to ascending numbers after the concatenation. I have been using it all the time and have never had an issue with the order. – TYZ Jun 11 '19 at 14:47
  • I just want to be sure, because I had a similar problem when passing data from a DataFrame to an ndarray: usually the order is preserved, but in my particular case it was not. – Alexandr Kapshuk Jun 11 '19 at 14:54
  • *I just want to be sure* -> write a test – Paul H Jun 11 '19 at 15:26
  • @PaulH what I'm afraid of is that the usual behavior is to keep the index just because that's how the calculations in the memory are done, but in some special cases it may not be true (for example, python spills the data on the hard drive to free memory, and then reads them again) – Alexandr Kapshuk Jun 11 '19 at 15:31

1 Answer


In my experience, pd.concat stacks the rows in the order in which the DataFrames are passed to it.

You can also pass sort=False, which prevents pd.concat from sorting the columns when they differ between the DataFrames; the rows keep their order either way:

pd.concat([df1, df2], axis=0, sort=False, ignore_index=True)

  value
0     a
1     b
2     c
3     d
4     e
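
As the first comment on the question suggests, a small test is one way to be sure. The following is a minimal sketch, not a proof about pandas internals: the frames `df1`, `df2`, `left` and `right` are made-up illustration data, and it assumes a reasonably recent pandas version. It checks that the rows come out in input order with a fresh 0..n-1 index, and that `sort` only affects the column order when the columns differ.

```python
import pandas as pd

# Illustration data (hypothetical): values deliberately not in alphabetical order.
df1 = pd.DataFrame({"value": ["b", "a", "c"]})
df2 = pd.DataFrame({"value": ["e", "d"]})

result = pd.concat([df1, df2], axis=0, sort=False, ignore_index=True)

# Rows keep their input order: all of df1 first, then all of df2.
expected_values = list(df1["value"]) + list(df2["value"])
assert list(result["value"]) == expected_values

# ignore_index=True replaces the index with 0..n-1.
assert list(result.index) == list(range(len(df1) + len(df2)))

# sort concerns the non-concatenation axis (here: the columns),
# and only matters when the columns are not identical.
left = pd.DataFrame({"b": [1], "a": [2]})
right = pd.DataFrame({"a": [3], "c": [4]})

unsorted_cols = pd.concat([left, right], sort=False, ignore_index=True)
sorted_cols = pd.concat([left, right], sort=True, ignore_index=True)

print(list(unsorted_cols.columns))  # columns in order of first appearance
print(list(sorted_cols.columns))    # columns sorted alphabetically
```

If the row order looks different after concatenation, the cause is more likely a later step (for example a sort or groupby) than pd.concat itself.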
cs95
  • So "not sorted" means that the original order will be preserved? Because it also may mean a random order (which usually turns out to be the original order just because of the way the calculations are performed). – Alexandr Kapshuk Jun 11 '19 at 14:52
  • @AlexandrKapshuk no, it means the original order is preserved. pd.concat will sort the columns if they are not ordered when concatenating vertically. But the rows will maintain their order. – cs95 Jun 11 '19 at 14:54
  • Isn't the non-concatenation axis in the example the columns themselves, as we are concatenating along the index? – araraonline Jun 11 '19 at 15:07
  • @araraonline `axis=0` means the DataFrames are concatenated _vertically_, meaning we have to align on the columns. For `axis=1`, we concatenate them horizontally, so they are aligned on the index. In my understanding, concatenation axis is the axis along which the DataFrames must be aligned. – cs95 Jun 11 '19 at 15:08
  • That is a bit confusing... Probably because of the choice of terms used by pandas. I will carry on a little test here just to check and then edit this comment :) – araraonline Jun 11 '19 at 15:13
  • It's the columns that are sorted: [notebook](https://colab.research.google.com/drive/1COyJg6ZT_qaKmQYpwDu8cClYy28aed3C). – araraonline Jun 11 '19 at 15:21
  • @araraonline Interesting, thanks for sharing! Will edit my answer for clarity. – cs95 Jun 11 '19 at 15:22
  • In spite of using sort=False, ignore_index=True, I don't see my row order preserved :( It seems to be sorted alphabetically in my case, no matter which concat options I try... – Sameer Mahajan Feb 10 '21 at 04:42