
When I call `pd.concat((df1, df2), keys=('A', 'B'), ignore_index=True)`, the keys are ignored. I couldn't find any mention of this in the documentation. Am I missing something, or is this a bug?

code example:

import pandas as pd
import numpy as np


df1 = pd.DataFrame(np.random.uniform(0, 1, (5, 5)))
df2 = pd.DataFrame(np.random.uniform(0, 1, (5, 5)))

print(pd.concat((df1, df2), keys=('A', 'B')))
print(pd.concat((df1, df2), keys=('A', 'B'), ignore_index=True))

output:

            0         1         2         3         4
A 0  0.548398  0.285250  0.690403  0.646567  0.881671
  1  0.560004  0.111783  0.155743  0.587277  0.485484
  2  0.258623  0.243698  0.881638  0.686399  0.229254
  3  0.492586  0.324359  0.922460  0.744553  0.316212
  4  0.131956  0.693708  0.620376  0.893369  0.371382
B 0  0.633036  0.402043  0.609046  0.212024  0.988794
  1  0.383615  0.575692  0.320391  0.391028  0.589542
  2  0.326453  0.879162  0.916395  0.525230  0.532779
  3  0.273823  0.229596  0.326523  0.989329  0.340129
  4  0.152274  0.445670  0.133162  0.112688  0.572573
          0         1         2         3         4
0  0.548398  0.285250  0.690403  0.646567  0.881671
1  0.560004  0.111783  0.155743  0.587277  0.485484
2  0.258623  0.243698  0.881638  0.686399  0.229254
3  0.492586  0.324359  0.922460  0.744553  0.316212
4  0.131956  0.693708  0.620376  0.893369  0.371382
5  0.633036  0.402043  0.609046  0.212024  0.988794
6  0.383615  0.575692  0.320391  0.391028  0.589542
7  0.326453  0.879162  0.916395  0.525230  0.532779
8  0.273823  0.229596  0.326523  0.989329  0.340129
9  0.152274  0.445670  0.133162  0.112688  0.572573

EDIT:

python version = 3.9.0.final.0

pandas version = 1.2.3

EDIT:

To be clear, what I was expecting is:

            0         1         2         3         4
A 0  0.548398  0.285250  0.690403  0.646567  0.881671
  1  0.560004  0.111783  0.155743  0.587277  0.485484
  2  0.258623  0.243698  0.881638  0.686399  0.229254
  3  0.492586  0.324359  0.922460  0.744553  0.316212
  4  0.131956  0.693708  0.620376  0.893369  0.371382
B 5  0.633036  0.402043  0.609046  0.212024  0.988794
  6  0.383615  0.575692  0.320391  0.391028  0.589542
  7  0.326453  0.879162  0.916395  0.525230  0.532779
  8  0.273823  0.229596  0.326523  0.989329  0.340129
  9  0.152274  0.445670  0.133162  0.112688  0.572573
Vinzent
  • What output were you expecting? – Henry Ecker Jun 23 '21 at 15:29
  • @HenryEcker I was expecting to get the same as when I have ignore_index=False except that level 1 of the resulting multiindex should have new consecutive numbers from 0 to n. – Vinzent Jun 23 '21 at 15:33
  • The keys are your index. You're passing an argument to ignore the index. That's exactly the output I would expect. – Abstract Jun 23 '21 at 15:34
  • @Abstract keys are a new level of the new index, I was expecting the old index to be ignored not the new index. – Vinzent Jun 23 '21 at 15:35
  • This may help: https://stackoverflow.com/questions/49620538/what-are-the-levels-keys-and-names-arguments-for-in-pandas-concat-functio – Abhishek Jun 23 '21 at 15:42
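
The behaviour discussed in the comments above can be checked directly. The following is a minimal sketch, not an authoritative explanation of pandas internals. It assumes the documented meaning of `ignore_index` (the concatenation axis is relabelled 0, ..., n - 1), and the variable names `with_keys` and `ignored` are only illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.uniform(0, 1, (5, 5)))
df2 = pd.DataFrame(np.random.uniform(0, 1, (5, 5)))

# keys= adds an outer index level on top of each frame's own index.
with_keys = pd.concat((df1, df2), keys=("A", "B"))

# ignore_index=True relabels the whole concatenation axis, so the
# level built from keys= is discarded together with the old index.
ignored = pd.concat((df1, df2), keys=("A", "B"), ignore_index=True)

print(with_keys.index.nlevels)  # 2 levels: keys plus the original index
print(ignored.index.nlevels)    # 1 level: plain 0..n-1 labels
print(list(ignored.index))
```

In other words, `ignore_index=True` acts on the complete resulting index rather than only on the level that came from the input frames.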

1 Answer

You might need to call `reset_index()` after concatenating:

print(pd.concat((df1, df2), keys=('A', 'B'), ignore_index=False).reset_index(drop=True))
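
If the goal is the output the question describes (the `A`/`B` keys kept, with the inner level renumbered consecutively), one possible approach is to rebuild the index after concatenating. This is a sketch under that assumption, not a built-in `pd.concat` option. The name `combined` is hypothetical.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.uniform(0, 1, (5, 5)))
df2 = pd.DataFrame(np.random.uniform(0, 1, (5, 5)))

# Concatenate with keys so the outer level ("A"/"B") is created.
combined = pd.concat((df1, df2), keys=("A", "B"))

# Keep the key level (level 0) and replace the inner level
# with consecutive integers 0..n-1.
combined.index = pd.MultiIndex.from_arrays(
    [combined.index.get_level_values(0), np.arange(len(combined))]
)

print(combined)
```

The outer level still identifies the source frame, while the inner level now runs from 0 to 9 across both frames.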
nizarhamood
  • Please explain your answer a bit and let OP know how this is going to help OP, and if possible please add the sample output as well. – ThePyGuy Jun 23 '21 at 15:52
  • Thank you for your answer, but I already knew that I could do this. I just didn't think it would be necessary, as I thought that specifying keys and `ignore_index=True` should do the job. My question was not "how do I achieve my desired result" but rather "why doesn't pandas do what I was expecting" and "is it a bug or the intended behavior". – Vinzent Jun 23 '21 at 16:37