The following works:
import pandas as pd

df1 = pd.DataFrame([[1, 2, 3]], columns=['a', 'b', 'c'])
df2 = pd.DataFrame(columns=['a', 'b', 'c'])
pd.concat([df1, df2])
as of course it should. However, the following ought to be equivalent, and yet it fails:
from collections import OrderedDict

od3 = OrderedDict([('a', [1]), ('b', [2]), ('c', [3])])
od4 = OrderedDict([('a', []), ('b', []), ('c', [])])
df3 = pd.DataFrame(od3)
df4 = pd.DataFrame(od4)
pd.concat([df3, df4])
This block of code raises:
ValueError: Shape of passed values is (3, 1), indices imply (3, 0)
Oddly, these all do work:
pd.concat([df3.drop_duplicates(), df4.drop_duplicates()])
pd.concat([df3, df4.drop_duplicates()])
pd.concat([df3.drop_duplicates(), df4])
although these result in the dataframe having float64s instead of int64s.
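To see the dtype change directly, a minimal sketch along these lines might help. It rebuilds df3 and df4 as above, prints the input dtypes, and then prints the dtypes of one of the working variants. The exact dtypes printed may differ between pandas versions, so no expected output is given here.

```python
from collections import OrderedDict

import pandas as pd

# Rebuild the two frames from the question.
df3 = pd.DataFrame(OrderedDict([('a', [1]), ('b', [2]), ('c', [3])]))
df4 = pd.DataFrame(OrderedDict([('a', []), ('b', []), ('c', [])]))

# Input dtypes, for comparison with the concatenated result.
print(df3.dtypes)
print(df4.dtypes)

# One of the working variants; the resulting dtypes may vary by pandas version.
result = pd.concat([df3.drop_duplicates(), df4.drop_duplicates()])
print(result.dtypes)
```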
So what's going on? It seems that Pandas stores df3 differently from df1 even though they appear identical, and that the .drop_duplicates() method converts df3 to a canonical form(?). Any thoughts?
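One way to probe this guess is to compare the public attributes of the four frames side by side. The sketch below is only a diagnostic aid, not an explanation: `describe` is a hypothetical helper introduced here for illustration, and it relies only on `.shape`, `.index`, `.dtypes`, and `DataFrame.equals`. What it prints for the empty frames may depend on the pandas version.

```python
from collections import OrderedDict

import pandas as pd

df1 = pd.DataFrame([[1, 2, 3]], columns=['a', 'b', 'c'])
df2 = pd.DataFrame(columns=['a', 'b', 'c'])
df3 = pd.DataFrame(OrderedDict([('a', [1]), ('b', [2]), ('c', [3])]))
df4 = pd.DataFrame(OrderedDict([('a', []), ('b', []), ('c', [])]))


def describe(name, df):
    """Hypothetical helper: print attributes likely to reveal a difference."""
    print(name, 'shape:', df.shape)
    print(name, 'index:', repr(df.index))
    print(name, 'dtypes:', df.dtypes.to_dict())


for name, df in [('df1', df1), ('df2', df2), ('df3', df3), ('df4', df4)]:
    describe(name, df)

# DataFrame.equals requires matching shape, values, and column dtypes.
print('df1 equals df3:', df1.equals(df3))
print('df2 equals df4:', df2.equals(df4))
```

If the two non-empty frames compare equal while the two empty ones differ in index type or dtypes, that would point at the empty frame (df4) rather than df3 as the source of the discrepancy.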