
The following works:

import pandas as pd

df1 = pd.DataFrame([[1, 2, 3]], columns=['a', 'b', 'c'])
df2 = pd.DataFrame(columns=['a', 'b', 'c'])
pd.concat([df1, df2])

as it should. However, the following ought to be equivalent, and yet it raises an error:

from collections import OrderedDict

od3 = OrderedDict([('a', [1]), ('b', [2]), ('c', [3])])
od4 = OrderedDict([('a', []), ('b', []), ('c', [])])
df3 = pd.DataFrame(od3)
df4 = pd.DataFrame(od4)
pd.concat([df3, df4])

This block of code produces

ValueError: Shape of passed values is (3, 1), indices imply (3, 0)

Oddly, these all do work:

pd.concat([df3.drop_duplicates(), df4.drop_duplicates()])
pd.concat([df3, df4.drop_duplicates()])
pd.concat([df3.drop_duplicates(), df4])

although the resulting DataFrame has float64 columns instead of int64.

So what's going on? It seems that pandas stores df3 differently from df1 even though they appear identical, and that the .drop_duplicates() method converts df3 to some canonical form. Is that right? Any thoughts?
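One way to check the "stored differently" hunch is to compare the index type and column dtypes of the four frames side by side. The following is only a diagnostic sketch: `describe` is a hypothetical helper (not a pandas function), and the printed dtypes and index types depend on the pandas version, so no particular output is claimed.

```python
from collections import OrderedDict

import pandas as pd

# Rebuild the four frames from the question.
df1 = pd.DataFrame([[1, 2, 3]], columns=['a', 'b', 'c'])
df2 = pd.DataFrame(columns=['a', 'b', 'c'])
df3 = pd.DataFrame(OrderedDict([('a', [1]), ('b', [2]), ('c', [3])]))
df4 = pd.DataFrame(OrderedDict([('a', []), ('b', []), ('c', [])]))


def describe(df):
    """Hypothetical helper: summarize shape, index type, and per-column dtypes."""
    return {
        'shape': df.shape,
        'index_type': type(df.index).__name__,
        'dtypes': {col: str(dtype) for col, dtype in df.dtypes.items()},
    }


frames = [('df1', df1), ('df2', df2), ('df3', df3), ('df4', df4)]
summaries = {name: describe(df) for name, df in frames}

# Print one summary per frame; differences between df2 and df4 are the
# interesting part, since those are the two "empty" frames.
for name, _ in frames:
    print(name, summaries[name])

# Also check whether the two non-empty frames compare equal.
print('df1.equals(df3):', df1.equals(df3))
```

If df2 and df4 report different column dtypes or index types, that would at least narrow down where the two code paths diverge.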

dslack
  • `df2 = pd.DataFrame([[]], columns=['a', 'b', 'c'])` gives an error for me: `AssertionError: 3 columns passed, passed data had 0 columns`. Maybe specify which versions you are using... – Julien Nov 09 '17 at 05:58
  • I'm using pandas 0.20.3 on Python 3, and can initialize `df2` if I use an empty list for the data. I'm guessing the variables defined as `od3` and `od4` are meant to be `od1` and `od2` respectively, but after making this change, the code runs fine. – Ken Wei Nov 09 '17 at 06:15
  • What are `od1` and `od2`? They are not shown. – Mike Müller Nov 09 '17 at 06:19
  • cannot reproduce – juanpa.arrivillaga Nov 09 '17 at 06:20
  • "float64s instead of int64s." because the result has NaN. https://stackoverflow.com/a/21290084/1240268 – Andy Hayden Nov 09 '17 at 06:21
  • Ugh, sorry all. Had a few typos, which have now been fixed. Specifically, I referred to `od1` and `od2` when I should have referred to `od3` and `od4`, and I had `pd.DataFrame([[]], columns=['a', 'b', 'c'])` when I should have had `pd.DataFrame(columns=['a', 'b', 'c'])`. The code above now runs correctly. – dslack Nov 09 '17 at 20:00
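
The NaN explanation in Andy Hayden's comment refers to a general pandas rule: a NumPy-backed integer column cannot hold NaN, so it is upcast to float64 as soon as a missing value appears. Here is a minimal sketch of that rule in isolation, using `reindex` to introduce a missing value. Whether this is what actually happens in the `concat` calls from the question is not confirmed in this thread.

```python
import pandas as pd

# A plain integer Series.
ints = pd.Series([1, 2, 3])

# Reindexing onto a label that does not exist (3) introduces a NaN,
# which forces the integer data to be upcast to float64.
with_gap = ints.reindex([0, 1, 2, 3])

print(ints.dtype)
print(with_gap.dtype)
print(with_gap.isnull().sum())
```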
