How come pd.concat(axis=1) is row-binding instead of column binding?

Question

I am using pandas and I have two dataframes which have the same number of rows:

All I want to do is to merge them together by ticker, which I tried with

a_no_nan2_last.join(ref, on = "ticker")

which gave me this error

ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat

So I checked the types of the column ticker in both and I checked that ticker is the only column in overlaps

set(a_no_nan2_last.columns).intersection(set(res.columns))

which gives

{'ticker'}

and

type(a_no_nan2_last["ticker"].iloc[0])
<class 'str'>
type(res["ticker"].iloc[0])
<class 'str'>

the data looks like

a_no_nan2_last
       ticker       Date   Close  Close_lead1
2141      1AD 2018-08-14  0.2845        0.160
5354      1AG 2018-08-14  0.0450        0.036
75158     1AL 2018-08-14  0.9287        0.800
188       1ST 2018-08-14  0.0370        0.079
81195     3DP 2018-08-14  0.0320        0.041
       ...        ...     ...          ...
111688    ZMI 2018-08-14  0.1200        0.095
111310    ZNC 2018-08-14  0.1650        0.078
111877    ZNO 2018-08-13  0.1150        0.079
111373    ZTA 2018-08-10  0.0700        0.070
111940    ZYB 2018-08-09  0.0070        0.014
[1792 rows x 4 columns]

and

res
variable ticker  ...  Close__variance_larger_than_standard_deviation
0           1AD  ...                                             0.0
1           1AG  ...                                             0.0
2           1AL  ...                                             0.0
3           1ST  ...                                             0.0
4           3DP  ...                                             0.0
         ...  ...                                             ...
1787        ZMI  ...                                             0.0
1788        ZNC  ...                                             0.0
1789        ZNO  ...                                             0.0
1790        ZTA  ...                                             0.0
1791        ZYB  ...                                             0.0
[1792 rows x 795 columns]

so I tried concat, which appends by row not by column giving me the wrong result

pd.concat([a_no_nan2_last, res], axis=1)
       ticker  ... Close__variance_larger_than_standard_deviation
0         NaN  ...                                            0.0
1         NaN  ...                                            0.0
2         NaN  ...                                            0.0
3         NaN  ...                                            0.0
4         NaN  ...                                            0.0
       ...  ...                                            ...
111688    ZMI  ...                                            NaN
111751    Z1P  ...                                            NaN
111814    ZIP  ...                                            NaN
111877    ZNO  ...                                            NaN
111940    ZYB  ...                                            NaN
[3556 rows x 799 columns]

I expected my output to be 1792 rows long.

Dropping the common column ticker also doesn't help, see

res3 = res.drop("ticker", axis=1)
pd.concat([a_no_nan2_last, res3], axis=1)

       ticker  ... Close__variance_larger_than_standard_deviation
0         NaN  ...                                            0.0
1         NaN  ...                                            0.0
2         NaN  ...                                            0.0
3         NaN  ...                                            0.0
4         NaN  ...                                            0.0
       ...  ...                                            ...
111688    ZMI  ...                                            NaN
111751    Z1P  ...                                            NaN
111814    ZIP  ...                                            NaN
111877    ZNO  ...                                            NaN
111940    ZYB  ...                                            NaN
[3556 rows x 798 columns]

https://stackoverflow.com/questions/22149584/what-does-axis-in-pandas-mean , https://stackoverflow.com/questions/25773245/ambiguity-in-pandas-dataframe-numpy-array-axis-definition — BENY, Aug 13 '19 at 01:37

score 0 · Accepted Answer · answered Aug 13 '19 at 02:43

0

You need to reset_index(drop=True) e.g.

res1 = pd.concat([a_no_nan2_last.reset_index(drop=True), res.reset_index(drop=True)], axis=1)

answered Aug 13 '19 at 02:43

xiaodai

14,889
18
76
140

How come pd.concat(axis=1) is row-binding instead of column binding?

1 Answers1