I've been searching the web for an example of this, but haven't found anything.

Let df1, df2, ..., dfn be pandas dataframes, identically indexed.

What is happening when I run the command:

pandas.concat([df1,..,dfn],axis=1,join_axes=[df1.index])

It doesn't raise an error and returns a dataframe. I've pasted below everything I could find in the documentation about this argument. What happens when the indices don't match? And how does pandas know what to do with the indexes of the other dataframes? I thought I might have to pass the indexes of all the other n-1 dataframes.

Any tips?

join_axes : list of Index objects. Specific indexes to use for the other n - 1 axes instead of performing inner/outer set logic.

user3659451
  • I'm fairly certain pandas only has 4 joins: inner, outer, left, and right. The default for concat is 'outer'. If you pass a specific index, I'm pretty sure it does a left/right type of join. Meaning if a row is not present in the index you pass, it will be removed, and if the index contains a row that the data does not, a blank row will be inserted. I think. – DataSwede Dec 10 '14 at 23:36
  • Yes, but the axis is 1, so we are dealing with adding on columns. – user3659451 Dec 11 '14 at 00:35
  • You're right. Glanced over that. Replace all of my "rows" with "columns." Are you using ipython? These sorts of questions can often be answered by just playing with a sample set of data and seeing what happens. – DataSwede Dec 11 '14 at 16:19

3 Answers

7

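join_axes is deprecated (since pandas 0.25) and was removed in pandas 1.0. The supported way is now to reindex the result. With axis=1, the axis that is not being concatenated is the row index, so:

pandas.concat([df1,..,dfn],axis=1).reindex(df1.index)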
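For the axis=1 case in the question, the axis that is not being concatenated is the row index, so a minimal sketch of the replacement reindexes the concatenated result with df1.index. The two frames and their labels below are invented purely for illustration.

```python
import pandas as pd

# Two toy frames whose row labels only partly overlap (illustrative data).
df1 = pd.DataFrame({"a": [1, 2, 3]}, index=["x", "y", "z"])
df2 = pd.DataFrame({"b": [10, 20, 30]}, index=["y", "z", "w"])

# Outer-join on the row labels, then keep only df1's rows, in df1's order.
result = pd.concat([df1, df2], axis=1).reindex(df1.index)
print(result)
```

Row "w" exists only in df2 and is dropped by the reindex, while row "x" exists only in df1 and gets NaN in column b.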
3

Meaning of n

First of all, the n in n-1 refers to the number of dimensions in each of the dataframes, not the number of dataframes. You can see that from the source code at lines 938ff:

def _get_new_axes(self):
    ndim = self._get_result_dim()
    new_axes = [None] * ndim

    if self.join_axes is None:
        for i in range(ndim):
            if i == self.axis:
                continue
            new_axes[i] = self._get_comb_axis(i)
    else:
        if len(self.join_axes) != ndim - 1:
            raise AssertionError("length of join_axes must not be "
                                 "equal to {0}".format(ndim - 1))

(Therefore, it should really not read n-1 in the documentation. I guess this formulation is based on the common use example where the index passed with join_axes is that of one of the dataframes. The passed index could, though, also be a new, synthetic one.)
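To make the "n" concrete: a DataFrame has two axes, so n - 1 is 1, which is why a single Index was enough in the question. A tiny check on throwaway objects (names invented here):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2]})
s = pd.Series([1, 2])

# A DataFrame has two axes (rows and columns); a Series has one.
print(df.ndim, len(df.axes))
print(s.ndim)
```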

Use of join_axes

The actual use of join_axes is to replace the indexes of the dataframes that you want to concatenate with a different one (or actually one per remaining dimension).

In this process, each dataframe is reindexed to the new index, so values are matched to it by label. Labels in the new index that a dataframe lacks produce NaN, and rows of a dataframe whose labels are not in the new index are dropped, so a dataframe that is longer than the passed index is effectively truncated.
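A quick way to see how values are matched to a passed index is to try it on toy data. The sketch below uses reindex on each input as a stand-in for the removed join_axes argument; treating the two as equivalent is an assumption based on the deprecation advice, and the frame names and the synthetic target index are made up.

```python
import pandas as pd

# A synthetic target index that belongs to neither input frame.
target = pd.Index(["a", "b", "c"])

# One frame with an extra label ("d") in a different order, one with a single label.
long_df = pd.DataFrame({"v": [1, 2, 3, 4]}, index=["c", "b", "a", "d"])
short_df = pd.DataFrame({"w": [5.0]}, index=["b"])

# Reindexing matches by label: missing labels become NaN, extra labels are dropped.
aligned = pd.concat(
    [long_df.reindex(target), short_df.reindex(target)], axis=1
)
print(aligned)
```

Here column v comes out as 3, 2, 1 (looked up by label, not by position), and label "d" disappears because it is not in the target index.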

Merging time series into one dataframe

What you might be trying to achieve is to combine a bunch of Series into a DataFrame and preserve their original (partially) non-matching indices.

pandas.concat([df1,..,dfn], axis=1, join='outer')

does that: join='outer', which is also the default, keeps the union of all row labels and fills the gaps with NaN.

(However, when you want to plot the resulting dataframe, you might need to find a workaround, because all columns are interrupted by NaNs.)
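As a rough illustration of both points, the sketch below combines two Series with partly different indices and then recovers gap-free columns by dropping the NaNs per column, which is one possible workaround before plotting. The series names and values are invented, and the explicit sort_index call is there only to make the row order predictable.

```python
import pandas as pd

# Two series sampled at partly different positions (illustrative data).
s1 = pd.Series([1.0, 2.0, 3.0], index=[0, 2, 4], name="s1")
s2 = pd.Series([10.0, 20.0, 30.0], index=[1, 2, 3], name="s2")

# The outer join keeps every label from either series; missing cells are NaN.
combined = pd.concat([s1, s2], axis=1, join="outer").sort_index()
print(combined)

# One workaround for plotting: drop the NaNs column by column,
# so each series can be drawn as an uninterrupted line.
gapless = {name: col.dropna() for name, col in combined.items()}
print(gapless["s1"])
```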

j08lue
0

Make join_axes=[df1.columns]:

pandas.concat([df1,..,dfn],axis=1,join_axes=[df1.columns])
Andronicus