19

Suppose I have two dataframes d1 and d2

d1 = pd.DataFrame(np.ones((3, 3), dtype=int), list('abc'), [0, 1, 2])
d2 = pd.DataFrame(np.zeros((3, 2), dtype=int), list('abc'), [3, 4])

d1

   0  1  2
a  1  1  1
b  1  1  1
c  1  1  1

d2

   3  4
a  0  0
b  0  0
c  0  0

What is an easy and generalized way to interweave two dataframes' columns. We can assume that the number of columns in d2 is always one less than the number of columns in d1. And, the indices are the same.

I want this:

pd.concat([d1[0], d2[3], d1[1], d2[4], d1[2]], axis=1)

   0  3  1  4  2
a  1  0  1  0  1
b  1  0  1  0  1
c  1  0  1  0  1
cs95
  • 379,657
  • 97
  • 704
  • 746
piRSquared
  • 285,575
  • 57
  • 475
  • 624

7 Answers7

20

Using pd.concat to combine the DataFrames, and toolz.interleave reorder the columns:

from toolz import interleave

pd.concat([d1, d2], axis=1)[list(interleave([d1, d2]))]

The resulting output is as expected:

   0  3  1  4  2
a  1  0  1  0  1
b  1  0  1  0  1
c  1  0  1  0  1
root
  • 32,715
  • 6
  • 74
  • 87
  • Brilliant as always... I already started using `cytoolz` on your recommendation (-: – piRSquared Jul 26 '17 at 17:04
  • 1
    This answer almost felt like cheating, since the `interleave` function is tailor-made for the operation in question. Yeah, I'm a big fan `toolz`/`cytoolz`. I remember being happy seeing you use `cytoolz` in one of your questions a little while back! – root Jul 26 '17 at 17:30
  • Hello, i'm using this solution but i'm getting duplicated columns. Any idea how can i handle? – heresthebuzz Jun 18 '20 at 22:08
7

Here's one NumPy approach -

def numpy_interweave(d1, d2):
    c1 = list(d1.columns)
    c2 = list(d2.columns)
    N = (len(c1)+len(c2))
    cols = [None]*N
    cols[::2] = c1
    cols[1::2] = c2

    out_dtype = np.result_type(d1.values.dtype, d2.values.dtype)
    out = np.empty((d1.shape[0],N),dtype=out_dtype)
    out[:,::2] = d1.values
    out[:,1::2] = d2.values

    df_out = pd.DataFrame(out, columns=cols, index=d1.index)
    return df_out

Sample run -

In [346]: d1
Out[346]: 
   x  y  z
a  6  7  4
b  3  5  6
c  4  6  2

In [347]: d2
Out[347]: 
   p  q
a  4  2
b  7  7
c  7  2

In [348]: numpy_interweave(d1, d2)
Out[348]: 
   x  p  y  q  z
a  6  4  7  2  4
b  3  7  5  7  6
c  4  7  6  2  2
Divakar
  • 218,885
  • 19
  • 262
  • 358
6

Interweave the columns:

c = np.empty((d1.columns.size + d2.columns.size,), dtype=object)
c[0::2], c[1::2] = d1.columns, d2.columns

Now, do a join and re-order with boolean indexing:

d1.join(d2)[c]

   0  3  1  4  2
a  1  0  1  0  1
b  1  0  1  0  1
c  1  0  1  0  1

You may prefer pd.concat when dealing with multiple dataframes.

cs95
  • 379,657
  • 97
  • 704
  • 746
2

write a function to abstract away the generic merge-reorder

from itertools import zip_longest
def weave(df1, df2):
  col1 = df1.columns
  col2 = df2.columns
  weaved =  [col for zipped in zip_longest(col1,col2) 
                 for col in zipped
                 if col is not None]
  return pd.concat([df1, df2], axis=1)[weaved]

weave(d1, d2)
# Output:
   0  3  1  4  2
a  1  0  1  0  1
b  1  0  1  0  1
c  1  0  1  0  1
Haleemur Ali
  • 26,718
  • 5
  • 61
  • 85
1

we can use itertools.zip_longest:

In [75]: from itertools import zip_longest

In [76]: cols = pd.Series(np.concatenate(list(zip_longest(d1.columns, d2.columns)))).dropna()

In [77]: cols
Out[77]:
0    0
1    3
2    1
3    4
4    2
dtype: object

In [78]: df = pd.concat([d1, d2], axis=1)[cols]

In [79]: df
Out[79]:
   0  3  1  4  2
a  1  0  1  0  1
b  1  0  1  0  1
c  1  0  1  0  1
MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419
1

My solution was to use pd.DataFrame.insert making sure to insert from the back first

df = d1.copy()
for i in range(d2.shape[1], 0, -1):
    df.insert(i, d2.columns[i - 1], d2.iloc[:, i - 1])

df

   0  3  1  4  2
a  1  0  1  0  1
b  1  0  1  0  1
c  1  0  1  0  1
piRSquared
  • 285,575
  • 57
  • 475
  • 624
1

The roundrobin itertools recipe has an interleaving characteristic. This option offers the choice between directly implementing the recipe from the Python docs, or importing a third-party package such as more_itertools that implements the recipe for you:

from more_itertools import roundrobin

pd.concat([d1, d2], axis=1)[list(roundrobin(d1, d2))]

# Output
   0  3  1  4  2
a  1  0  1  0  1
b  1  0  1  0  1
c  1  0  1  0  1

Inspired by @root's answer, the column indices are interleaved and used to slice a concatenated DataFrame.

pylang
  • 40,867
  • 14
  • 129
  • 121