Interweave two dataframes

Question

Suppose I have two dataframes d1 and d2

d1 = pd.DataFrame(np.ones((3, 3), dtype=int), list('abc'), [0, 1, 2])
d2 = pd.DataFrame(np.zeros((3, 2), dtype=int), list('abc'), [3, 4])

What is an easy and generalized way to interweave two dataframes' columns. We can assume that the number of columns in d2 is always one less than the number of columns in d1. And, the indices are the same.

I want this:

pd.concat([d1[0], d2[3], d1[1], d2[4], d1[2]], axis=1)

   0  3  1  4  2
a  1  0  1  0  1
b  1  0  1  0  1
c  1  0  1  0  1

score 20 · Accepted Answer · answered Jul 26 '17 at 16:55

20

Using pd.concat to combine the DataFrames, and toolz.interleave reorder the columns:

from toolz import interleave

pd.concat([d1, d2], axis=1)[list(interleave([d1, d2]))]

The resulting output is as expected:

   0  3  1  4  2
a  1  0  1  0  1
b  1  0  1  0  1
c  1  0  1  0  1

answered Jul 26 '17 at 16:55

root

32,715
6
74
87

Brilliant as always... I already started using `cytoolz` on your recommendation (-: – piRSquared Jul 26 '17 at 17:04
1

This answer almost felt like cheating, since the `interleave` function is tailor-made for the operation in question. Yeah, I'm a big fan `toolz`/`cytoolz`. I remember being happy seeing you use `cytoolz` in one of your questions a little while back! – root Jul 26 '17 at 17:30
Hello, i'm using this solution but i'm getting duplicated columns. Any idea how can i handle? – heresthebuzz Jun 18 '20 at 22:08

Divakar · Answer 2 · 2017-07-26T17:08:43.987

Here's one NumPy approach -

def numpy_interweave(d1, d2):
    c1 = list(d1.columns)
    c2 = list(d2.columns)
    N = (len(c1)+len(c2))
    cols = [None]*N
    cols[::2] = c1
    cols[1::2] = c2

    out_dtype = np.result_type(d1.values.dtype, d2.values.dtype)
    out = np.empty((d1.shape[0],N),dtype=out_dtype)
    out[:,::2] = d1.values
    out[:,1::2] = d2.values

    df_out = pd.DataFrame(out, columns=cols, index=d1.index)
    return df_out

Sample run -

In [346]: d1
Out[346]: 
   x  y  z
a  6  7  4
b  3  5  6
c  4  6  2

In [347]: d2
Out[347]: 
   p  q
a  4  2
b  7  7
c  7  2

In [348]: numpy_interweave(d1, d2)
Out[348]: 
   x  p  y  q  z
a  6  4  7  2  4
b  3  7  5  7  6
c  4  7  6  2  2

Thanks! This method worked well in our project with Pandas 1.0.5 and Python 3.8. — pauljohn32, Jul 29 '20 at 14:08

cs95 · Answer 3 · 2017-07-26T16:56:59.100

6

Interweave the columns:

c = np.empty((d1.columns.size + d2.columns.size,), dtype=object)
c[0::2], c[1::2] = d1.columns, d2.columns

Now, do a join and re-order with boolean indexing:

d1.join(d2)[c]

   0  3  1  4  2
a  1  0  1  0  1
b  1  0  1  0  1
c  1  0  1  0  1

You may prefer pd.concat when dealing with multiple dataframes.

edited Jul 26 '17 at 16:56

answered Jul 26 '17 at 16:51

cs95

379,657
97
704
746

Haleemur Ali · Answer 4 · 2017-07-26T17:04:34.850

write a function to abstract away the generic merge-reorder

from itertools import zip_longest
def weave(df1, df2):
  col1 = df1.columns
  col2 = df2.columns
  weaved =  [col for zipped in zip_longest(col1,col2) 
                 for col in zipped
                 if col is not None]
  return pd.concat([df1, df2], axis=1)[weaved]

weave(d1, d2)
# Output:
   0  3  1  4  2
a  1  0  1  0  1
b  1  0  1  0  1
c  1  0  1  0  1

score 1 · Answer 5 · answered Jul 26 '17 at 17:04

we can use itertools.zip_longest:

In [75]: from itertools import zip_longest

In [76]: cols = pd.Series(np.concatenate(list(zip_longest(d1.columns, d2.columns)))).dropna()

In [77]: cols
Out[77]:
0    0
1    3
2    1
3    4
4    2
dtype: object

In [78]: df = pd.concat([d1, d2], axis=1)[cols]

In [79]: df
Out[79]:
   0  3  1  4  2
a  1  0  1  0  1
b  1  0  1  0  1
c  1  0  1  0  1

score 1 · Answer 6 · answered Jul 26 '17 at 17:18

My solution was to use pd.DataFrame.insert making sure to insert from the back first

df = d1.copy()
for i in range(d2.shape[1], 0, -1):
    df.insert(i, d2.columns[i - 1], d2.iloc[:, i - 1])

df

   0  3  1  4  2
a  1  0  1  0  1
b  1  0  1  0  1
c  1  0  1  0  1

pylang · Answer 7 · 2017-07-29T02:41:20.573

The roundrobin itertools recipe has an interleaving characteristic. This option offers the choice between directly implementing the recipe from the Python docs, or importing a third-party package such as more_itertools that implements the recipe for you:

from more_itertools import roundrobin

pd.concat([d1, d2], axis=1)[list(roundrobin(d1, d2))]

# Output
   0  3  1  4  2
a  1  0  1  0  1
b  1  0  1  0  1
c  1  0  1  0  1

Inspired by @root's answer, the column indices are interleaved and used to slice a concatenated DataFrame.

Interweave two dataframes

7 Answers7

Linked