How to mix two pandas columns into one dataframe with first element from first column, second element from second column and so on?

Question

Imagine I have a pandas DataFrame:

Column1  Column2
A        D
B        E
C        F

How can I get a resulting DataFrame in this form?

Column

 A
 D
 B
 E
 C
 F

Have you tried df.values.flatten() and then reshaping it? It returns a NumPy array, but you can turn that back into a DataFrame if you want. Relevant answer here: https://stackoverflow.com/questions/25440008/python-pandas-flatten-a-dataframe-to-a-list – Khepu, Nov 17 '20 at 13:59

Rivers · Accepted Answer · 2020-11-17T15:52:02.750

EDIT: see the benchmark below for a slightly faster solution.

You can do this:

# Import pandas library 
import pandas as pd

# The data
data = [["A", "D"], ["B", "E"], ["C", "F"]]

# Create DataFrame
df = pd.DataFrame(data, columns = ["Column1", "Column2"]) 

# Flatten and convert to DataFrame
new_df = pd.DataFrame(df.to_numpy().flatten())

print(new_df)

Output:

   0
0  A
1  D
2  B
3  E
4  C
5  F

new_df will be a pandas.DataFrame.

Note the use of df.to_numpy(), which the pandas documentation recommends over df.values.
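The question asks for a single column named "Column", whereas the snippet above leaves the default column label 0. A minimal sketch of one way to name the column directly, assuming both source columns share a dtype (with mixed dtypes, to_numpy() would upcast everything to a common dtype):

```python
import pandas as pd

# Same sample data as in the answer above
df = pd.DataFrame(
    [["A", "D"], ["B", "E"], ["C", "F"]],
    columns=["Column1", "Column2"],
)

# flatten() reads the 2-D array row by row, which interleaves the two columns;
# passing a dict gives the resulting column the name "Column"
new_df = pd.DataFrame({"Column": df.to_numpy().flatten()})

print(new_df)
```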

And as suggested by @Michael Szczesny you can do:

new_series = df.stack().reset_index(drop=True)

Which will return a pandas.Series.
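If a DataFrame is needed rather than a Series, the stacked result can be converted with Series.to_frame. This is only a sketch on the sample data; the column name "Column" is taken from the question.

```python
import pandas as pd

df = pd.DataFrame(
    [["A", "D"], ["B", "E"], ["C", "F"]],
    columns=["Column1", "Column2"],
)

# stack() walks the frame row by row, so the two columns end up interleaved
new_series = df.stack().reset_index(drop=True)

# Turn the Series into a one-column DataFrame named "Column"
stacked_df = new_series.to_frame(name="Column")

print(stacked_df)
```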

Added benchmark:

Based on @Mayank Porwal's answer, I added these benchmark results. I used timeit.repeat with repeat = 7 and number = 10000. Sorted from fastest to slowest:

new_df = pd.DataFrame(df.to_numpy().ravel('A')) # 51.0 µs
new_df = pd.DataFrame(df.to_numpy().ravel('K')) # 51.0 µs
new_df = pd.DataFrame(df.to_numpy().ravel('F')) # 51.1 µs
new_df = pd.DataFrame(df.to_numpy().flatten())  # 52.6 µs
new_df = pd.DataFrame(df.to_numpy().ravel('C')) # 53.4 µs
new_series = df.stack().reset_index(drop=True)  # 322.0 µs
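For anyone who wants to reproduce timings like these, here is a rough sketch of a timeit.repeat harness. It is not the exact script used for the numbers above: the repeat and number values are reduced so it finishes quickly, the dictionary of candidates is my own arrangement, and absolute timings will differ by machine and library version.

```python
import timeit

import pandas as pd

df = pd.DataFrame(
    [["A", "D"], ["B", "E"], ["C", "F"]],
    columns=["Column1", "Column2"],
)

# Candidate expressions, wrapped in lambdas so timeit can call them
candidates = {
    "ravel('F')": lambda: pd.DataFrame(df.to_numpy().ravel("F")),
    "ravel('C')": lambda: pd.DataFrame(df.to_numpy().ravel("C")),
    "flatten()": lambda: pd.DataFrame(df.to_numpy().flatten()),
    "stack()": lambda: df.stack().reset_index(drop=True),
}

number = 200  # calls per measurement (the answer used 10000)
results = {}
for name, func in candidates.items():
    runs = timeit.repeat(func, repeat=3, number=number)
    # Best run, converted to microseconds per call
    results[name] = min(runs) / number * 1e6

for name, micros in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name:<12} {micros:8.1f} µs")
```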

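numpy.ravel is slightly faster mainly because it returns a view whenever it can, whereas numpy.ndarray.flatten() always returns a copy. For details about numpy.ravel see: https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.ravel.html

In short, "C" reads the elements row by row and "F" reads them column by column. "A" reads them in Fortran-like index order if the array is Fortran contiguous in memory and in C-like order otherwise, and "K" reads them in the order they occur in memory. Be aware that the order argument also changes the order of the result: flatten() and ravel('C') give A, D, B, E, C, F as asked in the question, while ravel('F') gives A, B, C, D, E, F.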
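A small check of the ordering and the copy behaviour on the sample data may help here. This is only an illustrative sketch; the variable names are mine, and the results for "A" and "K" are left out because they depend on how pandas happens to lay the array out in memory.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["A", "D"], ["B", "E"], ["C", "F"]],
    columns=["Column1", "Column2"],
)
arr = df.to_numpy()

# Row by row: interleaves the two columns, as the question asks
row_major = arr.ravel("C").tolist()

# Column by column: all of Column1 first, then all of Column2
col_major = arr.ravel("F").tolist()

# flatten() always copies, so its result never shares memory with arr
flat = arr.flatten()
flat_is_copy = not np.shares_memory(arr, flat)

print(row_major)
print(col_major)
print(flat_is_copy)
```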

Mayank Porwal · Answer 2 · 2020-11-18T07:20:09.340

Use df.to_numpy with numpy.ravel:

In [2349]: x = pd.DataFrame(df.to_numpy().ravel('F'))

In [2350]: x
Out[2350]: 
     0
0    A
1    B
2    C
3    D
4    E
5    F

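Note: This is fast because it works directly on the underlying NumPy array, but ravel('F') reads the values column by column (A, B, C, D, E, F), as the output above shows.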

Timing comparisons:

In [2369]: dd = pd.concat([df] * 1000)

# Rivers' answers:

In [2369]: %timeit pd.DataFrame(dd.to_numpy().flatten())
95.6 µs ± 1.55 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [2371]: %timeit dd.stack().reset_index(drop=True)
919 µs ± 9.95 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# My answer:

In [2372]: %timeit pd.DataFrame(dd.to_numpy().ravel('F'))
62 µs ± 577 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

edited Nov 18 '20 at 07:20

answered Nov 17 '20 at 14:25


@Augustas please check my answer. It has the best performance. – Mayank Porwal Nov 17 '20 at 14:49

I didn't think speed could be important for this task. Good idea, thanks, I'll edit my answer. – Rivers Nov 17 '20 at 15:38
