
I have this dataframe:

import pandas as pd

df = pd.DataFrame({"X": [1, 2, 3, 4],
                   "Y": [5, 6, 7, 8],
                   "Z": [9, 10, 11, 12]})


And I'm looking for this output:

[Image: a single column containing the values 1 through 12 (X, then Y, then Z)]

The solved questions I have found so far go in the opposite direction: from a series to a dataframe. The closest I have found is this one, which isn't really similar. I have also tried pivot_table() and reshape(), but they require an index column, whereas I just want a single column.

Any suggestions?

PS: Assume the dataframe has 100 columns, so selecting them one by one is not practical, but they can be referenced in the order they appear (e.g. with 100 columns, something like X1:X100).

Chris

4 Answers


Use flattening with ravel('F') -

In [14]: pd.Series(df.to_numpy(copy=False).ravel('F'))
Out[14]: 
0      1
1      2
2      3
3      4
4      5
5      6
6      7
7      8
8      9
9     10
10    11
11    12
dtype: int64

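When all columns share one dtype, as here, this series can be a view into the input dataframe's data rather than a copy, which means virtually free runtime and zero memory overhead; with mixed dtypes, to_numpy has to build a new array. Let's verify -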

In [20]: s = pd.Series(df.to_numpy(copy=False).ravel('F'))

In [21]: np.shares_memory(s,df)
Out[21]: True

Let's confirm the timings too -

In [2]: df = pd.DataFrame(np.random.rand(100000,3), columns=['X','Y','Z'])

In [3]: %timeit pd.Series(df.to_numpy(copy=False).ravel('F'))
579 µs ± 9.09 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
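
To make the role of the 'F' order concrete, and to cover the case of taking only a consecutive range of columns, here is a minimal sketch on the question's small dataframe. The `"X":"Y"` slice is only an illustration of the label-based range selection; the names `subset`, `col_major` and `row_major` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({"X": [1, 2, 3, 4],
                   "Y": [5, 6, 7, 8],
                   "Z": [9, 10, 11, 12]})

# Label-based slices include both endpoints, so this keeps X and Y only.
subset = df.loc[:, "X":"Y"]

# order="F" walks the 2-D array column by column;
# order="C" (the default) would walk it row by row instead.
col_major = pd.Series(subset.to_numpy().ravel("F"))
row_major = pd.Series(subset.to_numpy().ravel("C"))

print(col_major.tolist())  # [1, 2, 3, 4, 5, 6, 7, 8]
print(row_major.tolist())  # [1, 5, 2, 6, 3, 7, 4, 8]
```

With 100 ordered columns the same pattern would presumably be `df.loc[:, "X1":"X100"]`, assuming those are the actual column labels.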
Divakar

This is melt:

df.melt()[['value']]

Output:

    value
0       1
1       2
2       3
3       4
4       5
5       6
6       7
7       8
8       9
9      10
10     11
11     12
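
Note that the double brackets above return a one-column dataframe. If a Series is wanted instead, single brackets should do it. A small sketch, where `melted`, `as_frame` and `as_series` are names chosen just for this example:

```python
import pandas as pd

df = pd.DataFrame({"X": [1, 2, 3, 4],
                   "Y": [5, 6, 7, 8],
                   "Z": [9, 10, 11, 12]})

# melt() with no arguments produces two columns: 'variable' (the original
# column name) and 'value' (the cell contents, column by column).
melted = df.melt()

as_frame = melted[["value"]]   # one-column DataFrame
as_series = melted["value"]    # plain Series

print(type(as_frame).__name__)   # DataFrame
print(type(as_series).__name__)  # Series
```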
Quang Hoang

One way is to reshape the data from the "wide" to the "tall" format by stacking:

df.T.stack().reset_index(drop=True)
#0      1
#1      2
#2      3
#3      4
#4      5
#5      6
#6      7
#7      8
#8      9
#9     10
#10    11
#11    12
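
A possible variation, not part of the answer above: calling unstack() directly on the dataframe should give the same column-by-column order without the transpose. A minimal sketch on the question's data, with `by_column` and `flat` as illustrative names:

```python
import pandas as pd

df = pd.DataFrame({"X": [1, 2, 3, 4],
                   "Y": [5, 6, 7, 8],
                   "Z": [9, 10, 11, 12]})

# On a frame with a plain (non-MultiIndex) index, unstack() returns a Series
# indexed by (column label, row label), ordered column by column.
by_column = df.unstack()

# Drop the (column, row) index to get a plain 0..11 index.
flat = by_column.reset_index(drop=True)

print(flat.tolist())  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```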
DYZ

As always, there are many ways to "skin a cat" in Pandas, and performance may then become the deciding criterion. This is a meta-answer that compares the performance:

  1. ravel by Divakar: 80 µs
  2. stack by DYZ: 640 µs
  3. melt by Quang Hoang: 2.03 ms
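
The data size and machine behind these numbers are not stated, so here is a rough sketch of how such a comparison could be reproduced. The 100,000 x 3 random frame is an assumption borrowed from the ravel answer, and `candidates`, `reference` and `best` are names invented for this example. Absolute timings will differ from the figures above.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed test data: 100,000 rows x 3 float columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((100_000, 3)), columns=["X", "Y", "Z"])

# The three approaches from the answers, each returning a Series.
candidates = {
    "ravel": lambda: pd.Series(df.to_numpy(copy=False).ravel("F")),
    "stack": lambda: df.T.stack().reset_index(drop=True),
    "melt": lambda: df.melt()["value"],
}

reference = candidates["ravel"]()
for name, func in candidates.items():
    # Check that every approach yields the same values in the same order.
    assert np.array_equal(func().to_numpy(), reference.to_numpy())
    # Best of 3 repeats, 10 calls each, reported per call.
    best = min(timeit.repeat(func, number=10, repeat=3)) / 10
    print(f"{name:>5}: {best * 1e6:.0f} µs per call")
```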
DYZ