1

Let's say I have data that I imported as a .csv:

T_1        T_2    A          B          C      
0           1   Apple      Banana     Orange
1           2   Book        Pen       Pencil
2           3   Blue        Red       Green

And I want it to stacked into one column like this:

Apple
Banana
Orange
Book
Pen
Pencil
Blue
Red
Green

No headings or anything, I just want the data to all be in one column. How would I go about doing that?

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • check out `df.melt` – Ch3steR Dec 27 '20 at 04:29
  • Many ways to accomplish this... `df[['A','B','C']].melt().iloc[:,-1]` OR `pd.Series(df[['A','B','C']].values.flatten())` OR `pd.Series(df[['A','B','C']].to_numpy().flatten())` OR `pd.concat([df['A'],df['B'],df['C']])` OR `pd.concat([df['A'],df['B'],df['C']]).reset_index(drop=True)` – David Erickson Dec 27 '20 at 04:42

3 Answers3

1

Apart from the answers from David, You can use the stack().

>>> dta= pd.DataFrame({"T_1":[0,1,2], "T_2":[1,2,3],"A":["Apple","Book","Blue"], "B":["Banana","Pen","Red"], "C": ["Orange","Pens", "Green"]})

>>> dta.head()

T_1 T_2 A   B   C
0   0   1   Apple   Banana  Orange
1   1   2   Book    Pen Pens
2   2   3   Blue    Red Green

>>> dta[["A","B","C"]].stack()

0  A     Apple
   B    Banana
   C    Orange
1  A      Book
   B       Pen
   C      Pens
2  A      Blue
   B       Red
   C     Green

teddcp
  • 1,514
  • 2
  • 11
  • 25
0

You may use melt here along with a transpose operation:

df = pd.DataFrame({"T_1": [0, 1, 2], "T_2": [1, 2, 3], "A": ["Apple", "Book", "Blue"], "B": ["Banana", "Pen", "Red"], "C": ["Orange", "Pencil", "Green"]})
df = df[["A", "B", "C"]].T
df.columns = ["Fruit", "Item", "Color"]
df_out = pd.melt(df, id_vars=None, value_vars=["Fruit", "Item", "Color"], value_name='output')["output"]
print(df_out)

This prints:

0     Apple
1    Banana
2    Orange
3      Book
4       Pen
5    Pencil
6      Blue
7       Red
8     Green
Name: output, dtype: object
Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
0

Some different options. melt is the best syntax but could be 2x slower than other options on larger datasets (e.g. one million rows):

df = pd.DataFrame({'A' : pd.util.testing.rands_array(10, 1000000),
                   'B' : pd.util.testing.rands_array(10, 1000000),
                   'C' : pd.util.testing.rands_array(10, 1000000)})
%timeit pd.Series(df[['A','B','C']].to_numpy().flatten())
%timeit pd.Series(df[['A','B','C']].values.flatten())
%timeit df[['A','B','C']].melt().iloc[:,-1]
%timeit pd.concat([df['A'],df['B'],df['C']]).reset_index(drop=True)
%timeit df['A'].append(df['B']).append(df['C']).reset_index(drop=True)

Times (Edited):

144 ms ± 4.81 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
144 ms ± 4.37 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
274 ms ± 9.26 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
143 ms ± 3.61 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
180 ms ± 3.01 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
David Erickson
  • 16,433
  • 2
  • 19
  • 35