
I have a CSV file loaded into Python; it has 4 columns and 50 rows. Column 1 = A, column 2 = B, column 3 = C and column 4 = D. I want to use this file for classification, and for that I need only 2 columns: one for the labels and the other for the data.

What I want to know is how to convert all the data so that I end up with something like this:

A, "some text"
A, "...."
B, "...."
C, "....."
A, "....."

and so on, instead of

A      B      C      D
"..."  "..."  "..."  "..."
"..."  "..."  "..."  "..."

Otherwise I have to do it manually, adding the label before every piece of text.
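One way to get this two-column shape in pandas is `DataFrame.melt`, which turns each column header into a label and each cell into a value. A minimal sketch, assuming pandas and a hypothetical two-row stand-in for the real 50-row file; `label` and `text` are illustrative column names, not taken from the question:

```python
import pandas as pd

# Hypothetical stand-in for the CSV with columns A, B, C, D
df = pd.DataFrame({
    "A": ["a1", "a2"],
    "B": ["b1", "b2"],
    "C": ["c1", "c2"],
    "D": ["d1", "d2"],
})

# With no id_vars, melt uses every column: the header goes into "label"
# and the cell contents into "text" (renaming the default
# "variable"/"value" columns).
long_df = df.melt(var_name="label", value_name="text")
print(long_df)
```

Note that `melt` walks the frame column by column, so every `A` row comes before every `B` row. If the rows should stay interleaved as they appear in the file (A, B, C, D for row 1, then row 2, ...), a row-wise reshape such as `stack` is needed instead.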

John Sall
  • Use `df = df.melt()` – jezrael May 07 '19 at 11:22
  • @jezrael it won't give me the result I want. – John Sall May 07 '19 at 11:42
  • Ok, so try `df = df.stack().reset_index()` – jezrael May 07 '19 at 11:42
  • @jezrael that's great, thanks. But it shows me "level_0", "level_1" and "0" as column names. I only need the "level_1" and "0" columns. Is it possible to do that without dropping the "level_0" column, or does that extra column always come up? By the way, I couldn't find this answer in that link. I hope you post this as a new answer; it's really helpful. – John Sall May 07 '19 at 13:59
  • Unfortunately, a new level is always created in the MultiIndex; the solution is `df = df.stack().reset_index(level=0, drop=True).rename_axis('a').reset_index(name='b')` – jezrael May 07 '19 at 14:14
  • @jezrael thanks, worked! – John Sall May 07 '19 at 17:06
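The final one-liner from the comments can be unpacked step by step. A sketch, assuming pandas and a hypothetical two-row, four-column frame in place of the real file (which would normally come from `pd.read_csv`): `stack()` produces a Series indexed by (row number, column label), and `reset_index(level=0, drop=True)` discards the row-number level that would otherwise show up as `level_0`.

```python
import pandas as pd

# Hypothetical stand-in for the 4-column CSV
df = pd.DataFrame({
    "A": ["a1", "a2"],
    "B": ["b1", "b2"],
    "C": ["c1", "c2"],
    "D": ["d1", "d2"],
})

# stack() moves the column labels into an inner index level, giving a
# Series with a two-level MultiIndex: (row number, column label).
stacked = df.stack()

result = (
    stacked.reset_index(level=0, drop=True)  # drop the row-number level ("level_0")
    .rename_axis("a")                        # name the remaining label level "a"
    .reset_index(name="b")                   # move labels into column "a", values into "b"
)
print(result)
```

Unlike `melt`, this keeps the file's row-by-row order (A, B, C, D, A, B, ...). On recent pandas versions `stack()` may emit a FutureWarning about its implementation; for data without missing values like this, the resulting frame is the same.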

0 Answers