0

I'm new to Panda and have neen trying to do the following, using all sorts of commands and basically getting nowhere fast. I have a csv file that has 10 columns, the first 4 columns are descriptive columns and the other 6 hold numerical values. What I need to do is create a file that has the 4 descriptive columns, a new column with the title of the numerical value in it, plus the numerical values all pasted to a single column. I don't know if I've explained that well enough, but here's a visual explanation:

Existing file:

Year Type  Account Group  gross  nett discount
1999 A     ABC      1      100    80     2%
2000 B     XYZ      2      1000   800    5%

New file:

Year  Type  Account  Group Value_Type Amount
1999   A    ABC      1      gross     100
1999   A    ABC      1      nett       80
1999   A    ABC      1      discount   2%
2000   B    XYZ      2      gross     1000
2000   B    XYZ      2      nett       800
2000   B    XYZ      2      discount    5% 

What would be the best way to achieve this?

aaossa
  • 3,763
  • 2
  • 21
  • 34
RG0107
  • 111
  • 10

1 Answers1

0

You want to pandas.melt your dataframe:

>> df = pd.melt(df, id_vars=["Year", "Type", "Account", "Group"], value_vars=["gross", "nett", "discount"])
>> df
   Year Type Account  Group  variable value
0  1999    A     ABC      1     gross   100
1  2000    B     XYZ      2     gross  1000
2  1999    A     ABC      1      nett    80
3  2000    B     XYZ      2      nett   800
4  1999    A     ABC      1  discount    2%
5  2000    B     XYZ      2  discount    5%

This function takes the identifiers columns as is (id_vars) and creates a row for each value in the value_vars columns.

EDIT: To match your output, just sort the rows:

>> print(df.sort_values(["Year", "Type", "Account", "Group"]))
   Year Type Account  Group  variable value
0  1999    A     ABC      1     gross   100
2  1999    A     ABC      1      nett    80
4  1999    A     ABC      1  discount    2%
1  2000    B     XYZ      2     gross  1000
3  2000    B     XYZ      2      nett   800
5  2000    B     XYZ      2  discount    5%
aaossa
  • 3,763
  • 2
  • 21
  • 34
  • I tried that, but the output is exactly the same as the input, am I printing the wrong dataframe? – RG0107 Feb 21 '22 at 14:50
  • Notice that `pd.melt` returns a new dataframe. You need to redefine your variable: `df = pd.melt(...)`. I've updated my answer to make it explicit – aaossa Feb 21 '22 at 14:53
  • Sorry, didn't notice that, all looking good now, thanks – RG0107 Feb 21 '22 at 14:58