1

Here is the dataset I have:

Employee Jul Aug Sep Oct Nov Dec Jan Feb Mar Apr May Jun Year Index Program Status Bid Category Account
E_123456789 #missing #missing #missing #missing 1 1 1 1 1 1 1 1 FY22 I_1234 NA Entity Final Janitor 1 Overhead
E_123456799 #missing 0 0 0 0 0 0 .5 .9 .1 0 0 FY22 I_1255 P12345123 Extended Cook 3 Utilization

This is what I'd like to have:

Employee Month Year Index Program Status Bid Category Account Percentage
E_123456789 Jul FY22 I_1234 NA Entity Final Janitor 1 Overhead #missing
E_123456789 Aug FY22 I_1234 NA Entity Final Janitor 1 Overhead #missing
E_123456789 Sep FY22 I_1234 NA Entity Final Janitor 1 Overhead #missing
E_123456789 Oct FY22 I_1234 NA Entity Final Janitor 1 Overhead #missing
E_123456789 Nov FY22 I_1234 NA Entity Final Janitor 1 Overhead 1
E_123456789 Dec FY22 I_1234 NA Entity Final Janitor 1 Overhead 1
E_123456789 Jan FY22 I_1234 NA Entity Final Janitor 1 Overhead 1
E_123456789 Feb FY22 I_1234 NA Entity Final Janitor 1 Overhead 1
E_123456789 Mar FY22 I_1234 NA Entity Final Janitor 1 Overhead 1
E_123456789 Apr FY22 I_1234 NA Entity Final Janitor 1 Overhead 1
E_123456789 May FY22 I_1234 NA Entity Final Janitor 1 Overhead 1
E_123456789 Jun FY22 I_1234 NA Entity Final Janitor 1 Overhead 1
E_123456799 Jul FY22 I_1255 P12345123 Extended Cook 3 Utilization #missing
E_123456799 Aug FY22 I_1255 P12345123 Extended Cook 3 Utilization 0
E_123456799 Sep FY22 I_1255 P12345123 Extended Cook 3 Utilization 0
E_123456799 Oct FY22 I_1255 P12345123 Extended Cook 3 Utilization 0
E_123456799 Nov FY22 I_1255 P12345123 Extended Cook 3 Utilization 0
E_123456799 Dec FY22 I_1255 P12345123 Extended Cook 3 Utilization 0
E_123456799 Jan FY22 I_1255 P12345123 Extended Cook 3 Utilization 0
E_123456799 Feb FY22 I_1255 P12345123 Extended Cook 3 Utilization .5
E_123456799 Mar FY22 I_1255 P12345123 Extended Cook 3 Utilization .9
E_123456799 Apr FY22 I_1255 P12345123 Extended Cook 3 Utilization .1
E_123456799 May FY22 I_1255 P12345123 Extended Cook 3 Utilization 0
E_123456799 Jun FY22 I_1255 P12345123 Extended Cook 3 Utilization 0

I found steps that seemed to do the reverse of this (convert the "Month" column into Jul/Aug/Sep/etc) but I can't find anything related to this. I am very new to Python (read: a couple weeks of use) and am sure there is a relatively easy way to do this that I just don't know.

Here is my file:

import pandas as pd

data = pd.read_csv('SMART_Export_Test.csv')

Using pivot does the opposite of what I want (it spreads the values in the Jul column into new column names):

data.pivot(columns = 'Jul', values='Jul')

Any help/direction would be greatly appreciated!

not_speshal
  • 22,093
  • 2
  • 15
  • 30

1 Answers1

1

Try with melt:

>>> df.melt(["Employee", "Year", "Index", "Program", "Status", "Bid Category", "Account"], 
            ["Jan", "Feb", "Mar", "Apr", "May", "Jun", 
             "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
            "Percentage")

       Employee  Year   Index  ...      Account Percentage     value
0   E_123456789  FY22  I_1234  ...     Overhead        Jan         1
1   E_123456799  FY22  I_1255  ...  Utilization        Jan         0
2   E_123456789  FY22  I_1234  ...     Overhead        Feb       1.0
3   E_123456799  FY22  I_1255  ...  Utilization        Feb       0.5
4   E_123456789  FY22  I_1234  ...     Overhead        Mar       1.0
5   E_123456799  FY22  I_1255  ...  Utilization        Mar       0.9
6   E_123456789  FY22  I_1234  ...     Overhead        Apr       1.0
7   E_123456799  FY22  I_1255  ...  Utilization        Apr       0.1
8   E_123456789  FY22  I_1234  ...     Overhead        May         1
9   E_123456799  FY22  I_1255  ...  Utilization        May         0
10  E_123456789  FY22  I_1234  ...     Overhead        Jun         1
11  E_123456799  FY22  I_1255  ...  Utilization        Jun         0
12  E_123456789  FY22  I_1234  ...     Overhead        Jul  #missing
13  E_123456799  FY22  I_1255  ...  Utilization        Jul  #missing
14  E_123456789  FY22  I_1234  ...     Overhead        Aug  #missing
15  E_123456799  FY22  I_1255  ...  Utilization        Aug         0
16  E_123456789  FY22  I_1234  ...     Overhead        Sep  #missing
17  E_123456799  FY22  I_1255  ...  Utilization        Sep         0
18  E_123456789  FY22  I_1234  ...     Overhead        Oct  #missing
19  E_123456799  FY22  I_1255  ...  Utilization        Oct         0
20  E_123456789  FY22  I_1234  ...     Overhead        Nov         1
21  E_123456799  FY22  I_1255  ...  Utilization        Nov         0
22  E_123456789  FY22  I_1234  ...     Overhead        Dec         1
23  E_123456799  FY22  I_1255  ...  Utilization        Dec         0

[24 rows x 9 columns]
not_speshal
  • 22,093
  • 2
  • 15
  • 30
  • That was it! And even simpler than I anticipated. Thank you! And also thank you for editing the question so the tables showed up correctly. – Wesley Geddes Dec 03 '21 at 17:57
  • (This thread should probably be closed as a duplicate) – BigBen Dec 03 '21 at 18:55
  • Why not just a close vote instead of a comment then? – not_speshal Dec 03 '21 at 18:58
  • (Perhaps a prompt to consider searching for a duplicate first before writing an answer, optionally leaving a comment to OP when voting to close) – BigBen Dec 03 '21 at 19:20
  • I spent many hours looking for the solution to this problem. It is difficult to search for a 'duplicate' question when I don't know exactly what to search for (hence my asking of the question). – Wesley Geddes Dec 04 '21 at 20:07