I have a large dataframe (50+ columns) with a "Project_Type" column that takes one of 5 values: "Project Type 1", "Project Type 2", "Project Type 3", "Project Type 4", or "Project Type 5". The other columns hold various performance measures (all integers). I believe I need to normalize each "Project_Type" value into its own new column that is either 1 (if the row is that type) or 0 (if it is not), so that I can run .corr() over the project types and performance measures to see if there are any correlations (such as certain project types costing more, making more of an impact, etc.).
I can create 5 new blank columns manually by doing:
df['Proj1Normalize'] = ""
df['Proj2Normalize'] = ""
etc...
and then fill in 1 or 0 based on true or false, but is there a quicker way to add a large list of blank columns with specific titles all at once? This example is easy to do manually, but I have run into cases where I need to make 20+ new "normalized" columns at once, and creating them all by hand is too time-consuming.
It would also help if someone could explain an efficient way to normalize one column with multiple different values at once.
I tried df['Proj1Normalize', 'Proj2Normalize', 'Proj3Normalize', etc.] = ""
but that didn't work.
I tried referring to this - Add multiple empty columns to pandas DataFrame - but I don't want my columns to just have one-character names, as in the first example there.
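To make the first part of the question concrete, here is a minimal sketch of adding several named blank columns in one step. It assumes pandas is available and uses a small stand-in dataframe; the Proj1Normalize ... Proj5Normalize names follow the pattern above, and DataFrame.assign with a generated dict is only one possible approach, not necessarily the best one.

```python
import pandas as pd

# Small stand-in for the real 50+ column dataframe (illustrative data only).
df = pd.DataFrame({"ProjectType": ["Proj 1", "Proj 2", "Proj 1", "Proj 3"]})

# Build the new column names programmatically instead of typing each one.
new_cols = [f"Proj{i}Normalize" for i in range(1, 6)]

# assign() accepts keyword arguments, so a dict of name -> value adds
# every column at once; each new column is filled with an empty string.
df = df.assign(**{name: "" for name in new_cols})

print(df.columns.tolist())
```

The same list of names could come from anywhere (for example, the unique values of the project type column), so this should scale to 20+ columns without extra typing.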
Example:
Right now I have:
  ProjectType  Dollars_Spent  Employees
0      Proj 1           1000         10
1      Proj 2           1800         12
2      Proj 1            800         14
3      Proj 3            980          5
and I want to have:
  ProjectType  Dollars_Spent  Employees  Proj1  Proj2  Proj3
0      Proj 1           1000         10      1      0      0
1      Proj 2           1800         12      0      1      0
2      Proj 1            800         14      1      0      0
3      Proj 3            980          5      0      0      1
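For reference, the desired output above can apparently be produced without creating blank columns first. The sketch below assumes pandas' get_dummies is a suitable tool here and uses the four example rows; stripping the space from the generated column names (to get Proj1, Proj2, Proj3) is my own choice to match the table above.

```python
import pandas as pd

# The example data from above.
df = pd.DataFrame({
    "ProjectType": ["Proj 1", "Proj 2", "Proj 1", "Proj 3"],
    "Dollars_Spent": [1000, 1800, 800, 980],
    "Employees": [10, 12, 14, 5],
})

# One indicator column per distinct project type; astype(int) gives
# 1/0 regardless of whether this pandas version returns bool or uint8.
dummies = pd.get_dummies(df["ProjectType"]).astype(int)

# Rename "Proj 1" -> "Proj1" etc. to match the desired column titles.
dummies.columns = [name.replace(" ", "") for name in dummies.columns]

# Attach the indicator columns to the original dataframe.
result = pd.concat([df, dummies], axis=1)

# Correlate the indicator columns with the numeric performance measures.
corr = result[["Dollars_Spent", "Employees", *dummies.columns]].corr()

print(result)
print(corr)
```

With only four rows the correlations are not meaningful, but the same steps should apply to the full dataframe.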
Any help would be great.