This is a sample of what my dataframe looks like:
company_name country_code state_code software finance commerce etc......
google USA CA 1 0 0
jimmy GBR unknown 0 0 1
microsoft USA NY 1 0 0
I want to get the percentage of companies in each industry within each state. For example: 14% of the companies in CA are in software, 15% of the companies in CA are in healthcare, etc...
To do that, I need the total number of companies across all industries in each state, and then I need to divide the number of companies in each individual industry by that total to get the percentage of each industry in each state.
I just can't figure out a functioning way to do this.
I have tried variations of the following, but to no avail:
new_df = df['state_code'].value_counts(normalize=True)
I want to get the sum of all the columns software, finance, commerce etc... and then give the percentage of each column when compared to the other columns.
Expected output:
State_Code software finance commerce etc.....
CA 20% 10% 5% 65%
NY 10% 20% 10% 60%
AH 5% 5% 20% 70%
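
A minimal sketch of the calculation described above, assuming every industry column holds 0/1 flags and that the list industry_cols is written out by hand. The extra "acme" row is made up so that CA has more than one company, and the numbers are illustrative only:

```python
import pandas as pd

# Toy data mirroring the sample above; the "acme" row is hypothetical.
df = pd.DataFrame({
    "company_name": ["google", "jimmy", "microsoft", "acme"],
    "country_code": ["USA", "GBR", "USA", "USA"],
    "state_code": ["CA", "unknown", "NY", "CA"],
    "software": [1, 0, 1, 0],
    "finance": [0, 0, 0, 1],
    "commerce": [0, 1, 0, 0],
})

# Assumed list of industry columns; extend it with the real ones.
industry_cols = ["software", "finance", "commerce"]

# Number of companies per industry within each state.
counts = df.groupby("state_code")[industry_cols].sum()

# Divide each row by its own total so that every state adds up to 100%.
percentages = counts.div(counts.sum(axis=1), axis=0) * 100

print(percentages.round(1))
```

With this approach, a state whose companies have no industry flag set at all would have a row total of 0 and end up as NaN, so those rows may need to be dropped or filled separately.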