I apologize in advance for my inability to put what I am trying to do into words. I am very confused. I have a data set of corporate environmental impact data, and I have created a column with each company's impact category: if a company has a positive value in the total environmental impact column, its category is 'Positive', and if it has a negative value, its category is 'Negative'. I am looking at companies by country. I have no issues when a country has companies in both categories, but if a country does not have any positive companies, I cannot make the graph I am trying to make.

Here is the working code for a country which has both positive and negative categories:

usa_company_impcat = pd.crosstab(usa_company_filtered['Year'], usa_company_filtered['Impact_Category'])
usa_company_impcat['Total_Count'] = usa_company_impcat.loc[:,['Negative', 'Positive']].sum(axis = 1) # adding a total column
usa_company_impcat = usa_company_impcat.rename_axis("Year").reset_index() # fixing the year column
usa_company_impcat

Here is the output I get:

[image: crosstab by Year with Negative, Positive, and Total_Count columns]

If I try to do this with a country that only has negative rows, I get this error: KeyError: "['Positive'] not in index"

Is there a simple way to fix this? Should I just give up on those countries?

  • You might get better answers if you tag this question with pandas/numpy. – John Gordon Aug 26 '23 at 17:32
  • Refrain from showing your dataframe as an image. Your question needs a minimal reproducible example consisting of sample input, expected output, actual output, and only the relevant code necessary to reproduce the problem. See [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) for best practices related to Pandas questions. – itprorh66 Aug 26 '23 at 17:35
  • You probably want to check whether `'Positive'` is contained in the list of columns, and if not, add a `Positive` column with all zeros. That's only about 3 lines of code. – Tim Roberts Aug 26 '23 at 17:35
  • can you share your data/ a sample of your data too? that would be better. – Musabbir Arrafi Aug 26 '23 at 17:40
  • Please provide enough code so others can better understand or reproduce the problem. – Community Aug 27 '23 at 08:17

2 Answers

Here's how to add a column that is not present:

import pandas as pd

df = pd.DataFrame( {'a': [1,2,3], 'b':[4,5,6]} )
print(df)
print('After')
if 'c' not in df.columns:
    df['c'] = 0
print(df)

Output:

   a  b
0  1  4
1  2  5
2  3  6
After
   a  b  c
0  1  4  0
1  2  5  0
2  3  6  0
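
Applied to a crosstab like the one in the question, the same check might look like the sketch below. The sample data and the names `country_df` and `impcat` are made up for illustration; only the column names `Year` and `Impact_Category` and the labels `Negative` and `Positive` come from the question.

```python
import pandas as pd

# Hypothetical sample data: a country whose companies are all 'Negative'.
country_df = pd.DataFrame({
    "Year": [2018, 2018, 2019],
    "Impact_Category": ["Negative", "Negative", "Negative"],
})

impcat = pd.crosstab(country_df["Year"], country_df["Impact_Category"])

# Add any category the crosstab did not produce, filled with zeros.
for category in ["Negative", "Positive"]:
    if category not in impcat.columns:
        impcat[category] = 0

impcat["Total_Count"] = impcat[["Negative", "Positive"]].sum(axis=1)
impcat = impcat.rename_axis("Year").reset_index()
print(impcat)
```

Because the column is only added when it is missing, the same lines can be run for countries that do have positive companies without overwriting their counts.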
Tim Roberts

I figured out that the simplest way to achieve what I wanted was to add a 'Positive' column populated with zeros. I apologize that my original question wasn't better worded. Here is the fixed code for one of the countries that had no 'Positive' rows.

ROK_ind_impcat = pd.crosstab(ROK_industry['Year'], ROK_industry['Impact_Category'])
ROK_ind_impcat['Positive'] = 0 # this is the line I added to replicate a Positive column for my later graph
ROK_ind_impcat['Total_Count'] = ROK_ind_impcat.loc[:,['Negative', 'Positive']].sum(axis = 1)
ROK_ind_impcat = ROK_ind_impcat.rename_axis("Year").reset_index() # fixing the year column
ROK_ind_impcat
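
One caveat: assigning `['Positive'] = 0` unconditionally would overwrite real counts if the same code were reused for a country that does have positive companies. A possible alternative, sketched here under the assumption that the only categories are `Negative` and `Positive`, is to reindex the crosstab columns so that missing categories are filled with zeros. The helper name `impact_counts` and the sample frames are hypothetical.

```python
import pandas as pd

CATEGORIES = ["Negative", "Positive"]  # assumed to be the full set of categories


def impact_counts(df):
    """Count companies per year and category, keeping both category columns."""
    counts = pd.crosstab(df["Year"], df["Impact_Category"])
    # reindex adds any missing category column, filled with 0,
    # and leaves existing counts untouched.
    counts = counts.reindex(columns=CATEGORIES, fill_value=0)
    counts["Total_Count"] = counts.sum(axis=1)
    return counts.reset_index()


# Hypothetical sample data: one country with both categories, one with only 'Negative'.
both = pd.DataFrame({
    "Year": [2020, 2020, 2021],
    "Impact_Category": ["Negative", "Positive", "Negative"],
})
neg_only = pd.DataFrame({
    "Year": [2020, 2021],
    "Impact_Category": ["Negative", "Negative"],
})

print(impact_counts(both))
print(impact_counts(neg_only))
```

With this approach the same function works for every country, so no per-country special case is needed before plotting.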