0

I am working on a database where there are 6 columns: Economy, ADB Code, Year, Sector, Employment index, VA index. In rows, there are all the countries with values in corresponding years. The problem is that I need to create a new row sector, which will consist of a sum of other sectors. Here's an example in excel

I have tried calculating values for the indexes, and I think that it was successful. Here's the code:

construction = df.query("Sector == 'Construction'")
manufacturing = df.query("Sector == 'Manufacturing'")
mining_utilities = df.query("Sector == 'Mining, Utilities'")

#turning the dataframes from aboove into numpy arrays to calculate the values more easily
construction_array = construction.to_numpy()

manufacturing_array = manufacturing.to_numpy()

mining_utilities_array = mining_utilities.to_numpy()

#calculating values for the new variable
industry_array = construction_array + manufacturing_array + mining_utilities_array

industry_array

The problem is that I don't know how to properly integrate this array into the database since I am lacking values for all the other columns.

I have also tried to do this using dictionaries, but it turned to be confusing, so I decided not to proceed with that option

kssmvv
  • 1
  • 1
  • Hi kssmvv! Welcome to StackOverflow – Mark Jul 11 '23 at 13:36
  • "5 columns: Economy, ADB Code, Year, Sector, Employment index, VA index" Maybe I'm miscounting, but I counted 6 columns in that list? :S – Mark Jul 11 '23 at 13:36
  • 1
    Yes, sorry. I have already fixed that – kssmvv Jul 11 '23 at 13:45
  • which values are you wanting to add together? I presume it is both the indexes? And I presume grouped by year? – Mark Jul 11 '23 at 13:47
  • Also, the screenshot you posted doesn't have the economy column – Mark Jul 11 '23 at 13:49
  • 1
    https://stackoverflow.com/help/minimal-reproducible-example – Mark Jul 11 '23 at 13:49
  • Yes, I want to add up both the indexes, grouped as in the excel screenshot. And yes, there is no 'economy' column in excel, I just wanted to quickly display the concept – kssmvv Jul 11 '23 at 13:54
  • 1
    Why don't you provide your example in a [reproducible format](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) (not an image!) together with the matching expected output? – mozway Jul 11 '23 at 13:59

1 Answers1

1

Here's something which appears (based on the screenshots) roughly what you are wanting to do:

import pandas as pd

df = pd.DataFrame({
    "Year" : [2016, 2016, 2016],
    "Sector" : ["Mining", "Agriculture", "Manufacturing"],
    "Employment Index" : [0.5, 0.6, 0.7],
    "VA Index" : [0.4, 0.5, 0.6]
})
emp_index = (df.pivot(index="Year", columns="Sector", values="Employment Index")
             .assign(Total = lambda x: x.sum(axis=1)))

va_index = (df.pivot(index="Year", columns="Sector", values="VA Index")
            .assign(Total = lambda x: x.sum(axis=1)))

emp_index
Sector  Agriculture  Manufacturing  Mining  Total
Year                                             
2016            0.6            0.7     0.5    1.8

va_index
Sector  Agriculture  Manufacturing  Mining  Total
Year                                             
2016            0.5            0.6     0.4    1.5

If it isn't, please post the intended output, along with a reproducible example.

Mark
  • 7,785
  • 2
  • 14
  • 34