
I have a pandas DataFrame in which one column holds values in kilotons, abbreviated as 'kt'. When I group by the Country and Year columns and call the sum aggregation on the Value column, it does not actually sum the values in that column.

[Image: the dataset]

After performing the above action, I get the following:

[Image: result after groupby and aggregation]

However, the expected output should be:

[Image: expected output]

Also, the 'Value' column is of type object.
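The symptom described above can be reproduced with a small sketch. The sample rows here are hypothetical (the real dataset was only posted as an image), and the column is forced to `object` dtype to match the question. With string values, a groupby sum appears to join the strings together rather than add the numbers:

```python
import pandas as pd

# Hypothetical sample data; the original dataset was only shared as a picture.
df = pd.DataFrame({
    'Country': ['Algeria', 'Algeria', 'Angola'],
    'Year': [2004, 2004, 2004],
    # Force object dtype so the column matches the one described in the question.
    'Value': pd.Series(['2731 kt', '2415 kt', '2000 kt'], dtype=object),
})

print(df['Value'].dtype)

# For strings, "sum" means concatenation rather than arithmetic addition.
joined = df.groupby(['Country', 'Year'])['Value'].sum()
print(joined)
```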

Any help will be useful.

wjandrea
  • Welcome to Stack Overflow! Please take the [tour]. [Don't post pictures of text](https://meta.stackoverflow.com/q/285551/4518341). Instead, copy the text itself, [edit] it into your post, and use the formatting tools like [code formatting](/editing-help#code). BTW if you want more tips, check out [How to ask a good question](/help/how-to-ask). – wjandrea Jul 15 '23 at 17:02
  • If you're going to do arithmetic, then you need to remove the units before you create the column. pandas tries to figure out your data types, and when it sees `"1234 kt"`, that's a string. So, make the data `1234` and put `kt` in the header. – Tim Roberts Jul 15 '23 at 17:04
  • Also, please make a [mre], meaning minimal example input data and desired output. You could probably just use one group from the groupby, or part of a group if it's large. For specifics see [How to make good reproducible pandas examples](/q/20109391/4518341). – wjandrea Jul 15 '23 at 17:05
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Jul 15 '23 at 22:14
  • @wjandrea Thank you for your feedback on my post. I shall make improvements in future. – Surbhi Jain Jul 16 '23 at 14:26
  • @TimRoberts Ahh! Alright. I wanted to know if there is any way to add values along with their units (of course, the same units). So the conclusion is that we can't. What do you mean by putting kt in the header? – Surbhi Jain Jul 16 '23 at 14:27
  • If the whole column is in kilotons, then make the header `'Value (kt)'` so there is no confusion. If everything in the whole chart is kilotons, then maybe you can just add a note when you publish it. – Tim Roberts Jul 16 '23 at 23:34
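The suggestion in the comments above (numeric data, unit in the header) could look roughly like the sketch below. The sample rows and the column name `'Value (kt)'` are illustrative, and the code assumes every value ends in exactly `' kt'`:

```python
import pandas as pd

# Hypothetical sample rows in the same shape as the question's data.
df = pd.DataFrame({
    'Country': ['Algeria', 'Algeria', 'Angola'],
    'Year': [2004, 2005, 2004],
    'Value': ['2731 kt', '2688 kt', '2000 kt'],
})

# Strip the trailing unit and convert what is left to a number.
# Assumption: every row uses the same ' kt' suffix.
df['Value (kt)'] = pd.to_numeric(df['Value'].str.replace(' kt', '', regex=False))
df = df.drop(columns='Value')

# The unit now lives in the header, so the column can be summed normally.
totals = df.groupby(['Country', 'Year'], as_index=False)['Value (kt)'].sum()
print(totals)
```

Keeping the unit in the column name means the values stay numeric for any later arithmetic, while the header still tells the reader what is being measured.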

1 Answer


If your values mix numbers and letters, they will be strings, which pandas stores with dtype object. You need to split off the numerical part, convert it to an integer, put it into a new column, and then use groupby with sum (or whatever aggregation you need). For example:

import pandas as pd

df = pd.DataFrame({'Country': ['Algeria', 'Algeria','Algeria','Angola', 'Angola'],
                   'Item': ['Wheat and products', 'Wheat and products','Wheat and products','Wheat and products','Wheat and products'],
                   'Year': [2004, 2004,2005,2004,2004],
                   'Value':['2731 kt', '2415 kt','2688 kt','2000 kt','1111 kt']
                   })

# Pull the digits out of strings like '2731 kt' and convert them to integers
df['ValNum'] = df['Value'].str.extract(r"(\d+)", expand=False).astype('int')

df2 = df.groupby(['Country', 'Year'])['ValNum'].sum()

print(df2)

gives:

Country  Year
Algeria  2004    5146
         2005    2688
Angola   2004    3111
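If the values may contain decimals, or if you want the unit shown again next to the total, a variation along these lines might help. This is only a sketch: the regular expression, the `num`/`unit` group names, and the `labelled` variable are illustrative choices, and it assumes every value looks like a number followed by a single unit word:

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['Algeria', 'Algeria', 'Algeria', 'Angola', 'Angola'],
    'Year': [2004, 2004, 2005, 2004, 2004],
    'Value': ['2731 kt', '2415 kt', '2688 kt', '2000 kt', '1111 kt'],
})

# Split each string into a number (optionally with decimals) and a unit.
parts = df['Value'].str.extract(r'^\s*(?P<num>\d+(?:\.\d+)?)\s*(?P<unit>\w+)\s*$')
df['ValNum'] = pd.to_numeric(parts['num'])

# Summing only makes sense if every row uses the same unit.
units = parts['unit'].unique()
assert len(units) == 1, f"mixed units: {units}"

totals = df.groupby(['Country', 'Year'])['ValNum'].sum()

# Re-attach the unit for display only; keep `totals` numeric for further maths.
labelled = totals.astype(str) + ' ' + units[0]
print(labelled)
```

The numeric `totals` Series is what you would use for any further calculation; `labelled` is just a string version for printing.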
user19077881