First let me apologise for the long winded question. I've struggled to find an answer on Stackoverflow that addresses my specific issue. I am new to Pandas and Python programming so I would appreciate all the help I can get.
I have a dataframe:
ID Name Colour Power Year Money (millions)
0 1234567 Tony Stark Red Genius 2020 20000
1 9876543 Peter Parker Red Spider 2021 75
2 1415926 Miles Morales Green Spider 2021 55
3 7777777 Dante Brisco Blue hybrid 2020 3
4 4355681 Thor Odinson Blue Lightning 2020 655
5 1928374 Bruce Wayne Yellow Bat 2021 12000
6 5555555 Eddie Brock Black Symbiote 2021 755
7 8183822 Billie Butcher Yellow V 2021 34
8 6666654 Ian Wilm Red Lightning 2020 34
9 4241111 Harry Potter Green Wizard 2020 24
10 7765434 Blu Malk Red Wizard 2021 77
11 6464647 Yu Hant Black Wizard 2021 65
I want to create a new df that looks like this:
**Colour Total Year 2020 Year 2021**
Red 20186 20034 152
Green 79 24 55
Blue 658 658 -------
Yellow 12034 ------- 12034
Black 820 ------- 820
Where the "Colour" column becomes the new primary key/ID, the duplicates are removed and the values per year are summed up along with an overall total. I have managed to sum up the Total but I am struggling to write a function that will sum up rows by year and than assign the sum to the respective colour. I would eventually like to Create new columns based on calculations from the Yearly columns (percentages)
Here is what I have after creating the DF from an excel file :
#This line helps me calculate the total from the old df.
df['Total'] = df.groupby(['Colour'])['Money (millions)'].transform('sum')
#This line drops the duplicates from the line above. So now I have a total column that matches the #Colours
new_df = df.drop_duplicates(subset=['Colour'])
When I repeat the process for the Yearly column using the same technique it sums up the overall total for the whole year and assigns it to every colour.
I would eventually like to Create new columns based on calculations from the Yearly columns (percentages) e.g.
new_df['Success Rate'] = new_df['Total'].apply(lambda x: (x/100)*33)
I'd be grateful for any help provided :)