1

This table from Wikipedia shows the 10 biggest box office hits. I can't seem to get the total of the 'worldwide_gross' column. Can someone help? Thank you.

import pandas as pd
boxoffice_df=pd.read_html('https://en.wikipedia.org/wiki/List_of_highest-grossing_films')
films = boxoffice_df[1]

films.rename(columns = {'Worldwide gross(2020 $)':'worldwide_gross'}, inplace = True)

films.worldwide_gross.sum(axis=0)


This is the output I get when I try calculating the total global earnings: [screenshot omitted]

OneCricketeer

4 Answers

1
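# astype returns a new DataFrame, so assign the result back.
# The cast only succeeds if the column holds plain digits (no "$" or ",").
films = films.astype({"worldwide_gross": "int64"})
Total = films['worldwide_gross'].sum()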
1
# Int64 rather than Int32: amounts above about 2.1 billion do not fit in 32 bits.
Total = films['worldwide_gross'].astype('Int64').sum()

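Or convert the data types first. This helps only if the column already holds numeric values; strings such as "$1,234" have to be cleaned beforehand.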

films = films.convert_dtypes()
Total = films['worldwide_gross'].sum()
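
A minimal sketch of checking the column before summing it, in case the casts above fail on the scraped text. The two sample rows are made-up stand-ins for the Wikipedia table, and it assumes the values only contain "$" and "," besides digits.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical sample rows standing in for the scraped column (illustrative values only).
films = pd.DataFrame({"worldwide_gross": ["$3,000,000,000", "$2,500,000,000"]})

# The scraped values are text, so summing them would not add numbers.
needs_cleaning = not is_numeric_dtype(films["worldwide_gross"])

if needs_cleaning:
    # Drop "$" and "," and let pandas parse the remaining digits as integers.
    digits = films["worldwide_gross"].str.replace(r"[$,]", "", regex=True)
    films["worldwide_gross"] = pd.to_numeric(digits)

total = films["worldwide_gross"].sum()
print(total)
```

If the real column contains anything else (prefixes, footnote markers), pd.to_numeric will raise, which is at least a clearer failure than a wrong total.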
Nk03
0

Here's one way you can do it.

This code converts the values in the worldwide_gross column to integers and then sums the column to get the total gross.

import pandas as pd

def get_gross(gross_text):
  pos = gross_text.index('$')
  return int(gross_text[pos+1:].replace(',', ''))
  
boxoffice_df=pd.read_html('https://en.wikipedia.org/wiki/List_of_highest-grossing_films')
films = boxoffice_df[1]

films.rename(columns = {'Worldwide gross(2020 $)':'worldwide_gross'}, inplace = True)

films['gross_numeric'] = films['worldwide_gross'].apply(get_gross)

total_gross = films['gross_numeric'].sum()

print(f'Total gross: ${total_gross}')
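
The get_gross helper above assumes every cell has a "$" followed only by digits and commas. As a rough sketch of a more forgiving variant, the hypothetical parse_gross below pulls out the first dollar amount with a regex and returns None when there is none. The sample strings are invented to show the cases; they are not taken from the actual table.

```python
import re

def parse_gross(gross_text):
    """Return the first dollar amount in gross_text as an int, or None if there is none."""
    match = re.search(r"\$(\d[\d,]*)", gross_text)
    if match is None:
        return None
    # Remove the thousands separators before converting to int.
    return int(match.group(1).replace(",", ""))

# Invented examples: plain amount, a letter prefix, a trailing footnote marker, no amount.
samples = ["$1,000,000", "T$2,500,000", "$3,000,000[1]", "n/a"]
parsed = [parse_gross(s) for s in samples]
print(parsed)
```

It can be dropped into the apply call in place of get_gross; rows that come back as None would then need to be handled before summing.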
norie
0

You will have to keep only the digits in the worldwide_gross column using a regex, and then convert the column to float with Series.astype('float').

Add:

films.worldwide_gross = films.worldwide_gross.str.replace(r'\D', "", regex=True).astype(float)

Complete Code:

import pandas as pd
boxoffice_df=pd.read_html('https://en.wikipedia.org/wiki/List_of_highest-grossing_films')
films = boxoffice_df[1]

films.rename(columns = {'Worldwide gross(2020 $)':'worldwide_gross'}, inplace = True)
films.worldwide_gross = films.worldwide_gross.str.replace(r'\D', "", regex=True).astype(float)
films.worldwide_gross.sum(axis=0)
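
One thing to watch with stripping every non-digit: any stray digits in a cell get merged into the amount. A small sketch with made-up values (the "[2]" footnote marker is an assumption, not something confirmed for this table) comparing the strip approach with extracting only the amount after the "$":

```python
import pandas as pd

# Made-up values; the second one carries a hypothetical footnote marker.
raw = pd.Series(["$1,000", "$2,000[2]"])

# Stripping all non-digits keeps the footnote digit as part of the number.
stripped = pd.to_numeric(raw.str.replace(r"\D", "", regex=True))
print(stripped.tolist())  # [1000, 20002]

# Extracting only the digits and commas right after "$" avoids that.
extracted = raw.str.extract(r"\$([\d,]+)", expand=False).str.replace(",", "", regex=False)
amounts = pd.to_numeric(extracted)
print(amounts.tolist())  # [1000, 2000]
```

If the column has no such markers, both approaches give the same total.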
Hamza usman ghani