
I have a small question regarding binary operations on the columns of a data frame. I have a data frame and want to create a new column, PerWeek, equal to Gross divided by Weeks. How can I do this, given that the Gross values are not numeric?

boxoffice = function(){
  url = "https://www.imdb.com/chart/boxoffice"
  read_table = read_html("https://www.imdb.com/chart/boxoffice")
  movie_table = html_table(html_nodes(read_table, "table")[[1]])
  Name = movie_table[2]
  Gross = movie_table[4]
  Weeks = movie_table[5]
  BoxOffice = 
  for (i in 1:10){
    PerWeek = movie_table[4][i] %/% movie_table[5][i]
  }
  df = data.frame(Name,BoxOffice,PerWeek)
  return(df)
}


warmsoda

1 Answer


If the Gross values are always in millions, you can extract the number from each value, multiply it by 1e6 to convert it to dollars, and then divide by Weeks.

library(rvest)
library(dplyr)

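# Scrape the first table on the IMDb box office page
url = "https://www.imdb.com/chart/boxoffice"
read_table = read_html(url)
movie_table = html_table(html_nodes(read_table, "table")[[1]])
# Drop the first and last columns, leaving Title, Weekend, Gross and Weeks
movie_table <- movie_table[-c(1, ncol(movie_table))]
# Extract the number from Gross (e.g. "$60.3M" -> 60.3), scale to dollars, divide by Weeks
movie_table %>% mutate(per_week_calc = readr::parse_number(Gross) * 1e6 / Weeks)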


#                  Title Weekend   Gross Weeks per_week_calc
#1                Onward  $10.5M  $60.3M     2      30150000
#2       I Still Believe   $9.5M   $9.5M     1       9500000
#3             Bloodshot   $9.3M  $10.5M     1      10500000
#4     The Invisible Man   $6.0M  $64.4M     3      21466667
#5              The Hunt   $5.3M   $5.8M     1       5800000
#6    Sonic the Hedgehog   $2.6M $145.8M     5      29160000
#7          The Way Back   $2.4M  $13.4M     2       6700000
#8  The Call of the Wild   $2.2M  $62.1M     4      15525000
#9                 Emma.   $1.4M  $10.0M     4       2500000
#10    Bad Boys for Life   $1.1M $204.3M     9      22700000

If the data can also be in billions or thousands, you can refer to these questions:

"Changing Million/Billion abbreviations into actual numbers? ie. 5.12M -> 5,120,000" and "Convert from K to thousand (1000) in R"
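As a rough illustration of the mixed-suffix case, here is a minimal base R sketch. The helper name `parse_abbrev_amount` is hypothetical and not taken from the linked questions, and it assumes each value ends in at most one suffix out of K, M or B; anything else is treated as a plain number.

```r
# Hypothetical helper (an assumption, not from the original answer):
# convert strings such as "$60.3M", "$1.2B" or "$850K" into dollar amounts.
parse_abbrev_amount <- function(x) {
  x <- trimws(x)
  # Multiplier for each supported suffix
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
  # Numeric part: drop everything except digits and the decimal point
  value <- as.numeric(gsub("[^0-9.]", "", x))
  # Suffix: the last character of each string, upper-cased
  suffix <- toupper(substring(x, nchar(x)))
  # Unknown or missing suffixes fall back to a multiplier of 1
  scale <- ifelse(suffix %in% names(multipliers), multipliers[suffix], 1)
  value * unname(scale)
}

# Example: per-week figures for values with different suffixes
gross <- c("$60.3M", "$1.2B", "$850K")
weeks <- c(2, 4, 1)
per_week <- parse_abbrev_amount(gross) / weeks
```

The same function could replace `readr::parse_number(Gross) * 1e6` inside `mutate()` if the table ever mixes units.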

Ronak Shah
  • Is there a way to add `per_week_calc` to the dataframe other than `mutate()`? I want my dataframe to consist of only `Title`, `Gross` and `per_week_calc`. – warmsoda Mar 16 '20 at 03:55
  • @warmsoda Sorry, I didn't get you. You can use `select` to include only specific columns in final dataframe. `movie_table %>% mutate(per_week_calc = readr::parse_number(Gross) * 1e6/Weeks) %>% select(Title, Gross, per_week_calc)` if you want that. – Ronak Shah Mar 16 '20 at 04:04
  • Thank you for your time. I used `transmute()` instead of `mutate()` and got what I wanted. – warmsoda Mar 16 '20 at 04:18
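
For readers who want the three-column result without dplyr, the following base R sketch shows one possible approach. It is not what the asker or answerer used. The sample rows are copied from the output in the answer, so the live table will differ, and it assumes every Gross value is in millions.

```r
# Sample rows copied from the answer's output; the scraped table will differ.
movie_table <- data.frame(
  Title = c("Onward", "The Invisible Man", "Emma."),
  Gross = c("$60.3M", "$64.4M", "$10.0M"),
  Weeks = c(2, 3, 4),
  stringsAsFactors = FALSE
)

# Keep only digits and the decimal point, then scale millions to dollars
# (assumption: every value ends in "M").
gross_dollars <- as.numeric(gsub("[^0-9.]", "", movie_table$Gross)) * 1e6

# Build a data frame holding only the three wanted columns.
result <- data.frame(
  Title = movie_table$Title,
  Gross = movie_table$Gross,
  per_week_calc = gross_dollars / movie_table$Weeks,
  stringsAsFactors = FALSE
)
```

With dplyr, `transmute()` gives the same shape in one step, because it keeps only the columns named in the call.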