
I am quite new to web scraping and still have a lot to learn about HTML. Yesterday I tried to scrape the actors of each movie from https://www.imdb.com/search/title/?title_type=feature&year=2020-01-01,2020-12-31&start=1001&ref_=adv_nxt so that the actors of each movie form one element of a vector covering all movies. I could not find any examples online, so I wonder if someone could help me out.

The following command gives me all actors listed on the page, but each actor is a separate element of the vector:

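library(rvest)
# assumed: the search page from the URL above is parsed into `text`
text <- read_html("https://www.imdb.com/search/title/?title_type=feature&year=2020-01-01,2020-12-31&start=1001&ref_=adv_nxt")

other_actors_html <- html_elements(text, css = 'a[href*="adv_li_st"]')  # one element per actor link

# or select each movie's whole content block instead
# (one element per movie, but containing all of that movie's text, not only the actors):
other_actors_html <- html_elements(text, css = '.lister-item-content')

other_actors_html_text <- html_text(other_actors_html)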

What I actually want is one element per movie, something like this:

test_vec <- c("Christian De Sica,Massimo Boldi,Lucia Mascino,Milena Vukotic", NA, "Abhiram,Achila,Ajay,Vinod Anantoju")

That way I know which actors belong to which movie.
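A minimal sketch of the movie-wise idea, assuming rvest >= 1.0 (for `html_elements()`): first select one container node per movie, then search for actor links only inside that container, so the grouping survives. The HTML below is a hypothetical, trimmed-down stand-in that reuses the class name and link pattern from the question; the real IMDb markup may differ.

```r
library(rvest)  # read_html(), html_elements(), html_text()

# Hypothetical toy markup mimicking the IMDb search results (not the real page)
page <- read_html('
<div class="lister-item-content">
  <h3><a href="/title/tt1/">Movie A</a></h3>
  <p><a href="/name/nm1/?ref_=adv_li_st_0">Christian De Sica</a>,
     <a href="/name/nm2/?ref_=adv_li_st_1">Massimo Boldi</a></p>
</div>
<div class="lister-item-content">
  <h3><a href="/title/tt2/">Movie B</a></h3>
</div>
<div class="lister-item-content">
  <h3><a href="/title/tt3/">Movie C</a></h3>
  <p><a href="/name/nm3/?ref_=adv_li_st_0">Abhiram</a>,
     <a href="/name/nm4/?ref_=adv_li_st_1">Achila</a></p>
</div>')

# Searching the whole document gives one flat vector: the movie grouping is lost
flat <- html_text(html_elements(page, 'a[href*="adv_li_st"]'))

# Select one node per movie, then look for actor links *inside* each node
movies <- html_elements(page, ".lister-item-content")
actors_per_movie <- vapply(movies, function(movie) {
  actors <- html_text(html_elements(movie, 'a[href*="adv_li_st"]'), trim = TRUE)
  # no actor links for this movie -> NA, otherwise one comma-separated string
  if (length(actors) == 0) NA_character_ else paste(actors, collapse = ",")
}, character(1))

actors_per_movie
```

`vapply()` with `character(1)` forces exactly one string per movie, which is what keeps the result aligned with the movies (including the `NA` for the movie without actors).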

Any ideas? Many thanks, Nadine

Nadiine El Nino

1 Answer


I answered my question myself. Putting all actors of one movie into a single vector element was easy: paste() with collapse solves it. For the missing values (movies without some information, e.g. without any actors listed), I used input from another Stack Overflow thread.
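For fields that occur at most once per movie (title, runtime, Metascore, ...), rvest >= 1.0 also offers another way to deal with missing values: `html_element()` (singular) returns one result per input node, and `html_text()` turns a non-match into `NA`, so the columns stay aligned without an explicit length check. Because it keeps only the first match, it does not replace the paste/collapse step for the actors. A minimal sketch on hypothetical toy markup; the real IMDb classes may differ.

```r
library(rvest)  # read_html(), html_elements(), html_element(), html_text()

# Hypothetical toy markup: the second movie has no Metascore
page <- read_html('
<div class="lister-item-content">
  <h3><a href="/title/tt1/">Movie A</a></h3>
  <span class="metascore">61</span>
</div>
<div class="lister-item-content">
  <h3><a href="/title/tt2/">Movie B</a></h3>
</div>')

movies <- html_elements(page, ".lister-item-content")

# html_element() yields exactly one (possibly missing) node per movie node;
# html_text() returns NA for the missing ones
title <- html_text(html_element(movies, "h3 > a"), trim = TRUE)
score <- html_text(html_element(movies, ".metascore"), trim = TRUE)

data.frame(title = title, score = score)
```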

Here's my code:

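library(rvest)  # html_nodes(), html_text(); rvest also re-exports %>%
library(purrr)  # map_df() (requires dplyr to be installed)

# `text` is the parsed search page, e.g. text <- read_html(url)
df1 <- text %>%
  html_nodes('.lister-item-content') %>%    # select one enclosing node per movie
  # iterate over each, pulling out desired parts and coerce to data.frame;
  # length-0 results (information missing for that movie) become NA
  map_df(~list(title.nr = html_nodes(.x, '.text-primary') %>%
                 html_text(trim = TRUE) %>%
                 {if (length(.) == 0) NA_character_ else .},
               title = html_nodes(.x, 'h3 > a') %>%
                 html_text(trim = TRUE) %>%
                 {if (length(.) == 0) NA_character_ else .},
               runtime = html_nodes(.x, '.runtime') %>%
                 html_text(trim = TRUE) %>%
                 {if (length(.) == 0) NA_character_ else .},
               genre = html_nodes(.x, '.genre') %>%
                 html_text(trim = TRUE) %>%
                 {if (length(.) == 0) NA_character_ else .},
               plot = html_nodes(.x, '.ratings-bar + .text-muted') %>%
                 html_text(trim = TRUE) %>%
                 {if (length(.) == 0) NA_character_ else .},
               rating = html_nodes(.x, '.ratings-imdb-rating > strong') %>%
                 html_text(trim = TRUE) %>%
                 {if (length(.) == 0) NA_character_ else .},
               score = html_nodes(.x, '.metascore') %>%
                 html_text(trim = TRUE) %>%
                 {if (length(.) == 0) NA_character_ else .},
               # collapse, otherwise a movie with several directors
               # would not fit into a single row
               director = html_nodes(.x, 'a[href*="adv_li_dr"]') %>%
                 html_text(trim = TRUE) %>%
                 {if (length(.) == 0) NA_character_ else paste(., collapse = ",")},
               # all actors of one movie become one comma-separated string
               actors = html_nodes(.x, 'a[href*="adv_li_st"]') %>%
                 html_text(trim = TRUE) %>%
                 {if (length(.) == 0) NA_character_ else paste(., collapse = ",")},
               votes = html_nodes(.x, 'p.sort-num_votes-visible') %>%
                 html_text(trim = TRUE) %>%
                 {if (length(.) == 0) NA_character_ else .}
               ))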
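The repeated `{if(length(.) == 0) NA else .}` blocks could also be factored into one small function. A sketch, assuming rvest >= 1.0; `text_or_na()` is a hypothetical helper name, not an rvest function, and the markup is a toy stand-in for the real page. It collapses every field, so a movie with several directors still gets one row, and it uses base `vapply()` instead of `purrr::map_df()`, which is only a design choice.

```r
library(rvest)  # read_html(), html_elements(), html_text()

# Hypothetical helper (not part of rvest): all matches of `css` inside one
# node, trimmed and collapsed into a single string, or NA if nothing matches
text_or_na <- function(node, css, collapse = ",") {
  x <- html_text(html_elements(node, css), trim = TRUE)
  if (length(x) == 0) NA_character_ else paste(x, collapse = collapse)
}

# Hypothetical toy markup: two directors for the first movie, no actors for the second
page <- read_html('
<div class="lister-item-content">
  <h3><a href="/title/tt1/">Movie A</a></h3>
  <a href="/name/nm1/?ref_=adv_li_dr_0">Dir One</a>,
  <a href="/name/nm2/?ref_=adv_li_dr_1">Dir Two</a>
  <a href="/name/nm3/?ref_=adv_li_st_0">Actor One</a>
</div>
<div class="lister-item-content">
  <h3><a href="/title/tt2/">Movie B</a></h3>
  <a href="/name/nm4/?ref_=adv_li_dr_0">Dir Three</a>
</div>')

movies <- html_elements(page, ".lister-item-content")

# One column per field; vapply() guarantees exactly one string per movie
df1 <- data.frame(
  title    = vapply(movies, text_or_na, character(1), css = "h3 > a"),
  director = vapply(movies, text_or_na, character(1), css = 'a[href*="adv_li_dr"]'),
  actors   = vapply(movies, text_or_na, character(1), css = 'a[href*="adv_li_st"]')
)
df1
```

On the real page, `page` would be the parsed search URL, and further columns (runtime, genre, rating, ...) follow the same pattern with the selectors from the answer.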