I am trying to scrape user reviews from a website. Some of the reviews have no body text, so I end up with vectors of different lengths and get the error "arguments imply differing number of rows: 20, 19" (20 is the correct count) when I try to combine the scraped datetime, rating, and review results into a data frame.
I have looked at the solution here, which uses !nzchar to replace the text of an HTML node when it is an empty string. This seems like a good fit for my case, but I can't get the code to insert a value into the vector so that the length comes out right. My code to scrape the node that contains an empty value is:
library(rvest)
library(tidyverse)
library(stringr)
url <- "http://www.trustpilot.com/review/www.amazon.com?page=2"
working_page <- read_html(url)
working_reviews <- working_page %>%
  html_nodes('.typography_body__9UBeQ.typography_color-black__5LYEn') %>%
  html_text(trim = TRUE) %>%
  replace(!nzchar(.), NA) %>%
  str_trim() %>%
  unlist()
length(working_reviews)
[1] 19
This returns a vector of 19 values; my expected output is a vector of 20 values, with NA in the positions where a review has no body text. On this particular page, the 17th review contains no body text.
Desired result:
working_reviews[1]
[1] "I placed an order w/Amazon and selected the 18 payment plan. Amazon charged the entire amount to my card. Called them and got no where. I was told it was the banks fault and I had to take it up with them.Buyer be ware!!!"
working_reviews[17]
[1] NA
I have also tried using the following line to "force" insert a string into the empty review:
working_reviews <- working_page %>%
  html_nodes('.typography_body__9UBeQ.typography_color-black__5LYEn') %>%
  html_text(trim = TRUE) %>%
  replace(!nzchar(.), "No review") %>%
  str_trim() %>%
  unlist()
This produces the same result with a length of 19 and does not include an element containing "No review".
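To narrow this down, here is a minimal base R sketch of what I suspect is happening. It assumes the selector simply returns no element for the review without a body, so the vector never contains an empty string; the `scraped` vector is a made-up stand-in, not real page data.

```r
# Hypothetical stand-in for the scraped text: 19 non-empty strings,
# on the assumption that the review without a body yields no element at all.
scraped <- paste("review text", 1:19)

mask <- !nzchar(scraped)    # TRUE only where an element is ""
sum(mask)                   # 0: nothing for replace() to act on

filled <- replace(scraped, mask, "No review")
length(filled)              # still 19
identical(filled, scraped)  # TRUE: replace() changes values, never length
```

If that assumption holds, replace() can only overwrite elements that already exist, so it cannot create the missing 20th slot.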
I also tried inverting the nzchar test by removing the '!', and got back a 19-element vector with NA for every element.
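Both results would be consistent with the vector containing no empty strings at all: nzchar(.) is then TRUE everywhere, so the un-negated version overwrites every element. Below is a sketch of the behaviour I am after, assuming each review sits in its own container element. The markup and the class names `review-card` and `review-body` are hypothetical placeholders, not Trustpilot's real ones, and I have not confirmed that this structure applies to the live page.

```r
library(rvest)

# Hypothetical markup: three review cards, the second with no body paragraph.
# "review-card" and "review-body" are placeholder class names.
page <- minimal_html('
  <div class="review-card"><p class="review-body">Great service</p></div>
  <div class="review-card"></div>
  <div class="review-card"><p class="review-body">Arrived late</p></div>
')

# Selecting the bodies directly skips the card that has none.
direct <- page %>%
  html_nodes(".review-body") %>%
  html_text(trim = TRUE)
length(direct)    # 2

# Selecting one body per card keeps a slot for every card: html_node()
# (singular) gives a missing node where nothing matches, and html_text()
# turns that into NA.
per_card <- page %>%
  html_nodes(".review-card") %>%
  html_node(".review-body") %>%
  html_text(trim = TRUE)
length(per_card)  # 3
is.na(per_card)   # FALSE TRUE FALSE
```

In this toy case the per-card version has the length I want, with NA where the body is missing, so the open question is which container selector plays the role of `.review-card` on the real page.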