I want to scrape airline reviews from airlinequality.com, where each review can rate several aspects of a flight. Not all fields are mandatory when writing a review, so different reviews contain different numbers of elements, which my current code can't handle.
For example, I want to get reviews from this page: http://www.airlinequality.com/airline-reviews/austrian-airlines/page/1/
There are 10 reviews with a Seat Comfort rating, but Inflight Entertainment is rated in only 8 of them. As a result, I end up with two vectors of different lengths, which can't be combined.
My code:
library(rvest)    # read_html(), html_nodes(), html_text(), %>%
library(stringr)  # str_replace_all()
review_html_temp = read_html("http://www.airlinequality.com/airline-reviews/austrian-airlines/page/1/")
review_seat_comfort = review_html_temp %>%
html_nodes(xpath = './/table[@class = "review-ratings"]//td[@class = "review-rating-header seat_comfort"]/following-sibling::td/span[@class = "star fill"][last()]') %>%
html_text() %>%
str_replace_all(pattern = "[\r\n\t]" , "")
review_entertainment = review_html_temp %>%
html_nodes(xpath = './/table[@class = "review-ratings"]//td[@class = "review-rating-header inflight_entertainment"]/following-sibling::td//span[@class = "star fill"][last()]') %>%
html_text() %>%
str_replace_all(pattern = "[\r\n\t]" , "")
Is there a way to fill the entertainment value with " " or NA when the node is missing, so that I get one value for each of the 10 reviews? The final result would look like:
seat_comfort: "4" "5" "3" "3" "1" "4" "4" "3" "3" "3"
entertainment_system: "5" "1" NA "1" "1" "3" NA "3" "5" "1"
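One direction that might work, as a minimal sketch rather than a confirmed solution: select each review's ratings table first, then call html_node() (singular) on that set. It returns one result per table, with a missing node where nothing matches, so html_text() gives NA for those reviews. The markup below is a hypothetical, simplified stand-in for the real page (two reviews, the second without an entertainment rating), and get_rating is an illustrative helper name, not part of rvest.

```r
library(rvest)

# Hypothetical, simplified markup: two reviews, the second one has no
# inflight_entertainment row. The real page is assumed to be similar.
page <- read_html('<div>
<table class="review-ratings">
  <tr><td class="review-rating-header seat_comfort">Seat Comfort</td>
      <td><span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star fill">4</span><span class="star">5</span></td></tr>
  <tr><td class="review-rating-header inflight_entertainment">Inflight Entertainment</td>
      <td><span class="star fill">1</span><span class="star fill">2</span><span class="star">3</span><span class="star">4</span><span class="star">5</span></td></tr>
</table>
<table class="review-ratings">
  <tr><td class="review-rating-header seat_comfort">Seat Comfort</td>
      <td><span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star">4</span><span class="star">5</span></td></tr>
</table>
</div>')

# One node per review, so every result vector has one entry per review.
tables <- html_nodes(page, xpath = './/table[@class = "review-ratings"]')

# Illustrative helper: look up the last filled star for one rating field
# inside each table. html_node() keeps a placeholder for tables where the
# field is absent, and html_text() turns that placeholder into NA.
get_rating <- function(tables, field) {
  xp <- sprintf(
    './/td[@class = "review-rating-header %s"]/following-sibling::td/span[@class = "star fill"][last()]',
    field
  )
  html_text(html_node(tables, xpath = xp))
}

review_seat_comfort  <- get_rating(tables, "seat_comfort")
review_entertainment <- get_rating(tables, "inflight_entertainment")

# Both vectors now have the same length and can be combined.
reviews <- data.frame(seat_comfort = review_seat_comfort,
                      entertainment_system = review_entertainment,
                      stringsAsFactors = FALSE)
print(reviews)
```

The key difference from my code above is that the XPath is evaluated once per review table instead of once for the whole page, so a missing field leaves a gap in place rather than shortening the vector.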