I'm trying to convert prices scraped from Book Depository's bestselling books page into numeric data so that I can graph them.
My code currently is:
selector <- ".rrp"
library(rvest)
url <- "https://www.bookdepository.com/bestsellers"
doc <- read_html(url)
prices <- html_nodes(doc, selector)
html_text(prices)
library(readr)
Spiral <- read_csv("C:/Users/Ellis/Desktop/INFO204/Spiral.csv")
View(Spiral)
My attempt to clean the data:
text <- gsub('[$NZ]', '', Spiral) # removes NZ$ from data
But the data now looks like this:
[1] "c(\"16.53\", \"55.15\", \"36.39\", \"10.80\", \"27.57\", \"34.94\",
\"27.57\", \"22.06\", \"22.00\", \"16.20\", \"22.06\", \"22.06\",
\"19.84\", \"19.81\", \"27.63\", \"22.06\", \"10.80\", \"27.57\",
\"22.06\", \"22.94\", \"16.53\", \"25.36\", \"27.57\", \"11.01\",
\"14.40\", \"15.39\")"
and when I try to run:
as.numeric(text)
I get:
Warning message: NAs introduced by coercion
How do I clean the data so that NZ$ is removed from each price and I'm able to plot the cleaned data?
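
A minimal sketch of one possible approach, not a definitive fix. It assumes the prices sit in a single character column; the column name `price` and the three sample values are hypothetical stand-ins, since the real header of Spiral.csv is not shown. The likely cause of the output above is that gsub() coerces a whole data frame with as.character(), which turns each column into one deparsed "c(...)" string, so as.numeric() cannot parse it. Applying gsub() to the column itself keeps one element per price:

```r
# Hypothetical stand-in for the Spiral data frame; the column name `price`
# is an assumption, because the real CSV header is not shown.
Spiral <- data.frame(price = c("NZ$16.53", "NZ$55.15", "NZ$36.39"),
                     stringsAsFactors = FALSE)

# Passing the whole data frame to gsub() coerces each column into a single
# deparsed string, which is why as.numeric() then returns NA.
collapsed <- gsub("[$NZ]", "", Spiral)

# Applying gsub() to the column keeps one element per price.
# fixed = TRUE treats "NZ$" as a literal string rather than a regex.
cleaned <- as.numeric(gsub("NZ$", "", Spiral$price, fixed = TRUE))

# Base-graphics bar plot of the numeric prices.
barplot(cleaned, names.arg = seq_along(cleaned), ylab = "Price (NZ$)")
```

With the real data, `Spiral$price` would be replaced by whichever column actually holds the scraped prices.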