I encounter the following error message:
Error in nchar(Tony.raw$neighborhood_overview) :
'nchar()' requires a character vector
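This is the message nchar() gives when its argument is a factor rather than a character vector. The R help page for nchar() says that passing a factor is an error. Here is a minimal reproduction using made-up strings that are not from the Denver data:

```r
# Made-up strings, not taken from the Denver data
txt <- c("Quiet street near the park", "Close to downtown")

# Character input: nchar() counts the characters of each element
print(nchar(txt))   # 26 17

# Factor input: the same values wrapped in factor() trigger the error
fac <- factor(txt)
msg <- tryCatch(nchar(fac), error = function(e) conditionMessage(e))
print(msg)
```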
I don't know why nchar() can't read the neighborhood_overview column. I have an assignment that comes with a CSV file of data about Denver's neighborhood social statistics collected from questionnaires. I need to count the character lengths of certain columns and then chart them to show different perspectives in the data. I'm going to try the same code on other data columns and see what I get.
Link to the .csv data:
https://drive.google.com/open?id=1mGsy52nZtRNpAFEWiWaJHB2nsm2hnvsU
#Load up the .CSV data and explore in RStudio
Tony.raw <- read.csv("denver_listings.csv", stringsAsFactors = FALSE)
View(Tony.raw)
# Clean up the data frame and view our handiwork.
Tony.raw <- Tony.raw[, c("description", "neighborhood_overview")]
View(Tony.raw)
# Check data to see if there are missing values.
length(which(!complete.cases(Tony.raw)))
#Convert our class label into a factor.
Tony.raw$neighborhood_overview <-
  as.factor(which(complete.cases(Tony.raw$neighborhood_overview)))
# The first step, as always, is to explore the data.
# First, let's take a look at the distribution of the class labels (i.e., ham vs. spam).
prop.table(table(Tony.raw$neighborhood_overview))
# Next up, let's get a feel for the distribution of text lengths of the SMS
# messages by adding a new feature for the length of each message.
Tony.raw$TextLength <- nchar(Tony.raw$neighborhood_overview)
summary(Tony.raw$TextLength)
#Visualize distribution with ggplot2, adding segmentation for ham/spam
library(ggplot2)
ggplot(Tony.raw, aes(x = TextLength, fill = neighborhood_overview)) +
  theme_bw() +
  geom_histogram(binwidth = 5) +
  labs(y = "Text Count", x = "Length of Text",
       title = "Distribution of Text Lengths with class Labels")
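You can check what the conversion step (the as.factor(which(complete.cases(...))) line) actually stores by running it on a small made-up data frame. The steps are:

- complete.cases() returns one TRUE/FALSE per row.
- which() turns that into row numbers.
- as.factor() turns the row numbers into a factor, which replaces the text.

As a result, the nchar() call further down receives a factor. The data frame below is invented for illustration.

```r
# Made-up stand-in for Tony.raw; the values are illustrative only
Tony.raw <- data.frame(
  description = c("Sunny loft", "Cozy room", "Garden flat"),
  neighborhood_overview = c("Walkable, lots of cafes", "Near the stadium", "Quiet"),
  stringsAsFactors = FALSE
)

# One TRUE/FALSE per row, then the row numbers of the TRUE entries
idx <- which(complete.cases(Tony.raw$neighborhood_overview))
print(idx)                                     # 1 2 3

# The conversion line stores those row numbers, as a factor, in place of the text
Tony.raw$neighborhood_overview <- as.factor(idx)
print(class(Tony.raw$neighborhood_overview))   # "factor"
print(levels(Tony.raw$neighborhood_overview))  # "1" "2" "3"; the text is gone
```

Because the factor's labels are row numbers, calling as.character() on it would return "1", "2", "3" rather than the original text. To keep the text, drop the conversion line or re-read the CSV.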
By setting Tony.raw$TextLength to nchar(Tony.raw$neighborhood_overview), I should be able to count the number of characters and then plot them in a chart with ggplot2. But R says nchar() requires a character vector. Is it because the description data are not characters, or because the column label is not a character? I have no idea.
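The column name is not the problem. nchar() checks the class of the values in the column, and the as.factor() line turns those values into a factor.

Below is a sketch of the same pipeline with that line left out, run on a made-up CSV. The rows are invented; only the column names come from the question. With stringsAsFactors = FALSE, blank text cells are read as empty strings, which nchar() counts as 0.

HasOverview is a hypothetical two-level label that stands in for the per-row factor. Filling a histogram by a column with one value per row gives one colour per listing, which is hard to read.

```r
# Made-up stand-in for denver_listings.csv; the blank cell comes back as ""
csv_text <- 'description,neighborhood_overview
"Sunny loft","Walkable, lots of cafes"
"Cozy room",""
"Garden flat","Quiet"
"Downtown studio","Near the stadium"'

Tony.raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)
Tony.raw <- Tony.raw[, c("description", "neighborhood_overview")]

# The as.factor() step is left out, so the column keeps its text
stopifnot(is.character(Tony.raw$neighborhood_overview))

# Character count per listing ("" counts as 0)
Tony.raw$TextLength <- nchar(Tony.raw$neighborhood_overview)
print(summary(Tony.raw$TextLength))

# Hypothetical two-level label used instead of the per-row factor
Tony.raw$HasOverview <- factor(ifelse(Tony.raw$TextLength > 0, "present", "missing"))
print(prop.table(table(Tony.raw$HasOverview)))

# Histogram of lengths, filled by the two-level label (skipped if ggplot2 is absent)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(Tony.raw, aes(x = TextLength, fill = HasOverview)) +
    theme_bw() +
    geom_histogram(binwidth = 5) +
    labs(y = "Text Count", x = "Length of Text",
         title = "Distribution of Text Lengths by Overview Presence")
  # Inside a block, an explicit print() is needed to draw the chart
  print(p)
}
```

On the real file, replace the read.csv(text = ...) call with the original read.csv("denver_listings.csv", stringsAsFactors = FALSE).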