
I encounter the following error message:

Error in nchar(Tony.raw$neighborhood_overview) : 
  'nchar()' requires a character vector

I don't understand why nchar() can't read the neighborhood_overview column. For an assignment I was given a CSV file of Denver neighborhood social statistics collected from questionnaires. I need to count the character length of certain columns and then chart the results to show the different perspectives in the data. I'm going to try the same code on other columns and see what I get.

Link to the .csv data:

https://drive.google.com/open?id=1mGsy52nZtRNpAFEWiWaJHB2nsm2hnvsU

# Load the .csv data and explore it in RStudio.
Tony.raw <- read.csv("denver_listings.csv", stringsAsFactors = FALSE)
View(Tony.raw)

# Clean up the data frame and view our handiwork.
Tony.raw <- Tony.raw[, c("description", "neighborhood_overview")]
View(Tony.raw)

# Check data to see if there are missing values.
length(which(!complete.cases(Tony.raw)))

# Convert our class label into a factor.
Tony.raw$neighborhood_overview <- 
as.factor(which(complete.cases(Tony.raw$neighborhood_overview)))

# The first step, as always, is to explore the data.
# First, let's take a look at the distribution of the class labels (i.e., ham vs. spam).
prop.table(table(Tony.raw$neighborhood_overview))

# Next up, let's get a feel for the distribution of text lengths of the SMS
# messages by adding a new feature for the length of each message.
Tony.raw$TextLength <- nchar(Tony.raw$neighborhood_overview)
summary(Tony.raw$TextLength)

# Visualize the distribution with ggplot2, adding segmentation for ham/spam.
library(ggplot2)

ggplot(Tony.raw, aes(x=TextLength, fill = neighborhood_overview)) +
  theme_bw() +
  geom_histogram(binwidth = 5) +
  labs(y = "Text Count", x = "Length of Text",
       title = "Distribution of Text Lengths with class Labels")

By setting Tony.raw$TextLength to the nchar of Tony.raw$neighborhood_overview, I should be able to count the characters in each entry and then plot those counts with ggplot2. But R says nchar requires a character vector. Is it because the data in the column are not characters, or because the column label is not a character? I have no idea.

Marco
pysolver33
  • To get help on this site, you should share a small portion of your data using `dput(Tony.raw)`. It's a lot easier for people to help you if they can just copy and paste in rather than download your whole data set. Check out [this link](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for tips on asking a question here – astrofunkswag Aug 08 '19 at 17:26

1 Answer


In the fourth block of your code you've turned Tony.raw$neighborhood_overview into a factor. You need

nchar(as.character(Tony.raw$neighborhood_overview))

instead of nchar(Tony.raw$neighborhood_overview) to get the character counts of the factor's labels.
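As a minimal sketch of the point above, here is a toy factor (the values are made up, not taken from the CSV) showing that nchar() fails on a factor and works once the factor is converted back to text. Both as.character(f) and levels(f)[f] are standard ways to recover the labels.

```r
# Toy factor; the strings are illustrative only.
f <- factor(c("quiet street", "near downtown", "quiet street"))

# nchar() on the factor itself raises an error; capture the message.
err <- tryCatch(nchar(f), error = function(e) conditionMessage(e))

# Two equivalent ways to get the label text back before counting.
len_a <- nchar(as.character(f))
len_b <- nchar(levels(f)[f])

print(err)
print(len_a)
```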

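When you write nchar(Tony.raw$neighborhood_overview), nchar receives a factor rather than a character vector. A factor is stored as integer codes plus a set of levels, and nchar refuses factors outright, which is the error you see. Note also that the fourth block replaces the column with the row indices returned by which(), so the original text is no longer in that column by the time nchar is called. If you want the length of the text itself, leave the column as character.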
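One possible way to arrange this, sketched on a made-up data frame: keep the text column as character, put any class label in a separate column, and take nchar() of the text. The names listings and has_overview are hypothetical and do not appear in the question; whether a present/empty flag is the right grouping for the assignment is an assumption.

```r
# Made-up stand-in for the listings data; only the column name matches the question.
listings <- data.frame(
  neighborhood_overview = c("Quiet block near the park", "", "Close to downtown"),
  stringsAsFactors = FALSE
)

# Keep the text as character and store the class label in its own factor column.
listings$has_overview <- factor(
  nzchar(listings$neighborhood_overview),
  levels = c(FALSE, TRUE),
  labels = c("empty", "present")
)

# nchar() now receives a character vector, so it works.
listings$TextLength <- nchar(listings$neighborhood_overview)
summary(listings$TextLength)
```

With this layout, the ggplot2 call from the question could use fill = has_overview while x = TextLength stays numeric.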

Grada Gukovic
  • I had an all-numeric matrix and I get the same issue. Moreover, when I follow the example in the package (omega(Thurstone)), the same error message appears – Antonio Canepa Aug 06 '20 at 15:04