
I have a CSV file, TwitterCount, with contents such as:

Tom   3
Alex  4
Sedgwick 1

and I read the file into R. I'm trying to plot a histogram of this data, but it keeps producing the error 'x' must be numeric. Here's the script I have so far:

userc = read.csv("TwitterCount.csv",header = FALSE)

After reading it into R, I call head() to see the format:

head(userc)

                V1
1 Tom            3
2 Alex           4
3 Sedgwick       1

But when I plot with hist(userc), it says 'x' must be numeric, which I don't quite understand.
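A minimal reproduction of why the error appears. It assumes the rows look like the sample above, with spaces (not commas) between the name and the count; `lines` is a hypothetical stand-in for the real file, not something from the question.

```r
# Hypothetical sample standing in for TwitterCount.csv: one name and one count
# per line, separated by spaces rather than commas (an assumption).
lines <- c("Tom   3", "Alex  4", "Sedgwick 1")

# read.csv() splits fields on commas only, so each whole line becomes one field
# in a single column, V1 (the default column name when header = FALSE).
userc <- read.csv(text = lines, header = FALSE)
ncol(userc)            # 1
is.numeric(userc$V1)   # FALSE: character in R >= 4.0, factor in older R

# hist() dispatches to hist.default(), which rejects non-numeric input,
# whether it is given the whole data frame or the text column.
msg <- tryCatch(hist(userc, plot = FALSE),
                error = function(e) conditionMessage(e))
msg                    # "'x' must be numeric"
```

Under that assumption, both `hist(userc)` and `hist(userc$V1)` fail for the same underlying reason: the counts were never parsed as numbers.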

Maxxx
  • `userc` is a data frame, but the `hist` function requires a numeric vector. Notice the `V1` at the top of the output of `head(userc)`: that is the default name R gives the first column when `header = FALSE`. In R, you can access each column (a vector) of a data frame using the `$` operator. So try `hist(userc$V1)` – shuckle Oct 18 '17 at 02:28
  • @shuckle yep, the same error persists: 'x' must be numeric – Maxxx Oct 18 '17 at 02:34
  • Please paste into your question the output of `dput(userc[1:10,])` – eipi10 Oct 18 '17 at 02:35
  • `str(userc)` will tell you if `userc$V1` is numeric or another class. – Djork Oct 18 '17 at 02:38
  • @eipi10 the file is huge, but after running the command I noticed a line at the bottom, `class = "factor"`. The rest is just each username with its count in quotation marks. – Maxxx Oct 18 '17 at 02:42
  • @Djork 'data.frame': 8977904 obs. of 1 variable: $ V1: Factor w/ 8977904 levels – Maxxx Oct 18 '17 at 02:42
  • So, the problem is that `V1` is getting read in as a factor, rather than as numeric. There's probably at least one non-numeric value in the data. So, do `userc$V1 = as.numeric(as.character(userc$V1))`, then run the histogram code. – eipi10 Oct 18 '17 at 02:44
  • This tells you `userc$V1` is not numeric but a factor. To convert to numeric use `as.numeric(as.character(userc$V1))`. Notice that `as.numeric(userc$V1)` does not work, to understand why, see: https://www.stat.berkeley.edu/classes/s133/factors.html. – Djork Oct 18 '17 at 02:45
  • I think you are reading the names in as the same column as the counts. Maybe try `sep = " "` inside of the `read.csv` – shuckle Oct 18 '17 at 02:45
  • @eipi10 it worked, but running the command produced a message: `Warning message: NAs introduced by coercion`. Should I just leave it as it is and proceed with the hist code? – Maxxx Oct 18 '17 at 02:56
  • Any character values in `V1` will be converted to `NA` when you change the class from factor to numeric. However, looking at your sample data, it looks like the problem may be (as @shuckle said) that R thinks the names and the values are a single column rather than two separate columns. Does all of `V1` get converted to `NA` when you try to convert `V1` to numeric? Maybe the data aren't being read in the format you intended. – eipi10 Oct 18 '17 at 03:12
  • @eipi10 in the CSV file, the name and the count are in one column: column A in Excel contains both the name and the count. In this case, am I reading it correctly into R? – Maxxx Oct 18 '17 at 03:17
  • You'll need to split those into two columns if you want to be able to analyze the numeric data. We need a sample of your actual data to be able to help you further. Please post the output of `dput(userc[1:10, ])`. – eipi10 Oct 18 '17 at 03:19
  • @eipi10 an example of the output after running the command is in the question. I'm starting to see that V1 contains both the name and the count, which is why it's not working. – Maxxx Oct 18 '17 at 03:36
  • @eipi10 I have also attached a screenshot of the data in Excel – Maxxx Oct 18 '17 at 03:51
  • Please paste the output of `dput(userc[1:10, ])` into your question. Where is the screenshot of the data in excel, I am curious to see the format and why it's being imported this way. – Djork Oct 18 '17 at 23:28
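The fixes discussed in the comments (split the name and count into separate columns, then convert with `as.numeric(as.character(...))`) can be combined into a rough sketch. It assumes, without confirmation from the thread, that each line holds a username with no spaces, then whitespace, then the count, as in the sample at the top. If each line of the real file is wrapped in quotes, `read.table()` would keep it as one field and only the second option applies. `lines`, `split_df`, `h`, and `f` are hypothetical names used for illustration.

```r
# Hypothetical sample standing in for TwitterCount.csv. Assumed layout: a
# username with no spaces, then whitespace, then the count.
lines <- c("Tom   3", "Alex  4", "Sedgwick 1")

# Option 1: read.table()'s default sep = "" splits on any run of whitespace,
# so the name and the count land in two separate columns. With the real file,
# pass the file name instead of text = lines.
userc <- read.table(text = lines, header = FALSE,
                    col.names = c("user", "count"),
                    stringsAsFactors = FALSE)
str(userc)

# hist() needs a numeric vector, so pass the column, not the whole data frame.
h <- hist(userc$count, main = "Tweets per user", xlab = "Tweet count")

# Option 2: if the data were already read into a single column V1, split each
# entry at its last run of whitespace. as.character() comes first so that a
# factor contributes its labels rather than its integer level codes.
v1 <- as.character(factor(lines))
split_df <- data.frame(
  user  = sub("\\s+\\S+$", "", v1),           # text before the last field
  count = as.numeric(sub("^.*\\s+", "", v1)), # the last field, as a number
  stringsAsFactors = FALSE
)

# Why as.character() matters when converting a factor to numbers:
f <- factor(c("10", "5"))
as.numeric(f)                 # 1 2 (level codes, not the values)
as.numeric(as.character(f))   # 10 5
```

`stringsAsFactors = FALSE` is passed explicitly because R versions before 4.0 (current when this question was asked) default to converting text columns to factors.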
