28

I have a data file with this format:

Weight    Industry Type  
251,787   Kellogg  h  
253,9601  Kellogg  a  
256,0758  Kellogg  h  
....

I read the data and try to draw a histogram with these commands:

 ce <- read.table("file.txt", header = TRUE)

 we <- ce[,1]
 ind <- ce[,2]   # `in` is a reserved word in R and cannot be used as a name
 ty <- ce[,3]

 hist(we)

But I get this error:

Error in hist.default(we) : 'x' must be numeric

What do I need to do in order to draw histograms for my three variables?

zx8754
José Joel.
  • Related post: [How to read in numbers with a comma as decimal separator?](https://stackoverflow.com/questions/6123378) – zx8754 Feb 21 '19 at 09:03

3 Answers

24

Because of the comma, which R does not treat as part of a number by default, the column will have been read as non-numeric. If the comma is a thousands separator, you need to remove it and convert:

 we <- gsub(",", "", we)   # remove comma
 we <- as.numeric(we)      # turn into numbers

and now you can do

 hist(we)

and other numeric operations.

Dirk Eddelbuettel
  • A correction: it's not the thousand separator, it's the decimal point that in some countries is a comma. So it needs to be replaced by a point, not removed. – momobo Feb 28 '10 at 10:50
  • There is an argument `dec=","` to `read.table`, `read.csv`, ... that allows you to set this at the R level. – Dirk Eddelbuettel Mar 02 '10 at 15:48
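
A minimal sketch of the correction raised in the comments, assuming the comma in Weight is a decimal mark. The sample values are copied from the question; the names `we_num` and `we_wrong` are only illustrative.

```r
# Sample values copied from the question; the comma is assumed to be a decimal mark.
we <- c("251,787", "253,9601", "256,0758")

# Replace the decimal comma with a point, then convert to numeric.
we_num <- as.numeric(sub(",", ".", we, fixed = TRUE))

# Removing the comma instead changes the magnitude of every value.
we_wrong <- as.numeric(gsub(",", "", we, fixed = TRUE))

hist(we_num)
```

With these values, `we_num` stays in the 250s while `we_wrong` becomes hundreds of thousands or more, which is why the distinction between the two separators matters.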
5

Note that you could also plot directly from ce (after removing the comma) using the column name:

hist(ce$Weight)

(As opposed to using hist(ce[1]), which would lead to the same "must be numeric" error, because ce[1] is a one-column data frame rather than a numeric vector.)

This also works for a database query result.
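
The difference between the selection forms can be sketched as follows. The data frame here is a hypothetical stand-in for `ce` after the Weight column has already been converted to numbers.

```r
# Hypothetical stand-in for `ce` with Weight already numeric.
ce <- data.frame(Weight   = c(251.787, 253.9601, 256.0758),
                 Industry = "Kellogg",
                 Type     = c("h", "a", "h"))

class(ce[1])       # "data.frame": single brackets keep the data frame
class(ce[[1]])     # "numeric": double brackets extract the column itself
class(ce$Weight)   # "numeric": the same column, selected by name

hist(ce[[1]])      # equivalent to hist(ce$Weight)
```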

Skippy le Grand Gourou
3

Use the dec argument of read.table to set "," as the decimal mark:

 ce <- read.table("file.txt", header = TRUE, dec = ",")
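
A self-contained sketch of the same call, with the sample rows from the question passed inline through `text =` instead of a file (an assumption made only so the example runs without `file.txt`):

```r
# Sample rows copied from the question, supplied inline rather than from a file.
raw <- "Weight Industry Type
251,787 Kellogg h
253,9601 Kellogg a
256,0758 Kellogg h"

# dec = "," makes read.table parse 251,787 as the number 251.787.
ce <- read.table(text = raw, header = TRUE, dec = ",")

str(ce$Weight)    # shows that the column was read as numeric
hist(ce$Weight)
```

Since hist() only accepts numeric input, the two text columns would need a plot of counts instead, for example barplot(table(ce$Type)).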
zx8754