
I have imported an HTML table into R:

require(XML)
u='http://www.ininternet.org/calorie.htm'
tables = readHTMLTable(u)
my.table=tables[[9]]
View(my.table)

But now I run into problems when I try to analyze the data or apply any function, for example:

> mean(PROTEINE)
Warning message:
In mean.default(PROTEINE) :
  argument is not numeric or logical: returning NA

Please tell me how to import a table so that I can analyze the data properly.

Brian Tompsett - 汤莱恩
  • Look at your object with `str(my.table)` and it will show you the important details of your table quickly and easily. – SlowLearner Feb 26 '14 at 09:47

2 Answers


You're trying to calculate the mean of a "factor" type variable:

> lapply(my.table, class)
$ALIMENTO
[1] "factor"

$PROTEINE
[1] "factor"

$GRASSI
[1] "factor"

$CARBOIDRATI
[1] "factor"

$CALORIE
[1] "factor"

$COLESTEROLO
[1] "factor"

You'll need to convert it to numeric first. Consider:

tmp <- as.numeric(as.character(my.table$PROTEINE))
mean(tmp)
## [1] 10.81395

See this question and answer for an explanation.
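As a minimal sketch of why the detour through `as.character()` matters, the toy factor below (a made-up example, not data from the imported table) shows that calling `as.numeric()` directly on a factor returns its internal level codes rather than the numbers its labels spell out:

```r
# Hypothetical toy factor; the values are invented for illustration only
f <- factor(c("10.5", "3.2", "10.5"))

# Direct conversion yields the integer level codes (positions in levels(f))
codes <- as.numeric(f)

# Converting the labels to character first recovers the actual numbers
values <- as.numeric(as.character(f))

print(levels(f))
print(codes)
print(values)
```

Here `codes` and `values` differ, which is why taking the mean of the direct conversion would silently give a wrong answer instead of an error.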

Thomas

All of the columns are factors. Change the first to character and the rest to numeric like this:

my.table[,1] <- sapply(my.table[,1], as.character)
my.table[,2:6] <- sapply(my.table[,2:6], function(x) as.numeric(as.character(x)))

Or, when first reading the table in, specify stringsAsFactors = FALSE. This isn't perfect, because it makes every column a character vector, so you still need to convert the numeric columns:

tables = readHTMLTable(u, stringsAsFactors = FALSE)
my.table = tables[[9]]  # re-extract the table from the fresh import
my.table[,2:6] <- sapply(my.table[,2:6], as.numeric)
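The same conversion can be tried offline on a small stand-in data frame. This is only a sketch: `toy`, `num.cols`, and all of the values are hypothetical, and only the column names are borrowed from the question. It uses `lapply()` so that each column is converted separately and the result stays a data frame:

```r
# Hypothetical stand-in for a table read with stringsAsFactors = FALSE;
# the values are made up and every column starts out as character
toy <- data.frame(
  ALIMENTO = c("a", "b", "c"),
  PROTEINE = c("8.1", "3.3", "12.4"),
  GRASSI   = c("0.5", "3.6", "8.7"),
  stringsAsFactors = FALSE
)

# Convert only the columns that hold numbers, leaving the label column alone
num.cols <- c("PROTEINE", "GRASSI")
toy[num.cols] <- lapply(toy[num.cols], as.numeric)

str(toy)                     # check the resulting column types
protein.mean <- mean(toy$PROTEINE)
print(protein.mean)
```

Checking the result with `str()` before computing anything makes it easy to spot a column that was not converted.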
JeremyS