I have a file called rRna_RDP_taxonomy_phylum with the following data:

364  "Firmicutes"            39.31
244  "Proteobacteria"        26.35
218  "Actinobacteria"        23.54
65   "Bacteroidetes"         7.02
22   "Fusobacteria"          2.38
6    "Thermotogae"           0.65
3     unclassified_Bacteria  0.32
2    "Spirochaetes"          0.22
1    "Tenericutes"           0.11
1     Cyanobacteria          0.11

And I'm using this code for creating a pie chart in R:

if(file.exists("rRna_RDP_taxonomy_phylum")){
    family <- read.table ("rRna_RDP_taxonomy_phylum", sep="\t")
    piedat <- rbind(family[1:7, ],
                as.data.frame(t(c(sum(family[8:nrow(family),1]),
                                "Others",
                                sum(family[8:nrow(family),3])))))
    png(file="../graph/RDP_phylum_low.png", width=600, height=550, res=75)
    pie(as.numeric(piedat$V3), labels=piedat$V3, clockwise=TRUE, col=graph_col, main="More representative Phyliums")
    legend("topright", legend=piedat$V2, cex=0.8, fill=graph_col)
    dev.off()
    png(file="../graph/RDP_phylm_high.png", width=1300, height=850, res=75)
    pie(as.numeric(piedat$V3), labels=piedat$V3, clockwise=TRUE, col=graph_col, main="More representative Phyliums")
    legend("topright", legend=piedat$V2, cex=0.8, fill=graph_col)
    dev.off()
}

I've been using this code for different data files and it works fine, but with the file presented above it crashes, returning the following message:

Error in Summary.factor(c(6L, 2L, 1L), na.rm = FALSE) : 
  sum not meaningful for factors
Calls: rbind -> as.data.frame -> t -> Summary.factor
Execution halted

I need to understand why it crashes with this file and whether there is any way to prevent this kind of error.

Thanks!

Matthew Lundberg
user2245731
  • `sum(factor(1))` reproduces the error. But why do you have factors in this data.frame and not in others? How do you read your data? – agstudy Aug 04 '13 at 16:56
  • @smci Please do not use the [factor] tag for factors in R. – Matthew Lundberg May 20 '14 at 02:15
  • @MatthewLundberg: gotcha, didn't know. I must go retag a bunch of stuff. Since Factor language is less popular than R factor, I think it should have the tag [tag:factor-language]. I will raise this on Meta. – smci May 20 '14 at 07:54
  • @smci It's on both metas. The new tag is [factor-lang]. All questions on the language have been retagged. Feel free to properly tag questions on R factors. – Matthew Lundberg May 20 '14 at 13:27
  • Good work @MatthewLundberg. Will get around to it. – smci May 20 '14 at 18:43

1 Answer


The error occurs when you call sum(x) and x is a factor.

This means that at least one of your columns, although it looks numeric, is actually a factor; what you see when you print it is the text representation of its levels.

The simple fix is to convert the column to numeric. However, this requires an intermediate step of converting to character first. Use the following:

family[, 1] <- as.numeric(as.character( family[, 1] ))
family[, 3] <- as.numeric(as.character( family[, 3] ))
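
As a minimal sketch of what the two-step conversion does, assuming a hypothetical factor `x` that stands in for one of the affected columns (the values are illustrative, not read from the asker's file):

```r
# Hypothetical column mimicking a numeric field that was read in as a factor.
x <- factor(c("364", "244", "6"))

# sum(x) stops with "'sum' not meaningful for factors"; try() captures the error.
res <- try(sum(x), silent = TRUE)
inherits(res, "try-error")              # TRUE

# as.numeric() alone returns the internal level codes, not the values.
# The levels sort as "244", "364", "6", so the codes are 2, 1, 3.
codes <- as.numeric(x)

# Going through character first recovers the real numbers.
values <- as.numeric(as.character(x))
sum(values)                             # 614
```

The direct conversion does not fail, it silently returns the wrong numbers, which is why the `as.character` step matters.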

For a detailed explanation of why the intermediate as.character step is needed, take a look at this question: How to convert a factor to integer\numeric without loss of information?
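
To prevent this kind of error at the source, one option is to declare the column types when reading the file, so that a malformed line fails at read time rather than later in `sum`. The following is only a sketch: the temporary file and its three rows are made-up stand-ins for the real data, and it assumes the file is cleanly tab-separated with three columns.

```r
# Write a small tab-separated sample to a temporary file (hypothetical data).
tmp <- tempfile()
writeLines(c("364\tFirmicutes\t39.31",
             "2\tSpirochaetes\t0.22",
             "1\tTenericutes\t0.11"), tmp)

# Keep strings as character vectors and declare the expected column types.
# read.table() raises an error if a "numeric" column contains non-numeric text.
family <- read.table(tmp, sep = "\t", stringsAsFactors = FALSE,
                     colClasses = c("numeric", "character", "numeric"))

# Inspect the classes and fail early if a column is not numeric.
sapply(family, class)
stopifnot(is.numeric(family$V1), is.numeric(family$V3))

# Summing now works as intended.
total <- sum(family[2:3, 1])            # 3
unlink(tmp)
```

Running `str(family)` right after reading a new file is a quick way to see which columns ended up as factors or characters.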

divibisan
Ricardo Saporta
  • Did you run their code? It works for me. They already call `as.numeric(piedat$V3)` as you suggest. – Joshua Ulrich Aug 04 '13 at 16:57
  • @JoshuaUlrich How did you get the data? Look at the 3rd (4th?) line of code in the OP `as.data.frame(t(c(sum(family[8:nrow(family),1]),` – Ricardo Saporta Aug 04 '13 at 17:03
  • I used `read.table(text="...")`, since they don't provide a file. I'm aware of that line. It makes all the columns either character or factor. My point is that they already call `as.numeric` on the character columns. – Joshua Ulrich Aug 04 '13 at 17:19
  • @JoshuaUlrich, my point in asking where you got the data was simply to draw attention to the fact that the copy+paste of the code working was likely due to differences in classes of the columns between what we would input and what the asker probably has in their environment. (yes of course, another great reason to use reproducible examples in the OP). Yes they have a call to `as.numeric` but clearly not early enough :) – Ricardo Saporta Aug 04 '13 at 20:21
  • Hi - why does the data need to be converted to characters first? – Trung Tran Nov 21 '14 at 23:13
  • @user1547174, the reason has to do with how `R` stores `factors`: as integer codes. So if you apply `as.numeric` directly to a factor and take the sum, you will get a numeric result, but not necessarily the one you'd expect. Have a look (copy and paste this) `x <- as.factor(c(100, 10)); sum(as.numeric(x)); sum(as.numeric(as.character(x)));` – Ricardo Saporta Nov 25 '14 at 01:07
  • thank you, I was banging my head on this for days trying to aggregate RMSE after cv passes – thistleknot Dec 15 '18 at 11:34