In the following example, I want to extract NA as a level and display it in the table just like the other levels. The levels() function doesn't seem to work with NA values. Is there another way to deal with this problem?

n <- 1000
# Simulated comorbidities; repeated values are sampled more often
comorbid <- sample(c(rep("diabetes", 2),
                     rep("hypertension", 5),
                     "cirrhosis", "stroke", "heartfailure",
                     "renalfailure", rep("COPD", 3)),
                   n,
                   replace = TRUE)
comorbid[sample(1:n, 50)] <- NA  # set 50 random entries to missing
mort <- sample(c(rep("alive", 4), "dead"), n, replace = TRUE)
table.cat <- data.frame(matrix(rep(999, 7), nrow = 1))
table <- table(comorbid, useNA = "always")
per <- prop.table(table)
table.sub <- table(comorbid, mort, useNA = "always")
per.sub <- prop.table(table.sub, 2)
# Fall back to Fisher's exact test when chisq.test() warns about scarce data
p <- tryCatch({
  chisq.test(table.sub)$p.value
}, warning = function(w) {
  fisher.test(table.sub, workspace = 10e7)$p.value
})
frame <- data.frame(No.tot = as.data.frame(table)[, "Freq"],
                    per.tot = as.data.frame(per)[, "Freq"],
                    No.1 = as.data.frame.matrix(table.sub)[, "alive"],
                    per.1 = as.data.frame.matrix(per.sub)[, "alive"],
                    No.2 = as.data.frame.matrix(table.sub)[, "dead"],
                    per.2 = as.data.frame.matrix(per.sub)[, "dead"],
                    p = p)
rownames(frame) <- paste("comorbid", levels(comorbid), sep = "_")
Brian Tompsett - 汤莱恩
Z. Zhang

1 Answer

levels() works just fine with NA values. What levels() requires, however, is a factor (or anything with a levels attribute). In your code, comorbid is a character vector:

> class(comorbid)
[1] "character"

If you coerce comorbid to a factor and change the default so that NAs are not excluded from the factor levels, you get the desired behaviour:

fcomorbid <- factor(comorbid, exclude = NULL)

levels(fcomorbid)
paste("comorbid", levels(fcomorbid), sep = "_")

> levels(fcomorbid)
[1] "cirrhosis"    "COPD"         "diabetes"     "heartfailure" "hypertension"
[6] "renalfailure" "stroke"       NA            
> paste("comorbid", levels(fcomorbid), sep = "_")
[1] "comorbid_cirrhosis"    "comorbid_COPD"         "comorbid_diabetes"    
[4] "comorbid_heartfailure" "comorbid_hypertension" "comorbid_renalfailure"
[7] "comorbid_stroke"       "comorbid_NA"
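
As a smaller, self-contained illustration of the same point, the sketch below compares the default factor() call with exclude = NULL on a toy vector. The vector x and the names f_default, f_with_na and f_added are made up for this example and are not part of the original code. addNA() is a base R helper that is not used in the answer above; it is shown only as a possible alternative.

```r
# Toy character vector with one missing value (hypothetical data)
x <- c("stroke", NA, "COPD", "stroke")

# By default, factor() excludes NA from the levels
f_default <- factor(x)
levels(f_default)
# [1] "COPD"   "stroke"

# With exclude = NULL, NA is kept as the last level
f_with_na <- factor(x, exclude = NULL)
levels(f_with_na)
# [1] "COPD"   "stroke" NA

# addNA() adds an NA level to an existing factor
f_added <- addNA(f_default)

# paste() turns the NA level into the string "NA" when building labels
row_labels <- paste("comorbid", levels(f_with_na), sep = "_")
row_labels
# [1] "comorbid_COPD"   "comorbid_stroke" "comorbid_NA"
```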

To complete your example, then:

rownames(frame) <- paste("comorbid", levels(fcomorbid), sep = "_")

and we have

> frame
                      No.tot per.tot No.1      per.1 No.2      per.2         p
comorbid_cirrhosis        69   0.069   57 0.07011070   12 0.06417112 0.3108409
comorbid_COPD            209   0.209  172 0.21156212   37 0.19786096 0.3108409
comorbid_diabetes        128   0.128  101 0.12423124   27 0.14438503 0.3108409
comorbid_heartfailure     57   0.057   45 0.05535055   12 0.06417112 0.3108409
comorbid_hypertension    334   0.334  267 0.32841328   67 0.35828877 0.3108409
comorbid_renalfailure     78   0.078   61 0.07503075   17 0.09090909 0.3108409
comorbid_stroke           75   0.075   63 0.07749077   12 0.06417112 0.3108409
comorbid_NA               50   0.050   47 0.05781058    3 0.01604278 0.3108409
Gavin Simpson
  • I wouldn't say it works "just fine" :). `levels(my_vec) <- c(NA,"a")` has weird behavior, and NA levels will be dropped by functions such as `rbind`. See this question about it: https://stackoverflow.com/questions/45216532/how-can-i-keep-na-when-i-change-levels – moodymudskipper Jul 28 '17 at 22:21
  • my conclusion so far is that NA levels should be used VERY locally when you really know what you're doing, else consider replacing them with a regular level, such as "NA" or "unknown" – moodymudskipper Jul 28 '17 at 22:24
  • @Moody_Mudskipper given that the stated purpose of `levels()` is to "provide[s] access to the levels attribute of a variable", I'd say this works *just fine*. You are talking about the replacement function variant `levels<-()`, and the behaviour there is documented in `?levels`. – Gavin Simpson Jul 28 '17 at 22:33
  • You are perfectly right Gavin that your solution works fine. But I thought it'd be worthy of attention to mention many things can go wrong when using NA levels. – moodymudskipper Jul 28 '17 at 22:35
  • @Moody_Mudskipper I wasn't concerned about how this applied to my solution; it's more the suggestion that `levels()` doesn't work with `NA`s (as per the OP's Question) or doesn't "work fine" with `NA`s. `levels()` and even `levels<-` do work fine; the issue is what happens in other functions to factors that have `NA` as a stated level. Just as you wanted to supply a related warning, my reply was merely a response that it does work fine. – Gavin Simpson Jul 28 '17 at 22:40
  • `x <- factor(c("a","b",NA),exclude=NULL);levels(x) <- c("a","b",NA)` removes `NA` from the levels, that's something I find counter-intuitive. `levels` to provide access to level attributes does work as I expect it to (fine :) ). – moodymudskipper Jul 28 '17 at 23:01
  • @Moody_Mudskipper for the love of all that is holy, what does `levels<-()` have to do with this Q&A? No one mentioned this function nor this functionality in the Q or the Answer. I understand this issue perfectly but it is at most tangential to the question posed by the OP. If you want to make some statement about potential gotchas with `NA` levels do so to the OP. – Gavin Simpson Jul 29 '17 at 16:18
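
One of the comments above suggests replacing NA with a regular level such as "unknown" rather than keeping NA itself as a level. A minimal sketch of that idea follows, assuming a character vector like comorbid. The toy vector x and the names x_filled and f are hypothetical, and the label "unknown" is just the placeholder proposed in the comment.

```r
# Toy character vector with missing values (hypothetical data)
x <- c("diabetes", NA, "COPD", "diabetes", NA)

# Recode missing values to an ordinary string before building the factor
x_filled <- ifelse(is.na(x), "unknown", x)
f <- factor(x_filled)

# "unknown" is now a regular level, so no NA level is involved
levels(f)
# [1] "COPD"     "diabetes" "unknown"

# Labels can be built from the levels exactly as in the answer
paste("comorbid", levels(f), sep = "_")
```

The trade-off is that the recoded values no longer register as missing in functions such as is.na(), so this is best done on a copy used only for tabulation.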