
I am analyzing two factor variables that have some missing values. How can I omit the missing values in the table command?

> table(code3,code4)
       code4
code3     HIGH LOW
        134    9   1
   HIGH  22    7   0
   LOW   19    0   8
> 
>
> round(prop.table(table(code3,code4),2),2)
       code4
code3      HIGH  LOW
        0.77 0.56 0.11
   HIGH 0.13 0.44 0.00
   LOW  0.11 0.00 0.89
> 

I want table to show only "HIGH" and "LOW" value columns and rows, i.e. omit all missing values.

Also please tell me if these missing values will make any difference to chisq.test:

> 
> chisq.test(code3,code4)

        Pearson's Chi-squared test

data:  code3 and code4 
X-squared = 57.8434, df = 4, p-value = 8.231e-12

Warning message:
In chisq.test(code3, code4) :
  Chi-squared approximation may be incorrect
> 
> 

I suspect it is a simple issue but I could not find any easy answer on the internet.

The "help(table)" page in R gives the following example:

## NA counting:
     is.na(d) <- 3:4
     d. <- addNA(d)
     d.[1:7]
     table(d.) # ", exclude = NULL" is not needed
     ## i.e., if you want to count the NA's of 'd', use
     table(d, useNA="ifany")

How can I adapt it to my requirement? Thanks for your help.

rnso
  • Welcome to SO. Please provide a working example: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – sgibb Apr 12 '14 at 14:39

1 Answer


I suspect that your 'missing values' are blanks (""). If you code them as NA instead, you make life easier.

A small example (of what I guess is going on):

# sample data with some 'missing values'
x <- c("high", "", "low", "", "high", "")
x
table(x)
#   high  low 
# 3    2    1     

# replace "" with R's 'official' missing value, NA
x[x == ""] <- NA

table(x)
# x
# high  low 
#    2    1 
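
The question involves factors rather than character vectors. With a factor, assigning NA leaves the empty "" level attached, so table() still lists it with a count of zero. Below is a minimal sketch of one way to handle that, using droplevels(); the vectors code3 and code4 are made-up stand-ins for the asker's data, not the real values.

```r
# Hypothetical stand-ins for the asker's factors; "" marks a missing value
code3 <- factor(c("HIGH", "", "LOW", "HIGH", "", "LOW", "HIGH", "LOW"))
code4 <- factor(c("HIGH", "LOW", "LOW", "", "", "LOW", "HIGH", "HIGH"))

# Recode blanks as NA; the "" level itself is still attached to the factor
code3[code3 == ""] <- NA
code4[code4 == ""] <- NA
levels(code3)   # still includes ""

# Drop the now-unused "" level so it disappears from the table
code3 <- droplevels(code3)
code4 <- droplevels(code4)

# Pairs containing NA are excluded by table() by default
tab <- table(code3, code4)
tab
```

Regarding chisq.test: with the blanks left in as a level, the test is run on a 3 x 3 table (hence df = 4 in the question's output). Once the blanks are NA, the test should use only the HIGH/LOW cells, which gives a 2 x 2 table.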

Perhaps relevant here as well is the na.strings argument in read.table.
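
As a rough illustration of that argument, the sketch below reads a small made-up CSV (the column names and values are assumptions, supplied inline through the text argument) and treats blank fields as missing. Note that the default is na.strings = "NA", so blanks in character columns are only converted if "" is listed explicitly.

```r
# Hypothetical CSV content; the blank fields stand for missing values
csv_text <- "id,code3,code4
1,HIGH,HIGH
2,,LOW
3,LOW,
4,HIGH,LOW"

# na.strings = "" converts blank fields to NA on import
dat <- read.csv(text = csv_text, na.strings = "", stringsAsFactors = TRUE)

# The blanks are now NA and "" is not a factor level
is.na(dat$code3)
levels(dat$code4)

# table() skips NA by default, so only HIGH and LOW appear
table(dat$code3, dat$code4)
```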

Next time, please provide a minimal, self-contained example. Check these links for general ideas, and how to do it in R: here, here, and here.

Henrik
  • Thanks for your response. I tried 'x[x == ""] <- NA' and it works, although the table still shows '0' values. Can I add na.strings="NA" to read.csv(filename)? I tried, but it does not seem to work. – rnso Apr 12 '14 at 16:18
  • In `na.strings` you specify the strings used to code missing values in your text file; these are then converted to `NA` in the data frame. As the help text `?read.table` says: "Blank fields are also considered to be missing values in logical, integer, numeric and complex fields." If your variable is read as character by `read.table`, a blank is not considered missing and appears as `""` in the data frame. Again, you need to provide a **minimal reproducible example** to get more specific help. – Henrik Apr 12 '14 at 21:03