I have a dataset with a column that records the gender of a person. Unfortunately, it contains a few misplaced or erroneous values.

summary(data$gender)

gives something like

boy : 19232
girl : 14565
Maths : 3
Science : 4
... some 20 garbage values : 1

I wrote code to replace every value other than boy and girl with error. Now summary(data$gender) gives something like

boy : 19232
error : 156
girl : 14565
Maths : 0
Science : 0
... other garbage values : 0

Is there any way to avoid printing the values whose count is 0?

Why I need this: there are more than 100 columns. I am using a new flag column which is set to 1 when any cell in the record contains "error", and at the end I delete the records with flag=1. I need to view a short summary of the entire data, something like

boy : 19232
error : 156
girl : 14565

Thanks for any help in advance!

rbtj
  • Likely you have a factor there and the levels are still present. Remove the unused levels and try again. Sharing some data would have been nice, see http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610 – mts Aug 12 '15 at 18:08
  • No. I have converted them to "error". – rbtj Aug 12 '15 at 18:11

2 Answers

The function you're looking for is probably droplevels, which drops unused factor levels.

In your case

summary(droplevels(data)$gender)
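
To see why the zero counts appear and how droplevels removes them, here is a minimal sketch. The data frame below is a made-up stand-in for the asker's data, and the replacement step is only one plausible way the "error" values could have been set.

```r
# Hypothetical stand-in for the asker's data; the values are invented.
data <- data.frame(
  gender = factor(c("boy", "girl", "Maths", "girl", "Science", "boy"))
)

# Replace everything other than boy/girl with "error".
# The new level has to exist before it can be assigned to a factor.
levels(data$gender) <- c(levels(data$gender), "error")
data$gender[!data$gender %in% c("boy", "girl")] <- "error"

summary(data$gender)              # Maths and Science are still listed, with count 0
summary(droplevels(data)$gender)  # only boy, girl and error remain
```

Calling droplevels on the whole data frame cleans every factor column at once, which may be convenient with 100+ columns.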
user295691

Here is an example:

> data = factor(c("girl","boy","girl","boy","math","girl","girl"), levels = c("girl", "boy", "math"))
> summary(data)
girl  boy math 
   4    2    1 
> data2 = factor(c("girl","boy","girl","boy","math","girl","girl"), levels = c("girl", "boy", "math", "garbage"))
> summary(data2)
   girl     boy    math garbage 
      4       2       1       0 
> summary(droplevels(data2))
girl  boy math 
   4    2    1 

In data everything is fine. But data2 has an extra level that is never used, and that unused level forces the 0 to be shown.

As @user295691 pointed out first, droplevels (see ?droplevels for a quick reference) will help you get rid of these leftover levels.

I repeat, you have a factor here, and even if you set the values to something else, the levels remain. The first decent explanation I could google might be this link here.
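
A small sketch of that point, using an invented vector x: overwriting a value leaves its level in place, and the level only disappears once the factor is rebuilt or passed through droplevels.

```r
x <- factor(c("girl", "boy", "math", "girl"))

x[x == "math"] <- "girl"  # overwrite the stray value with an existing level
levels(x)                 # "math" is still a level
table(x)                  # so it is still counted, with 0

x <- factor(x)            # rebuilding the factor drops unused levels, like droplevels(x)
levels(x)                 # "math" is gone
```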

mts