
I have a dataset in which one of the variables (columns), 'job', takes 3 possible values: 'home', 'office', 'other'. For my analyses, I want to ignore 'other'. How can I accomplish this? I found this piece of code, but I am having a hard time understanding what the 'drop' argument means. I would welcome any explanation.

data1 <- data[data$job != "other", , drop = FALSE]
data2 <- data[data$job != "other", , drop = TRUE]

After trying both, I ran unique(data1$job) and unique(data2$job).

In both cases I get:

[1] home office
Levels: home office other

So it's not clear to me what I have done to the data since the 'other' level is still there.
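
To make the behaviour concrete, here is a minimal sketch on a small made-up data frame (the `id` column, the toy values, and the names `one_col`, `v`, and `d` are assumptions for illustration, not the asker's data). It suggests that `drop` controls whether a one-column result is simplified to a vector, and that it does not touch factor levels; removing the leftover 'other' level is a separate step with `droplevels`.

```r
# Toy data; the values are illustrative only.
data <- data.frame(
  id  = 1:5,
  job = factor(c("home", "office", "other", "home", "other"))
)

# Filtering rows keeps the full set of factor levels, regardless of `drop`.
data1 <- data[data$job != "other", , drop = FALSE]
levels(data1$job)   # "home" "office" "other"

# `drop` matters for the shape of the result, e.g. with a single column:
one_col <- data["job"]
v <- one_col[one_col$job != "other", , drop = TRUE]   # simplified to a factor vector
d <- one_col[one_col$job != "other", , drop = FALSE]  # stays a data frame

# Removing the level that no longer occurs is done with droplevels().
data1 <- droplevels(data1)
levels(data1$job)   # "home" "office"
```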

  • You don't need the `drop` argument if you have more than one column. You can use `droplevels` to drop unused factor levels: `df <- droplevels(df)` – Ronak Shah May 30 '21 at 08:34
  • Thanks, I will be deleting this question as it has been previously asked and answered. Apologies to the community. – Elina Bochkova May 30 '21 at 09:24
  • However, what is an "unused" factor level? Can I decide which factors to drop? – Elina Bochkova May 30 '21 at 09:26
  • When you do `data1 <- data[data$job != "other", ]`, the 'other' value is no longer in `data1`, but it is still present in the factor levels. Such levels are called unused factor levels. You can remove more than one level by doing `data1 <- data[!data$job %in% c('factor1', 'factor2'), ]`. Replace `factor1`, `factor2` with the actual values in your data. Also, there is no need to apologise for a duplicate. I would suggest not deleting the question: if somebody in the future searches with a different keyword and lands on this post, it will help them reach the correct post. – Ronak Shah May 30 '21 at 09:29
  • Thanks so much. Now it's clear :) Is there a way in R to drop a factor level without actually removing the rows corresponding to that level? I'll keep the question. – Elina Bochkova May 30 '21 at 09:32
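
The suggestions in the comments can be sketched as follows, again on made-up data (the `id` column and the names `kept` and `relabelled` are hypothetical). The first part removes rows for several levels with `%in%` and then drops the unused levels. The second part is one possible way to address the last comment: assigning `NA` to a level removes that level while keeping every row, at the cost of turning the affected entries into `NA`.

```r
# Toy data; the values are illustrative only.
data <- data.frame(
  id  = 1:6,
  job = factor(c("home", "office", "other", "home", "other", "office"))
)

# Remove rows for several levels at once, then drop the now-unused levels.
kept <- droplevels(data[!data$job %in% c("other", "office"), ])
levels(kept$job)            # "home"

# Keep all rows but remove the "other" level: those entries become NA.
relabelled <- data
levels(relabelled$job)[levels(relabelled$job) == "other"] <- NA
nrow(relabelled)            # 6
levels(relabelled$job)      # "home" "office"
sum(is.na(relabelled$job))  # 2
```

Whether `NA` is acceptable depends on the analysis: many modelling functions silently omit rows with missing values, so this is close to ignoring 'other' without physically deleting the rows.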
