I am trying to subset a data set.
Here is a link to sample data to play around with: https://drive.google.com/file/d/0BwIbultIWxeVOFdRaE81Nm9qc2s/view?usp=sharing
In this data set, the last column is named "Type" and takes two values: "normal." and "back.". Say I subset on the "Type" column:
test.data = read.csv(file = paste0(dd, '/data_example.csv'))
test.subdata1 = subset(test.data, test.data$Type == 'normal.')
test.subdata2 = test.data[test.data$Type == 'normal.',]
Here I'm subsetting with the two most common methods: using subset(), and filtering directly inside [].
Supposedly, the new subset should contain only rows whose Type is "normal." (there's a period after the word), and indeed, when I view the subset data table, only "normal." rows are present.
However, the "back." level is still retained in my subset data, as shown in the following output:
str(test.subdata1$Type)
# Factor w/ 2 levels "back.","normal.": 2 2 2 2 2 2 2 2 2 2 ...
str(test.subdata2$Type)
# Factor w/ 2 levels "back.","normal.": 2 2 2 2 2 2 2 2 2 2 ...
So whichever subsetting method I use, the full set of factor levels from the original data set is retained in the subset.
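For anyone who can't download the file, here is a minimal sketch that should reproduce the same behavior. The toy.data frame and its values are made up for illustration (they are not taken from data_example.csv), and Type is built explicitly with factor(), since R 4.0 and later no longer convert strings to factors by default:

```r
# Hypothetical stand-in for data_example.csv: a small data frame
# whose last column "Type" is a factor with two levels.
toy.data = data.frame(
  value = c(1.5, 2.0, 3.2, 4.1),
  Type  = factor(c('normal.', 'back.', 'normal.', 'normal.'))
)

# The same two subsetting methods as above
toy.sub1 = subset(toy.data, Type == 'normal.')
toy.sub2 = toy.data[toy.data$Type == 'normal.', ]

# Only "normal." values are actually present in the rows ...
unique(as.character(toy.sub1$Type))
# [1] "normal."

# ... but the factor still carries both levels from the original data
levels(toy.sub1$Type)
# [1] "back."   "normal."
levels(toy.sub2$Type)
# [1] "back."   "normal."
```

The rows are filtered correctly in both cases; it is only the levels attribute of the factor that still lists "back.".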
My question is: how do I get rid of the extra information from the original data set that I do not want to retain in my subset?
That is, how can I make the subset's Type column show only 1 factor level instead of 2?