I am trying to subset data.

Here's a link to sample data to play around with: https://drive.google.com/file/d/0BwIbultIWxeVOFdRaE81Nm9qc2s/view?usp=sharing

In this data set, the last column is named "Type" and has 2 values: "normal." and "back.". Let's say I am subsetting based on the "Type" column:

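# dd is the path to the directory containing data_example.csv
test.data = read.csv(file = paste0(dd, '/data_example.csv'))
# Method 1: subset() (columns can be referenced by name inside the condition)
test.subdata1 = subset(test.data, Type == 'normal.')
# Method 2: logical indexing with [
test.subdata2 = test.data[test.data$Type == 'normal.',]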

Here, I'm subsetting using the two most common methods:

  1. by using subset()

  2. by directly filtering in the []

Supposedly, the new subsetted data should only contain rows that have Type "normal." (there's a period after the word), and indeed, when I view the subset data table, only "normal." rows are present.

HOWEVER, the "back." level is retained in my subsetted data, as shown in the following output:

str(test.subdata1$Type)
# Factor w/ 2 levels "back.","normal.": 2 2 2 2 2 2 2 2 2 2 ...
str(test.subdata2$Type)
# Factor w/ 2 levels "back.","normal.": 2 2 2 2 2 2 2 2 2 2 ...

So it does not matter which subsetting method I use: the full set of factor levels from the original data set is retained in my subset data set, even for levels that no longer have any rows.
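
For anyone without access to the CSV, the same behaviour can be reproduced with a small made-up data frame. This is only a sketch: `toy`, `toy.sub` and the column `x` are hypothetical stand-ins, not part of the linked data.

```r
# Hypothetical stand-in for the CSV; only the factor column Type matters here.
toy <- data.frame(
  x = 1:4,
  Type = factor(c("normal.", "back.", "normal.", "normal."))
)

# Same logical-indexing subset as above
toy.sub <- toy[toy$Type == "normal.", ]

levels(toy.sub$Type)  # both levels are still listed
table(toy.sub$Type)   # "back." is still tabulated, with a count of 0
```

The rows are filtered as expected; only the factor's level set is carried over unchanged.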

My question is: how do I get rid of the extra info from the original data set that I do not want to retain in my subset data set?

Meaning, how can I see only 1 factor level in my subset data and not 2 factor levels?

MichaelChirico
alwaysaskingquestions

1 Answer

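# Is this what you need? Note that this leaves a single level, but it is
# labelled "2" (the old integer code), so the "normal." label is lost.
test.subdata1$Type = as.factor(as.integer(test.subdata1$Type))

# Or maybe this: re-creating the factor drops the unused "back." level and keeps the labels.
test.subdata1$Type = factor(test.subdata1$Type)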
Rick
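
A related option, not mentioned in the answer above, is base R's `droplevels()`, which removes unused levels from every factor column of a data frame in one call. The sketch below uses a made-up data frame (`toy`, `toy.sub` and `x` are hypothetical names) rather than the linked CSV.

```r
# Hypothetical stand-in for the CSV; only the factor column Type matters here.
toy <- data.frame(
  x = 1:4,
  Type = factor(c("normal.", "back.", "normal.", "normal."))
)

# Subset first, then drop any factor levels that no longer occur
toy.sub <- droplevels(toy[toy$Type == "normal.", ])

levels(toy.sub$Type)   # only the level that actually occurs remains
nlevels(toy.sub$Type)  # number of remaining levels
```

Applied to the question's data, this would presumably be `test.subdata1 = droplevels(test.subdata1)`. If the column does not need to be a factor at all, reading the file with `read.csv(..., stringsAsFactors = FALSE)` keeps it as plain character data, which has no levels to carry over.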