
In the dataset below, the factor variable Region has the following levels before subsetting:

> levels(corona$Region)
  [1] " Montreal, QC"                              
  [2] "Alabama"                                    
  [3] "Alameda County, CA"                         
  [4] "Alaska"                                     
  [5] "Alberta"                                    
  [6] "American Samoa"                             
  [7] "Anhui" ...

including US states as well as counties, cities, etc.

I want to keep only the US states, and I am running this code:

require(RCurl)
require(foreign)
require(tidyverse) 

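# stringsAsFactors = TRUE makes Region a factor on R >= 4.0.0 as well,
# where read.csv() no longer converts strings to factors by default
corona <- read.csv("https://coviddata.github.io/covid-api/v1/regions/cases.csv",
                   sep = ",", header = TRUE, stringsAsFactors = TRUE)

# keep only the rows for US states (state.name ships with base R)
cor <- corona[corona$Country == "United States" & corona$Region %in% state.name, ]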

This works, in a way, but it somehow keeps the original levels of Region:

> levels(cor$Region)
  [1] " Montreal, QC"                              
  [2] "Alabama"                                    
  [3] "Alameda County, CA"                         
  [4] "Alaska"                                     
  [5] "Alberta"                                    
  [6] "American Samoa"                             
  [7] "Anhui"                                      
  [8] "Arizona"                                    
  [9] "Arkansas"                                   
 [10] "Aruba"   ...

as though the subsetting never happened. How can I keep only the levels that remain after subsetting (the states)?

Antoni Parellada

1 Answer


You can try droplevels(), which removes unused factor levels:

cor <- droplevels(cor)

Here is an example using the iris dataset:

ir <- subset(iris, Species != "setosa")

> str(ir)
'data.frame':   100 obs. of  5 variables:
 $ Sepal.Length: num  7 6.4 6.9 5.5 6.5 5.7 6.3 4.9 6.6 5.2 ...
 $ Sepal.Width : num  3.2 3.2 3.1 2.3 2.8 2.8 3.3 2.4 2.9 2.7 ...
 $ Petal.Length: num  4.7 4.5 4.9 4 4.6 4.5 4.7 3.3 4.6 3.9 ...
 $ Petal.Width : num  1.4 1.5 1.5 1.3 1.5 1.3 1.6 1 1.3 1.4 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 2 2 2 2 2 2 2 2 2 2 ...

Although we removed all rows of one Species level, the factor still shows 3 levels. But if you then run:

ir <- droplevels(ir)

> str(ir)
'data.frame':   100 obs. of  5 variables:
 $ Sepal.Length: num  7 6.4 6.9 5.5 6.5 5.7 6.3 4.9 6.6 5.2 ...
 $ Sepal.Width : num  3.2 3.2 3.1 2.3 2.8 2.8 3.3 2.4 2.9 2.7 ...
 $ Petal.Length: num  4.7 4.5 4.9 4 4.6 4.5 4.7 3.3 4.6 3.9 ...
 $ Petal.Width : num  1.4 1.5 1.5 1.3 1.5 1.3 1.6 1 1.3 1.4 ...
 $ Species     : Factor w/ 2 levels "versicolor","virginica": 1 1 1 1 1 1 1 1 1 1 ...

You can see that Species now has 2 factor levels instead of 3.

Does this answer your question?
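If only one column needs cleaning, droplevels() can also be applied to that single factor instead of the whole data frame. Below is a minimal sketch of that idea. The data frame toy and its values are made up for illustration and only mimic the shape of the corona data (a Country column and a factor Region column); they are not taken from the real file.

```r
# Toy stand-in for the corona data; rows and values are invented for illustration.
toy <- data.frame(
  Country = c("United States", "United States", "Canada", "China"),
  Region  = factor(c("Alabama", "Alaska", "Alberta", "Anhui"))
)

# Row subsetting keeps every original level of the factor.
states <- toy[toy$Country == "United States" & toy$Region %in% state.name, ]
nlevels(states$Region)   # still 4, although only 2 regions remain

# Option 1: drop the unused levels of this one column only.
states$Region <- droplevels(states$Region)

# Option 2 (same effect for a single factor): rebuild it with factor().
# states$Region <- factor(states$Region)

levels(states$Region)    # "Alabama" "Alaska"
```

Calling droplevels() on the whole data frame, as above, does the same thing for every factor column at once, which is usually what you want after subsetting.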

dc37