I have a data frame in R, in which one of the columns contains state abbreviations like 'AL','MD' etc.

Say I wanted to extract the data for state = 'AL'. The expression dataframe['AL',] only seems to return one row, whereas there are multiple rows for this state.

Can someone help me understand the error in this approach?

Sotos
Aman
  • Could you please provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? – Eric May 22 '20 at 11:28

2 Answers

This should work:

mydataframe[mydataframe$state == "AL",]

Or, if you want more than one state:

mydataframe[mydataframe$state %in% c("AL","MD"),]
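
As for why the original attempt returned only one row: when a character value is given in the row position, `[` matches it against the data frame's row names, not against any column. A minimal sketch, assuming a small made-up data frame `toy` with a `state` column (the names and values are hypothetical, not taken from the question):

```r
# Hypothetical toy data; default row names are "1", "2", "3", "4"
toy <- data.frame(state = c("AL", "MD", "AL", "TX"),
                  value = c(10, 20, 30, 40),
                  stringsAsFactors = FALSE)

# A character row index is looked up in rownames(toy), not in toy$state.
# No row is named "AL", so the result is a single row of NAs.
byRowName <- toy["AL", ]
nrow(byRowName)

# A logical vector built from the column selects every matching row.
byColumn <- toy[toy$state == "AL", ]
nrow(byColumn)
```

In this sketch the row-name lookup gives one all-NA row, while the logical condition on the column gives both "AL" rows.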
Daniel O
  • `mydataframe` is a dataframe. You need to filter on the variable, something like `df[df$state == 'AL',]`. Also, when you do it for more than one state, you need to use `%in%`, not `==`. – Sotos May 22 '20 at 11:41
  • Thanks @Sotos, I'm not sure how I missed the variable call; I must have answered too hastily. The `%in%` was just a mistake on my part. – Daniel O May 22 '20 at 11:49
In R, there are usually multiple ways to do something. We'll illustrate three techniques for subsetting a data frame based on a logical condition.

We'll use data from the 2012 U.S. Hospital Compare Database. We'll check to see whether the data has already been downloaded to disk, and if not, download and unzip it.

if(!file.exists("outcome-of-care-measures.zip")){
     dlMethod <- "curl"
     if(substr(Sys.getenv("OS"),1,7) == "Windows") dlMethod <- "wininet"
     url <- "https://d396qusza40orc.cloudfront.net/rprog%2Fdata%2FProgAssignment3-data.zip"
     download.file(url,destfile='outcome-of-care-measures.zip',method=dlMethod,mode="wb")
     unzip(zipfile = "outcome-of-care-measures.zip")    
}

## read outcome data & keep hospital name, state, and some
## mortality rates. Notice that here we use the extract operator
## to subset columns instead of rows 
theData <- read.csv("outcome-of-care-measures.csv",
                    colClasses = "character")[,c(2,7,11,17,23)]

This first technique matches the one from the other answer, but we illustrate it with both $ and [[ forms of the extract operator during the subset operation.

# technique 1: extract operator
aSubset <- theData[theData$State == "AL",]
table(aSubset$State)

AL 
98 

aSubset <- theData[theData[["State"]] == "AL",]
table(aSubset$State)

AL 
98 
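
One practical difference between the two forms: `[[` accepts a column name stored in a variable, while `$` takes the name literally. A small sketch, assuming a made-up data frame `toy` and a hypothetical variable `colName` (neither comes from the hospital data):

```r
# Hypothetical toy data standing in for theData
toy <- data.frame(State = c("AL", "MD", "AL"),
                  Rate  = c("14.3", "18.5", "15.2"),
                  stringsAsFactors = FALSE)

colName <- "State"   # column chosen at run time

# [[ evaluates colName and extracts the "State" column
viaBrackets <- toy[toy[[colName]] == "AL", ]
nrow(viaBrackets)

# $ looks for a column literally named "colName", which does not exist
is.null(toy$colName)
```

This makes `[[` the more convenient form inside functions or loops where the column to filter on is passed in as a string.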

Next, we can subset by using a Base R function, such as subset().

# technique 2: subset() function
aSubset <- subset(theData,State == "AL")
table(aSubset$State)

AL 
98 
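
The extract operator and subset() can give different results when the column contains missing values: an NA in the logical index produces an all-NA row, whereas subset() treats NA as FALSE and drops the row. A sketch with made-up data (the `toy` data frame is hypothetical and deliberately includes an NA, unlike the State column used above):

```r
# Hypothetical toy data with one missing state
toy <- data.frame(State = c("AL", NA, "MD", "AL"),
                  Rate  = c(1, 2, 3, 4),
                  stringsAsFactors = FALSE)

viaExtract <- toy[toy$State == "AL", ]    # NA condition yields a row of NAs
viaSubset  <- subset(toy, State == "AL")  # NA condition is treated as FALSE

nrow(viaExtract)
nrow(viaSubset)

# Wrapping the condition in which() makes the extract operator drop NAs too
viaWhich <- toy[which(toy$State == "AL"), ]
nrow(viaWhich)
```

When the filter column may contain NA, wrapping the condition in which() or using subset() avoids the extra all-NA rows.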

Finally, for the tidyverse fans, we'll use dplyr::filter().

# technique 3: dplyr::filter()
aSubset <- dplyr::filter(theData,State == "AL")
table(aSubset$State)

AL 
98 

Len Greski