I have a data frame in R:

data.frame(age = 18,19,29,
     rate = 1.2,4.5,6.8
     sex = "male","female","male")

I would like to get the rate associated with the values age = 18 and sex = "male". Is there a way to index with those values, and to do the same for any pair of age and sex values?

I can do this in dplyr using the filter and select commands, but this is too slow for what I'm trying to do.

camille
yungFanta
  • What have you tried already? I'm not sure why it would be too slow – camille Jan 17 '20 at 16:16
  • @camille filter then select – yungFanta Jan 17 '20 at 18:12
  • [See here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on making a reproducible example that includes the code you're trying to debug, not just a description of it – camille Jan 17 '20 at 18:16

3 Answers

Assuming that df is your data frame:

df[(df$age == 18 & df$sex == 'male'),]
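The line above returns the whole matching row. If only the rate is needed, the same logical index can be applied to the rate column. Here is a minimal sketch, assuming the three rows of values from the question; `get_rate` is a hypothetical helper name, not a base R function:

```r
# Example data using the values from the question
df <- data.frame(age = c(18, 19, 29),
                 rate = c(1.2, 4.5, 6.8),
                 sex = c("male", "female", "male"),
                 stringsAsFactors = FALSE)

# get_rate() is a hypothetical helper: it returns the rate(s) for one
# age/sex pair, or a zero-length vector when no row matches
get_rate <- function(data, age, sex) {
  data$rate[data$age == age & data$sex == sex]
}

rate_18_male <- get_rate(df, 18, "male")
rate_19_female <- get_rate(df, 19, "female")
```

Indexing the column directly avoids building an intermediate data frame, which may help if the lookup runs many times.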
LeroyFromBerlin

Alternatively, you can use subset.

Assuming your dataframe is called df:

df1 <- subset(df, age == 18 & sex == 'male')

And then

View(df1)
cmirian

Your example data.frame doesn't run as written, so here's a working one ;) First you can subset the data, then calculate how many rows that subset has relative to the full set.

df <- data.frame(age = c(18, 19, 29),
                 rate = c(1.2, 4.5, 6.8),
                 sex = c("male", "female", "male"),
                 stringsAsFactors = FALSE)
# rows matching the requested age and sex
df_sub <- subset(df, age == 18 & sex %in% "male")
# share of all rows that match
df_rate <- nrow(df_sub) / nrow(df)

Though if you say filter and select are too slow, you might want to convert your data.frame into a data.table, which is normally faster than a data.frame.

library(data.table)    
dt <- as.data.table(df)
nrow(dt[age==18 & sex %in% "male"])/nrow(dt)

# or more data.table-like:

dt[age==18 & sex %in% "male", .N] / dt[,.N]
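If the goal is the value in the rate column for a given pair, rather than a row count, one option is a keyed data.table lookup. This is only a sketch, assuming the data.table package is installed and reusing the example values above; whether it is fast enough depends on your data:

```r
library(data.table)

# Same example values as above, built directly as a data.table
dt <- data.table(age = c(18, 19, 29),
                 rate = c(1.2, 4.5, 6.8),
                 sex = c("male", "female", "male"))

# Sort by (age, sex) and mark those columns as the key,
# so lookups on them can use binary search
setkey(dt, age, sex)

# Rate for a single (age, sex) pair
rate_18_male <- dt[.(18, "male"), rate]

# Rates for several pairs at once: (18, "male") and (29, "male")
rates <- dt[.(c(18, 29), c("male", "male")), rate]
```

Setting the key once and then looking up many pairs in a single call is usually the point of this approach. With the default settings, a pair that is not in the table comes back as NA.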
NicolasH2