I want to select data descriptively by column header. Here is an example: selecting the IDs of males in CSV data. With the following data you can do data[3] == "male", but I would like to do something like data[Gender] == "male" to avoid mistakes from hard-coded column positions.

File data.csv:

ID,Age,Gender
100,69,male
101,75,female
102,84,female
103,,male
104,66,female

Code, where the last lines are pseudocode:

data = read.csv("/home/masi/data.csv",header = TRUE,sep = ",")
str(data)

# Pseudocode
#data.Gender == "male"
#data[Gender] == "male"

Eli

Now we have a logical vector marking the males, and we want to return the IDs corresponding to those males:

eliData <- data$Gender == "male"
# to return IDs corresponding to males
# Pseudocode
data$ID == eliData

The pseudocode returns FALSE for all rows.

Motivation: to build correlation matrices of characteristics for different epidemiological groups, where each data point has many characteristics of its own.

OS: Debian 8.5
R: 3.1.1

Léo Léopold Hertz 준영

1 Answer

You can use $ notation in R for this. data$Gender == "male" is what you want. To get the IDs from the rows where the gender is "male", you can do this:

males <- data$Gender == "male"
maleIDs <- data[which(males), ]$ID
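
A minimal sketch of the same idea on the sample data from the question, assuming the file content is supplied inline through the text argument of read.csv instead of being read from /home/masi/data.csv. It also suggests why data$ID == eliData in the question is FALSE everywhere: the logical vector is coerced to 1/0 and compared against the IDs, rather than being used as a row index.

```r
# Sample data from the question, read inline (assumption: stands in for data.csv)
csv_text <- "ID,Age,Gender
100,69,male
101,75,female
102,84,female
103,,male
104,66,female"
data <- read.csv(text = csv_text, header = TRUE, sep = ",")

# Logical vector: TRUE for rows where Gender is "male"
males <- data$Gender == "male"

# Use the logical vector as an index into the ID column
maleIDs <- data$ID[males]
print(maleIDs)

# Comparing IDs with the logical vector coerces TRUE/FALSE to 1/0,
# so none of the IDs (100 to 104) ever matches
print(data$ID == males)
```

Indexing with data$ID[males] is equivalent here to data[which(males), ]$ID, because Gender has no missing values in this sample.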

Here is Eli's great function for the general task:

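getIDs <- function(age, gender) {
        data <- read.csv("/home/masi/data.csv",header = TRUE,sep = ",")

        # Logical vector: rows whose Gender matches the requested gender
        gender <- data$Gender == gender
        # A single age means an exact match; two values mean an inclusive range
        if (length(age) == 1) {
                ages <- data$Age == age
        } else {
                ages <- (data$Age >= age[1] & data$Age <= age[2])
        }
        # which() drops NA comparisons (e.g. rows with a missing Age)
        genderIDs <- data[which(gender), ]$ID
        ageIDs <- data[which(ages), ]$ID
        # IDs that satisfy both conditions
        intersect(ageIDs, genderIDs)
}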
# So if you called this as getIDs(c(20, 30), "male")
# You'd get the ids of all males with age >= 20 and <= 30
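
For trying this out without the file on disk, here is a hedged variant. The name getIDsFrom is hypothetical and not part of the original answer; it takes the data frame as an argument and combines both conditions in one logical index instead of intersecting two ID vectors. The sample data from the question is read inline as an assumption.

```r
# Hypothetical variant of getIDs: the data frame is passed in as an argument
getIDsFrom <- function(data, age, gender) {
  isGender <- data$Gender == gender
  if (length(age) == 1) {
    # Single value: exact age match
    inAge <- data$Age == age
  } else {
    # Two values: inclusive age range
    inAge <- data$Age >= age[1] & data$Age <= age[2]
  }
  # which() drops rows where the combined condition is NA (e.g. missing Age)
  data$ID[which(isGender & inAge)]
}

# Sample data from the question, read inline (assumption: stands in for data.csv)
data <- read.csv(text = "ID,Age,Gender
100,69,male
101,75,female
102,84,female
103,,male
104,66,female", header = TRUE, sep = ",")

ids <- getIDsFrom(data, c(60, 70), "male")
print(ids)
```

On the sample data this selects only ID 100: ID 103 is male but has no recorded age, so it is dropped rather than matched.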
Eli Sadoff