0

Below is simplified code with two parameters, age and gender. However, I would like to pick cases by gender only or by age only. I am wondering how to overload the function as getIDs(age) and getIDs(gender) without duplicating the same code again and again; assume there are 50 parameters. I tried getIDs(age, ""), but I do not think it is a good idea.

getIDs <- function(age, gender) {
    # https://stackoverflow.com/a/40330110/54964

    ageIDs <- c(1,2,3)
    genderIDs <- c(2,3,4) # dummy values; genderIDs should not be used if gender is ""

    intersect(ageIDs, genderIDs)
}

Main data

ID,Age,Gender
100,69,male
101,75,female
102,84,female
103,,male
104,66,female

Data 2

DF <- structure(list(ID = 100:104, Age = c(69L, 75L, 84L, NA, 66L), Gender = 
c("male", "female", "female", "male", "female")), .Names = c("ID", "Age", 
"Gender"), row.names = c(NA, -5L), class = "data.frame") 

Similarly for age: if age == "", do not include ageIDs in the intersection.

A parameter value meaning "all males" would be great, so that you do not need to write "male", "male", ... explicitly.

Algorithm based on Roman's answer

I think this strategy is very hard to maintain with 50 parameters, so a better way is still needed.

getIDs <- function(age, gender) {
# https://stackoverflow.com/a/40330110/54964
# So if you called this as getIDs(c(20, 30), "male")
# You'd get the ids of all males with age >= 20 and <= 30
# 
# NULL = ALL
# getIDs(age = c(1,2), gender = NULL)
# getIDs(age = NULL, gender = "male")
        data <- read.csv("/home/masi/data.csv",header = TRUE,sep = ",")

        if (is.null(gender)) {
                genderIDs <- data$ID
        } else {
                gender <- data$Gender == gender
                genderIDs <- data[which(gender), ]$ID
        }

        if (is.null(age)) {
                # NULL = ALL: keep every ID, including rows with a missing Age
                ageIDs <- data$ID
        } else {
                if (length(age) == 1) {
                        ages <- data$Age == age
                } else {
                        ages <- (data$Age >= age[1] & data$Age <= age[2])
                }
                ageIDs <- data[which(ages), ]$ID
        }

        intersect(ageIDs, genderIDs)
}

OS: Debian 8.5
R: 3.1.1

Léo Léopold Hertz 준영
  • "getIDs(age) and getIDs(gender)" can *maybe* be distinguished if age is a number and gender is a string. There are only so many data types in R, though, and if you have 50 of them that you want the function to match without passing the arg name, there'll be trouble. – Frank Oct 31 '16 at 19:02
  • I read you as asking that getIDs(45) and getIDs("male") should work... right? – Frank Oct 31 '16 at 19:02
  • For 50 parameters, a named vector may be a better solution. – Roman Luštrik Oct 31 '16 at 19:06
  • See edit of my question if you find any potential solutions. – Roman Luštrik Oct 31 '16 at 19:20

3 Answers

5

You can specify default values for your parameters and catch them downstream.

For example, if you make age = NULL, you can catch it using

if (is.null(age)) {
    # do something
}

The same holds true for other parameters. Another fine option is to use NA as the default, which you can catch with the is.na function.
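
As a minimal sketch of this idea applied to the question's getIDs: the version below assumes the DF data frame from the question is already in memory (instead of being read from a CSV file) and treats NULL as "do not filter on this column". The treatment of a single age as an exact match and of two ages as an inclusive range is taken from the question; everything else is an assumption for illustration.

```r
# Data frame from the question, rebuilt here so the example is self-contained
DF <- data.frame(
  ID = 100:104,
  Age = c(69L, 75L, 84L, NA, 66L),
  Gender = c("male", "female", "female", "male", "female"),
  stringsAsFactors = FALSE
)

# NULL (the default) means "no restriction on this column"
getIDs <- function(age = NULL, gender = NULL) {
  keep <- rep(TRUE, nrow(DF))
  if (!is.null(age)) {
    # one value is an exact match, two values are an inclusive range
    age <- range(age)
    keep <- keep & !is.na(DF$Age) & DF$Age >= age[1] & DF$Age <= age[2]
  }
  if (!is.null(gender)) {
    keep <- keep & DF$Gender %in% gender
  }
  DF$ID[keep]
}

getIDs(gender = "male")
# [1] 100 103
getIDs(age = c(60, 70))
# [1] 100 104
getIDs(age = c(60, 70), gender = "female")
# [1] 104
```

Each additional parameter still needs its own if block, so this pattern becomes tedious well before fifty parameters.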

edit

Following the discussion, fifty parameters is a handful to handle in any case. You have several options, depending on your needs.

If all arguments are of the same data type, you can use a named vector, e.g.

x <- c(arg1 = "1", arg2 = "this")

If you have different data types and do not want them coerced to a single type (a numeric is coerced to character if any element is character; try c(1, "2")), you can use a list.

x <- list(par1 = 1,
          par2 = "2",
          par3 = factor(3),
          par4 = TRUE)
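
To see the coercion described above, here is a short check; the variable names are only illustrative.

```r
mixed_vector <- c(1, "2")    # the number is coerced to character
class(mixed_vector)
# [1] "character"

mixed_list <- list(1, "2")   # each element keeps its own type
sapply(mixed_list, class)
# [1] "numeric"   "character"
```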

Working with lists is very natural in R; you can manipulate them using, e.g., sapply or lapply. You could find all numeric values:

> x[sapply(x, is.numeric)]
$par1
[1] 1

Or just based on name alone

> x[grepl(paste("par", 1:2, sep = "", collapse = "|"), names(x))]
$par1
[1] 1

$par2
[1] "2"
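
One way the list idea could be applied to the question is sketched below. It is not part of the original answer: getIDsByList is a hypothetical helper, DF is assumed to be the data frame from the question, and only equality (or membership) tests are supported. Each list name is a column and each value is the set of accepted values for that column.

```r
# Data frame from the question, rebuilt here so the example is self-contained
DF <- data.frame(
  ID = 100:104,
  Age = c(69L, 75L, 84L, NA, 66L),
  Gender = c("male", "female", "female", "male", "female"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: `criteria` is a named list of accepted values per column
getIDsByList <- function(criteria, data = DF) {
  stopifnot(all(names(criteria) %in% names(data)))
  # one logical vector per criterion
  hits <- Map(function(col, accepted) data[[col]] %in% accepted,
              names(criteria), criteria)
  # combine them with AND, starting from all TRUE
  keep <- Reduce(`&`, hits, rep(TRUE, nrow(data)))
  data$ID[keep]
}

getIDsByList(list(Gender = "male"))
# [1] 100 103
getIDsByList(list(Gender = "female", Age = c(66, 75)))
# [1] 101 104
```

Adding a fifty-first parameter then only means adding one more element to the list passed in; the function body stays the same.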
Roman Luštrik
  • I never use it, but `missing` also seems to sort of do it, like `f = function(a, b) if (missing(b)) a else b; f(1,2); f(1)`..? Oh never mind, reread the OP's question and am now thoroughly confused. – Frank Oct 31 '16 at 18:54
  • @Frank Can you please extend your comment into an answer so I can understand it better? I added an implementation of Roman's answer to the question body, and it has serious limitations with 50 parameters. It is very difficult to maintain. – Léo Léopold Hertz 준영 Oct 31 '16 at 19:00
  • @Roman Can you please compare your proposals to yeedle's proposal which I like a lot because of its flexibility. – Léo Léopold Hertz 준영 Oct 31 '16 at 21:33
4

Following Roman's idea, I'd maybe use a list:

library(data.table)
setDT(DF)

getIDs <- function(L) DF[L, on=names(L), ID]

Usage:

> getIDs(list(Gender = "male"))
[1] 100 103
> getIDs(list(Gender = "male", Age = NA))
[1] 103

Data

DF = structure(list(ID = 100:104, Age = c(69L, 75L, 84L, NA, 66L), 
Gender = c("male", "female", "female", "male", "female")), .Names = c("ID", 
"Age", "Gender"), row.names = c(NA, -5L), class = "data.frame")
Frank
  • @Masi I don't understand what you're asking. – Frank Oct 31 '16 at 19:46
  • @Masi Sorry, I still don't follow. – Frank Oct 31 '16 at 20:59
  • @Masi Ok. Yeah, that makes sense. The approach in this answer only works for equality testing. If you want a sequence of queries, then yeedle's is the best fit. I've edited it to demonstrate. – Frank Oct 31 '16 at 21:09
3

Using dplyr, you can write a general function to which you pass whatever condition you like as a string, and it will return the matching IDs. This scales easily to multiple parameters, as long as your condition string can be evaluated by dplyr (the outputs were generated using the data frame you provided in this question):

library(dplyr)
getIDs <- function(condition)
{
  data <- read.csv("/home/masi/data.csv", header = TRUE)
  # return the IDs visibly (assigning the result to a variable would hide the output)
  data %>% filter_(condition) %>% .$ID
}

getIDs("Gender == 'male'")
# [1] 100 103

getIDs("Age > 30")
# [1] 100 101 102 104

getIDs("Gender == 'male' & Age > 30")
# [1] 100

If you don't need to read in data within the function, the function can be written like

getIDs <- . %>% filter_(DF, .) %>% .$ID

Defining functions this way is a feature of magrittr chains.


If you want to pass a sequence of queries as arguments:

getIDs <- function(...){
    DF %>% filter_(...) %>% .$ID
} 

getIDs("Gender == 'male'", "Age > 30")
# [1] 100

If you want to get the result sorted by one of the parameters, add an arrange to the dplyr pipeline:

getIDs <- function(..., by = NULL){
    DF %>% filter_(...) %>% { if (!is.null(by))  arrange_(., by) else . } %>% .$ID
} 

getIDs("Gender == 'female'", "Age > 10", by = "Age")
# [1] 104 101 102

# descending order:
getIDs("Gender == 'female'", "Age > 10", by = "desc(Age)")
# [1] 102 101 104
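
Note that filter_() and arrange_() have been deprecated since dplyr 0.7.0. A minimal sketch of the same idea with current dplyr is shown below, assuming DF is the data frame from the question and that conditions are passed as unquoted expressions rather than strings. The name get_ids_tidy is hypothetical.

```r
library(dplyr)

# Data frame from the question, rebuilt here so the example is self-contained
DF <- data.frame(
  ID = 100:104,
  Age = c(69L, 75L, 84L, NA, 66L),
  Gender = c("male", "female", "female", "male", "female"),
  stringsAsFactors = FALSE
)

# Conditions are forwarded to filter() through `...` and combined with AND
get_ids_tidy <- function(...) {
  DF %>% filter(...) %>% pull(ID)
}

get_ids_tidy(Gender == "male", Age > 30)
# [1] 100
get_ids_tidy(Gender == "female", Age > 10)
# [1] 101 102 104
```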
josliber
yeedle
  • It uses a completely different paradigm. At the end of the day, with fifty parameters, you'll have to do a lot of manual enumeration of the conditions. I find it easier to work with explicit conditions, and a clear chain of the steps of transformations I make. Others like the conciseness of base R. – yeedle Oct 31 '16 at 19:35
  • Instead of reading in data each time the function is called, probably better to assign it to the environment. Also in that case you can write the whole function using a chain: `getIDs <- . %>% filter_(DF, .) %>% .$ID; environment(getIDs)$DF <- structure(list(ID = 100:104, Age = c(69L, 75L, 84L, NA, 66L), Gender = c("male", "female", "female", "male", "female")), .Names = c("ID", "Age", "Gender"), row.names = c(NA, -5L), class = "data.frame") ; getIDs("Gender == 'male'")` – Frank Oct 31 '16 at 19:37
  • That is correct @Frank. I was only following Masi's example in the OP. Obviously, reading it in every time is inefficient. – yeedle Oct 31 '16 at 19:43
  • **Can you return IDs in Age descending or ascending order by this method?** E.g. pseudocode `getIDs("Gender == 'male'", "Age > 30", "Age.ascending")`. – Léo Léopold Hertz 준영 Oct 31 '16 at 21:41
  • absolutely: `getIDs <- function(..., by = "Age"){ DF %>% filter_(...) %>% arrange_(by) %>% .$ID }`. I added it to my answer – yeedle Oct 31 '16 at 22:08
  • See my answer. If you omit the `by` argument, the output won't be sorted in any particular order. If you specify it, the output will be sorted in ascending order. If you want descending order, you can specify `getIDs("Gender == 'male'", by = "desc(Age)")` – yeedle Oct 31 '16 at 22:24