I'm working with panel data in R and am trying to write a function that returns every user ID where PCA == 1. It mostly works, with one problem: it outputs all the matching values when the loop body calls print(), but not when it calls return() instead. I want the IDs in a vector so I can later subset the data to only those IDs, so that's a problem. The code is below. Can anyone advise on what I'm doing wrong?

The version that works (but doesn't do what I want):

retrievePCA <- function(data) {
  for (i in 1:dim(data)[1]) {
    if (data$PCA[i] == 1) {
      id <- data$CPSIDP[i]
      print(id)
    }
  }
}

retrievePCA(data)

The version that doesn't:

retrievePCA <- function(data) {
  for (i in 1:dim(data)[1]) {
    if (data$PCA[i] == 1) {
      id <- data$CPSIDP[i]
      return(id)
    }
  }
}


vector<-retrievePCA(data)
vector
    It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. `return` is only ever called once in a function: as soon as a `return` is hit, the function exits. Maybe you just want `retrievePCA <- function(data) data$CPSIDP[data$PCA==1]`; no loop necessary. – MrFlick May 25 '21 at 19:27

1 Answer

The problem is a simple misunderstanding of what returning from a function does.

Take the small example below:

f <- function(x) {
  x <- x * x
  return(x)   # the function exits here
  x <- x * x  # never reached
  return(x)   # never reached
}
f(2)
[1] 4

4 is returned, 16 is not. That is because return() exits the function immediately, handing back the given value. So your function hits the first row where PCA[i] == 1 and exits right there. Instead, you should collect the matches in a vector, list or similar and return that once the loop has finished.

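retrievePCA <- function(data) {
  # one (initially NULL) slot per row
  ids <- vector('list', nrow(data))
  for (i in seq_len(nrow(data))) {
    if (data$PCA[i] == 1) {
      ids[[i]] <- data$CPSIDP[i]
    }
  }
  # unlist() drops the NULL slots, leaving only the matching ids
  return(unlist(ids))
}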

However, you could just do this in one line:

data$CPSIDP[data$PCA == 1]
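
Since the goal is to subset the panel afterwards, here is a minimal sketch of how the one-liner above could feed into that step. The toy data frame and the names pca_ids and pca_rows are made up for illustration, and wrapping the condition in which() and the result in unique() are optional additions that assume PCA may contain NA and that an ID can appear on several rows.

```r
# Toy panel data; the values are invented for illustration only.
data <- data.frame(
  CPSIDP = c(101, 101, 102, 103, 103, 104),
  PCA    = c(1, 0, 0, 1, 1, NA)
)

# which() keeps only the positions where the comparison is TRUE, so a
# missing PCA value does not turn into an NA id.
# unique() keeps each id once even if several of its rows have PCA == 1.
pca_ids <- unique(data$CPSIDP[which(data$PCA == 1)])
pca_ids

# Keep every row that belongs to one of those ids,
# not only the rows where PCA == 1.
pca_rows <- data[data$CPSIDP %in% pca_ids, ]
pca_rows
```

If only the PCA == 1 rows themselves are wanted, data[which(data$PCA == 1), ] would probably be enough without collecting the IDs first.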
Oliver