
Say I have two data frames, each with two columns, 'pic_type' and 'roi' (in reality I have many more data frames, but two will do for this example):

a <- setNames(data.frame(matrix(ncol = 2,nrow =6)), c("pic_type","roi"))
b <- setNames(data.frame(matrix(ncol = 2,nrow =6)), c("pic_type","roi"))

In each data frame, 'pic_type' can take one of two string values ('item', 'relation') and 'roi' one of three ('object', 'relation', 'pic'). For example (excuse my poor coding):

a$pic_type <- c("item", "item", "item","relation","relation","relation")
a$roi <- c("object", "object", "pic", "object", "relation","relation")
b$pic_type <- c("item", "item", "item","relation","relation","relation")
b$roi <- c("relation", "relation", "object", "pic", "pic","object")

Which gives:

'a'
 pic_type      roi
 item          object
 item          object
 item          pic
 relation      object
 relation      relation
 relation      relation

'b'
 pic_type      roi
 item          relation
 item          relation
 item          object
 relation      pic
 relation      pic
 relation      object

And put them in a list:

myList <- list(a,b)
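Since the real list contains many data frames, one option (an addition here, not part of the original question) is base R's mget(), which looks objects up by name and returns a named list, so the names carry through lapply(). The "^df_" naming pattern in the comment below is hypothetical.

```r
# Two small example data frames
a <- data.frame(pic_type = c("item", "relation"), roi = c("object", "pic"))
b <- data.frame(pic_type = c("item", "relation"), roi = c("relation", "object"))

# mget() looks up each name and returns a named list of the objects
myList <- mget(c("a", "b"))

# With many data frames sharing a (hypothetical) "df_" prefix, the names
# could be collected with ls() instead of being typed by hand:
# myList <- mget(ls(pattern = "^df_"))
```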

Now I want to use lapply to go through each df in the list and create a new column called 'type' that contains one of three values per row ('occupied', 'empty' or 'nil'). These values are based on the following:

If pic_type = "item" & roi = "object", then type = "occupied"
If pic_type = "relation" & roi = "relation", then type = "occupied"
If pic_type = "item" & roi = "relation", then type = "empty"
If pic_type = "relation" & roi = "object", then type = "empty"
Otherwise type = "nil"

For example:

 'a'
 pic_type      roi        type
 item          object     occupied
 item          object     occupied
 item          pic        nil
 relation      object     empty
 relation      relation   occupied
 relation      relation   occupied

I have tried the following:

myList <- lapply(myList, function(x) for(row in 1:dim(x)[1]) { 
   if(as.data.frame(x)[row,1] == "item" && as.data.frame(x)[row,2]=="object") {as.data.frame(x)[row,3] == "occupied"}  
   else if(as.data.frame(x)[row,1] == "relation" && as.data.frame(x)[row,2]=="relation") {as.data.frame(x)[row,3] == "occupied"} 
   else if(as.data.frame(x)[row,1] == "item" && as.data.frame(x)[row,2]=="relation") {as.data.frame(x)[row,3] == "empty"} 
   else if(as.data.frame(x)[row,1] == "relation" && as.data.frame(x)[row,2]=="object") {as.data.frame(x)[row,3] == "empty"}
   else {as.data.frame(x)[row,3] == "null"}})

However this throws up the error:

Error in if (as.data.frame(x)[row, 1] == "item" && as.data.frame(x)[row,  : 
  missing value where TRUE/FALSE needed
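For reference, a minimal reproduction of this message: if () needs a single TRUE or FALSE, and any comparison involving NA returns NA. So the error usually means one of the compared cells was NA (presumably in the real data, since the six-row examples above contain no NA).

```r
# A comparison with NA gives NA, and if() cannot branch on NA
msg <- tryCatch(
  if (NA == "item") "yes" else "no",
  error = function(e) conditionMessage(e)
)
print(msg)
```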

Can anyone offer a solution? I am aware that with just two dfs it is easier to do it without lapply, but I have many dfs in the actual list and want to apply this function to each one of them.

Thanks in advance!

3 Answers


This works by using a data frame as a mapping table rather than your if-then statements:

# first let's build your data frames in a list
a <- setNames(data.frame(matrix(ncol = 2,nrow =6)), c("pic_type","roi"))
b <- setNames(data.frame(matrix(ncol = 2,nrow =6)), c("pic_type","roi"))
a$pic_type <- c("item", "item", "item","relation","relation","relation")
a$roi <- c("object", "object", "pic", "object", "relation","relation")
b$pic_type <- c("item", "item", "item","relation","relation","relation")
b$roi <- c("relation", "relation", "object", "pic", "pic","object")
myList <- list(a,b)

# build the mapping table: one row per (pic_type, roi) combination and its type
mapping = data.frame(
  pic_type = c("item", "relation", "item", "relation"),
  roi = c("object", "relation", "relation", "object"),
  type = c("occupied", "occupied", "empty", "empty"),
  stringsAsFactors = FALSE  # keep character columns in R < 4.0 as well
)

The addTheColumnType function matches the rows of a dataframe with the mapping table and returns the dataframe with an additional column "type":

addTheColumnType = function (df, mapping){
  # build "pic_type-roi" keys for the mapping table and for the data frame
  # (only the two key columns are used, so extra columns in df are ignored)
  mappingKey = paste(mapping$pic_type, mapping$roi, sep = "-")
  aKey = paste(df$pic_type, df$roi, sep = "-")
  # match the keys and pick the type (as.character in case type is a factor)
  df$type = as.character(mapping$type[match(aKey, mappingKey)])
  # replace NAs by nil (for unmatched rows)
  df$type[is.na(df$type)] = "nil"
  return (df)
}

Finally, apply this function to your list of data frames:

lapply(myList, addTheColumnType, mapping=mapping)
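The same mapping-table idea can also be sketched with a named character vector as the lookup instead of a data frame. This is a variant, not the answer's original code; add_type_lookup and lookup are hypothetical names.

```r
# Example data frame from the question
a <- data.frame(
  pic_type = c("item", "item", "item", "relation", "relation", "relation"),
  roi = c("object", "object", "pic", "object", "relation", "relation")
)

# Lookup vector: names are "pic_type-roi" keys, values are the type
lookup <- c(
  "item-object" = "occupied",
  "relation-relation" = "occupied",
  "item-relation" = "empty",
  "relation-object" = "empty"
)

# Hypothetical helper: index the lookup by key, unmatched keys become "nil"
add_type_lookup <- function(df, lookup) {
  key <- paste(df$pic_type, df$roi, sep = "-")
  type <- unname(lookup[key])  # NA for keys not present in the lookup
  type[is.na(type)] <- "nil"
  df$type <- type
  df
}

a <- add_type_lookup(a, lookup)
print(a)
```

It can be applied to the whole list in the same way: lapply(myList, add_type_lookup, lookup = lookup).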
tom57

Welcome to Stack Overflow.

R works a little differently from many other languages, and it is useful to note that it has two 'if/else' constructs: if () {} else {}, which tests a single TRUE/FALSE value, and ifelse(), which works on whole vectors. Please see the question "else if(){} VS ifelse()" for a description. Like many functions in R, ifelse() is vectorised, which means it accepts a vector and returns a vector; there is no need to tell it explicitly to run row by row through a data frame.
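To illustrate the vectorised ifelse() mentioned above, here is a minimal base R sketch (no dplyr needed) that applies nested ifelse() calls to each data frame in the list. The helper vectors occupied and empty are illustrative names, not from the question.

```r
a <- data.frame(
  pic_type = c("item", "item", "item", "relation", "relation", "relation"),
  roi = c("object", "object", "pic", "object", "relation", "relation")
)
b <- data.frame(
  pic_type = c("item", "item", "item", "relation", "relation", "relation"),
  roi = c("relation", "relation", "object", "pic", "pic", "object")
)
myList <- list(a = a, b = b)

myList <- lapply(myList, function(x) {
  # logical vectors with one value per row
  occupied <- (x$pic_type == "item" & x$roi == "object") |
    (x$pic_type == "relation" & x$roi == "relation")
  empty <- (x$pic_type == "item" & x$roi == "relation") |
    (x$pic_type == "relation" & x$roi == "object")
  # ifelse() works on whole vectors, so no row-by-row loop is needed
  x$type <- ifelse(occupied, "occupied", ifelse(empty, "empty", "nil"))
  x
})
```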

For your example you want to use ifelse(), or even better the case_when() function from the dplyr package (part of the tidyverse collection, https://www.tidyverse.org/), which allows testing multiple conditions (see https://community.rstudio.com/t/case-when-why-not/2685/2 for a general discussion of the options). Below I also use the base R within() function, but mutate() from dplyr would work equally well.

library(dplyr)

a <- data.frame(
  pic_type = c("item", "item", "item","relation","relation","relation"),
  roi = c("object", "object", "pic", "object", "relation","relation")
)

b <- data.frame(
  pic_type = c("item", "item", "item","relation","relation","relation"),
  roi = c("relation", "relation", "object", "pic", "pic","object")
)

myList <- list(a = a, b = b)

myList <- lapply(myList, function(x) {
  x <- within(x, {
    type <- case_when(
      (pic_type == "item" & roi == "object") |
        (pic_type == "relation" & roi == "relation") ~ "occupied",
      (pic_type == "item" & roi == "relation") |
        (pic_type == "relation" & roi == "object") ~ "empty",
      TRUE ~ "nil")
  })
  return(x)
})

myList$a
JWilliman

As the list items you are iterating over are already data frames, I would suggest skipping the second, row-wise loop and doing the assignments directly on whole columns:

myList <- lapply(myList, function(x) {
    # start with the default value, then overwrite the matching rows
    x$type = "nil"
    # use the vectorised & (not &&) so that every row is compared
    x$type[x$pic_type == "item" & x$roi == "object"] = "occupied"
    x$type[x$pic_type == "relation" & x$roi == "relation"] = "occupied"
    x$type[x$pic_type == "item" & x$roi == "relation"] = "empty"
    x$type[x$pic_type == "relation" & x$roi == "object"] = "empty"
    return(x)
})
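A short sketch of why the element-wise & matters when indexing whole columns: & returns one TRUE/FALSE per row, whereas && expects a single value on each side. The three-row data frame here is illustrative only.

```r
x <- data.frame(pic_type = c("item", "item", "relation"),
                roi      = c("object", "pic", "object"))

# `&` compares element by element and returns one value per row
both <- x$pic_type == "item" & x$roi == "object"
print(both)  # both is c(TRUE, FALSE, FALSE)

# Using `&&` here instead is an error in R >= 4.3.0 for these length-3
# vectors (older versions silently used only the first element)
x$type <- "nil"
x$type[both] <- "occupied"
```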

Also, to set the type you used ==, which performs a comparison and changes nothing; assignment in R uses <- (or a single =).
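A minimal sketch of the difference, plus a related point about the question's code: a function whose whole body is a for loop returns NULL (for() itself returns NULL invisibly), so lapply() would produce a list of NULLs unless the modified copy is returned explicitly. The functions f and g are illustrative only.

```r
x <- c("nil", "nil")

# `==` only compares: it returns a logical value and leaves x unchanged
x[1] == "occupied"
# `<-` (or a single `=`) assigns
x[1] <- "occupied"

# The body is just a for loop, so the function returns NULL
f <- function(v) for (i in seq_along(v)) v[i] <- "nil"
# Returning v after the loop gives back the modified copy
g <- function(v) {
  for (i in seq_along(v)) v[i] <- "nil"
  v
}
out_f <- f(c("a", "b"))
out_g <- g(c("a", "b"))
```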

Alex