Apologies in advance, but I'm new both here and to R. What I'm trying to do is automate adding a column into a data frame that is filled with the actual name of the data frame. For example, if I have the following data frame:

> q103
  a  b c  d
d 1  4 6  9
e 2  8 3 12
f 3 12 8 16

How can I add a column to the end of it that has the character string q103 in each row (without specifically naming it as such, since I will need to repeat this for several hundred data frames), so that I end up with:

> q103
  a  b c  d    X
d 1  4 6  9 q103
e 2  8 3 12 q103
f 3 12 8 16 q103

The problem is that there are a lot of these data frames and they are inside a list of lists (e.g., something like List[[list]][["q100277"]] is a data frame in the list). Also, their names are somewhat random, but are important to keep (I can't just rename them sequentially). So, I need a way to tell R to basically "look at the name of data frame X and add that character string to a new column in the data frame, then do this for every data frame in the list". It feels like some sort of lapply would work, but I have no idea what to actually tell it to do in order to get there.

Any help in figuring out how to get a column into each data frame that is just populated by the name of that data frame without doing so manually for each data frame is greatly appreciated!

EDIT: I've tried to create a reproducible example (per comments) below. This will create something similar to what I'm looking at (except the example is a much smaller list!)

library(CTT)
library(dplyr)
library(tidyverse)
library(purrr)

## Create student response patterns for a fake test

q102 <- c("A", "B", "C", "D", "O", "A", "A", "C", "D", "A", "C", "D", "O", "D", "A", "B", "A", "C", "D", "A")
q107 <- c("C", "D", "O", "D", "A", "B", "A", "C", "D", "A", "A", "B", "C", "D", "O", "A", "A", "C", "D", "A")
q1045 <- c("B", "O", "C", "A", "D", "B", "O", "C", "A", "D", "B", "O", "C", "A", "D", "B", "O", "C", "A", "D")
q101 <- c("A", "B", "C", "D", "O", "A", "A", "C", "D", "A", "B", "O", "C", "A", "D", "B", "O", "C", "A", "D")
q1064 <- c("C", "D", "O", "D", "A", "B", "A", "C", "D", "A", "A", "B", "C", "D", "O", "A", "A", "C", "D", "A")
q104 <- c("A", "B", "C", "D", "O", "A", "A", "C", "D", "A", "B", "O", "C", "A", "D", "B", "O", "C", "A", "D")

## Create an assessment key to identify the test
AssessmentKey <- c("ADW", "ADW", "ADW", "ADW", "ADW", "ADW", "ADW", "ADW", "ADW", "ADW", "ADW", "ADW", "ADW", "ADW", "ADW", "ADW", "ADW", "ADW", "ADW", "ADW")

## Assign response pattern to the assessment key
Students1 <- data.frame(q102, q107, q1045, q101, q1064, q104, AssessmentKey)
remove(AssessmentKey)

## Create a second assessment key to identify a different test
AssessmentKey <- c("XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ")

## Assign the response pattern to the second assessment key
Students2 <- data.frame(q102, q107, q1045, q101, q1064, q104, AssessmentKey)
remove(q102, q107, q1045, q101, q1064, q104, AssessmentKey)
## Create a data frame combining the two different assessments
StudentAnswers <- rbind(Students1, Students2)

## Create a data frame with the answer key for both tests
AnswerKey <- c("A", "B", "A", "A", "C", "D", "A", "B", "A", "A", "C", "D")
QuestionKey <- c("q102", "q107", "q1045", "q101", "q1064", "q104",
                 "q102", "q107", "q1045", "q101", "q1064", "q104")
AssessmentKey <- c("ADW", "ADW", "ADW", "ADW", "ADW", "ADW", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ")
AnswerKeys <- data.frame(QuestionKey, AnswerKey, AssessmentKey)
remove(AnswerKey, QuestionKey, AssessmentKey)



X <- c("ADW", "XYZ")
y <- lapply(
  (X), function(x) 
  {
    ## This will filter the data file to a specific assessment and 
    ## select the columns needed for analysis
    StudentResponse <- StudentAnswers %>%
      dplyr::filter(AssessmentKey == x) %>%
      dplyr::select(q102, q107, q1045, q101, q1064, q104)
  
      
      AKey <- AnswerKeys %>%
        dplyr::filter(AssessmentKey == x) %>%
        dplyr::select(AnswerKey)

      ## using safely() from the purrr package to run the distractorAnalysis
      ## function from CTT in case of errors

      safeDA = safely(.f=distractorAnalysis)
      safeDA(StudentResponse, AKey)
      
      
    }
)

## This part keeps only the "result" element of each safely() output,
## dropping the "error" element from the list generated above.
Results <- lapply(y, function(res) res[["result"]])
  
  • Could you provide a small [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? Specifically, it will help to know whether the data frames are at different depths and how deep they are. Functions like `imap` from `purrr` can easily iterate over both a list's names and its contents but exactly how to call it will depend on the nested list set up. – Calum You May 18 '21 at 21:00
  • Is your data something like `a <- head(mtcars); list(list(name1=a, name2=a))` ? An example can be something very simple like that to give us an indication of how to answer your problem. – thelatemail May 18 '21 at 21:54
  • Updated the post with an example that will generate a similar list to the one I'm working with (albeit shorter in length) – Sean Johnson May 19 '21 at 15:09

2 Answers

If I understand correctly, you want to extract the names of a number of (nested) list members, then assign a column into a dataframe contained in that list member.

This is a quick and dirty solution using example data. It is not best practice, but it will do in a hurry. Note the <<- operator: it assigns into the enclosing environments rather than the function's local one, so the change reaches the list in the global environment.

# Example data
data(mtcars)

first_list <- list()
first_list[["item1"]] <- list()
first_list[["item2"]] <- list()
first_list[["item1"]][["level2_item1"]] <- mtcars
first_list[["item1"]][["level2_item2"]] <- mtcars

# Iterate through the names of "item1", look up the corresponding dataframe, add a column

lapply(
  names(first_list[["item1"]]),
  function(x) {
    first_list[["item1"]][[x]]$NewCol <<- x
  }
)
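
If you would rather avoid <<-, one possible variant is to build the modified list and assign it back. This is only a sketch in base R: the small example list below is made up for illustration, and Map() is my choice here rather than something the question requires.

```r
# Hypothetical example data: a list of lists holding small data frames
first_list <- list(
  item1 = list(
    level2_item1 = head(mtcars, 3),
    level2_item2 = head(mtcars, 3)
  )
)

# Map() pairs each data frame with its name and returns a new named list;
# assigning that result back means no <<- is needed
first_list[["item1"]] <- Map(
  function(df, nm) {
    df$NewCol <- nm  # the name is recycled to every row
    df
  },
  first_list[["item1"]],
  names(first_list[["item1"]])
)

# Inspect the new column of one data frame
unique(first_list[["item1"]][["level2_item1"]]$NewCol)
```

Because the function returns the modified data frame instead of changing a global object, the same call can be reused on any other sub-list.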
tinker

If I interpret this correctly, you would like to find all dataframes in a list and add a column with the name of that element.

You can do this with a combination of the rlang::squash and purrr::map2 functions.

  1. squash will recursively flatten your list into a single list of dataframes.
  2. Then you can map over each element and add a column with the name of the list element.

I have provided two solutions: one that flattens the list, removing its hierarchy, and one that maintains the structure of the list.

my_list <- list(
  q0 = mtcars,
  sub_list_1 = list(
      q1 = mtcars
    , q2 = mtcars
  )
  , sub_list_2 = list(
      sub_sub_list_1 = list(
        q3 = mtcars,
        q4 = mtcars
      )
      , sub_sub_list_2 = list(
        q5 = mtcars,
        q6 = mtcars
      )
  )
)
# function to add name col
add_col <- function(table, name) {
  if(!is.data.frame(table)) return(table) # If not a dataframe just return
  
  table$X <- name # add column
  
  return(table)
}

Solution 1

# Using pipes (%>%), purrr, rlang
library(rlang)
library(purrr)

my_list %>% 
  squash() %>% 
  map2(names(.), add_col)

# Using rlang and base R
flat_list <- squash(my_list)
mapply(add_col, flat_list, names(flat_list), SIMPLIFY = FALSE)
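
As mentioned in the comments, purrr::imap can pass each element together with its name, so the names(.) argument is not needed. A minimal sketch, assuming a flat named list like the one squash() returns; the small flat_list here is made up, and add_col is repeated only so the snippet runs on its own.

```r
library(purrr)

# Hypothetical flat list of data frames, standing in for the squash() output
flat_list <- list(q1 = head(mtcars, 2), q2 = head(mtcars, 2))

# Same helper as above, repeated so the sketch is self-contained
add_col <- function(table, name) {
  if (!is.data.frame(table)) return(table) # If not a dataframe just return
  table$X <- name # add column
  table
}

# imap() calls add_col(element, element_name) for every list element
labelled <- imap(flat_list, add_col)

# Optionally stack everything into one data frame; X records where each row came from
combined <- do.call(rbind, labelled)
```

Once every data frame carries its own name in X, stacking them as in the last line keeps the origin of each row.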

If you want to maintain the structure of the list, you can recursively go through it and apply our add_col function.

Solution 2

library(purrr)
library(rlang)

recursive_add_col <- function(x) {
  map2(x, names(x), 
      function(x, y) if (is.list(x) && !is.data.frame(x)) recursive_add_col(x) else add_col(x, y)
      )
}

my_list %>% 
  recursive_add_col()
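
One thing to watch for: the Results list in the question is unnamed at its top level, so names(x) is NULL there. Below is a rough base R sketch of the same recursion that tolerates unnamed levels. The label_frames name and the example data are hypothetical, not part of the question.

```r
# Recursively walk a nested list: label data frames, descend into plain lists
label_frames <- function(x) {
  nms <- names(x)
  # Unnamed levels get NA placeholders so Map() has something to pair with
  if (is.null(nms)) nms <- rep(NA_character_, length(x))
  Map(function(el, nm) {
    if (is.data.frame(el)) {
      el$X <- rep(nm, nrow(el))  # also safe for zero-row data frames
      el
    } else if (is.list(el)) {
      label_frames(el)
    } else {
      el  # leave anything else (e.g. NULL) untouched
    }
  }, x, nms)
}

# Hypothetical data: an unnamed outer list holding named data frames
nested <- list(
  list(q102 = head(mtcars, 2), q107 = head(mtcars, 2)),
  list(q102 = head(mtcars, 2))
)

labelled <- label_frames(nested)
labelled[[1]][["q107"]]$X
```

A data frame sitting directly in an unnamed level would get NA in X, which at least makes the missing name visible.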
Croote