I've been struggling with column selection with lists in R. I've loaded a bunch of CSV files (all with different column names and different numbers of columns) into a list, with the goal of extracting the columns that share the same name (just phone_number, Subregion, and PhoneType) and putting them together into a single data frame.

I can get the columns I want out of one list element with `var <- data[[1]] %>% select("phone_number", "Subregion", "PhoneType")`, but I cannot select the columns from all the elements in the list this way, only one at a time.

I then tried a for loop that looks like this:

    new.function <- function(a) {
      for(i in 1:a) {
        tst <- datas[[i]] %>% select("phone_number", "Subregion", "PhoneType")
      }
      print(tst)
    }

But when I try:

    new.function(5)

I'll only get the columns from the 5th element.

I know this might seem like a noob question for most, but I am struggling to learn lists and loops and R. I'm sure I'm missing something very easy to make this work. Thank you for your help.

JLA
  • On each iteration of the loop, you get the columns and assign them to tst, overwriting the previous value. It’s almost always better to use lapply and return the value, rather than a for loop with assignment, since lapply keeps the list structure – divibisan Apr 26 '19 at 20:18
  • Before the loop try `tst <- vector("list", length = a)`. Then inside the loop assign `tst[[i]] <- etc`. Then, after the loop, `do.call(rbind, tst)` as the last function instruction. – Rui Barradas Apr 26 '19 at 20:19
  • Possible duplicate of [Convert a list of data frames into one data frame](https://stackoverflow.com/questions/2851327/convert-a-list-of-data-frames-into-one-data-frame) – divibisan Apr 26 '19 at 20:20
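
The preallocation approach suggested in the comments can be sketched roughly as follows. This is only an illustration: the toy `datas` list, the `wanted` vector, and the function name `collect_columns` are made up for the example, and base R indexing is used in place of `select` so that no packages are needed.

```r
# Hypothetical toy list standing in for the loaded CSVs
datas <- list(
  data.frame(extra = 0, phone_number = c("1-800", "1-900"),
             Subregion = "earth", PhoneType = "flip", stringsAsFactors = FALSE),
  data.frame(phone_number = "8675309", other = 1,
             Subregion = "mars", PhoneType = "razr", stringsAsFactors = FALSE)
)

# Columns shared by every data frame (assumed to be present in each one)
wanted <- c("phone_number", "Subregion", "PhoneType")

collect_columns <- function(a) {
  # Preallocate one slot per data frame
  tst <- vector("list", length = a)
  for (i in seq_len(a)) {
    # Store each result in its own slot instead of overwriting
    tst[[i]] <- datas[[i]][, wanted]
  }
  # Stack the stored data frames into one
  do.call(rbind, tst)
}

combined <- collect_columns(2)
dim(combined)
```

Because every slot of `tst` is filled before `do.call(rbind, tst)` runs, the result contains the rows from all selected elements rather than only the last one.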

2 Answers

Another way you could do this is to make a function that extracts your columns and apply it to all data.frames in your list with lapply:

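    library(dplyr)

    # Keep only the three shared columns of one data frame
    extractColumns = function(x){
      select(x, "phone_number", "Subregion", "PhoneType")
      # or x[, c("phone_number", "Subregion", "PhoneType")]
    }

    # Apply the function to every data frame in the list, then stack the results
    final_df = lapply(data, extractColumns) %>% bind_rows()
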
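
For a self-contained run of this approach, here is a minimal sketch. The toy `data` list, the `wanted` vector, and the name `extract_columns` are assumptions made up for the example; it also swaps in `all_of()`, which should raise an error if one of the data frames lacks a wanted column instead of skipping it silently.

```r
library(dplyr)

# Hypothetical toy list standing in for the loaded CSVs
data <- list(
  data.frame(not_needed = 0, phone_number = c("1-800", "1-900"),
             Subregion = "earth", PhoneType = "flip", stringsAsFactors = FALSE),
  data.frame(phone_number = "8675309", Subregion = "mars",
             also_not_needed = 1, PhoneType = "razr", stringsAsFactors = FALSE)
)

# Column names to keep, held in a character vector
wanted <- c("phone_number", "Subregion", "PhoneType")

extract_columns <- function(x) {
  # all_of() selects exactly the columns named in `wanted`
  select(x, all_of(wanted))
}

# lapply returns a list of trimmed data frames; bind_rows stacks them
final_df <- lapply(data, extract_columns) %>% bind_rows()
dim(final_df)
```

Keeping the column names in one vector means the selection only has to be changed in one place if the set of shared columns grows.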
Fino

The way you have your loop set up currently is only saving the last iteration of the loop because tst is not set up to store more than a single value and is overwritten with each step of the loop.
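
To see the overwriting in isolation, here is a minimal sketch that uses plain numbers instead of data frames. The variable names `last_only` and `kept` are made up for the illustration.

```r
# Plain assignment: each pass replaces the previous value
for (i in 1:3) {
  last_only <- i^2
}
last_only      # only the value from the final pass survives

# Indexed assignment into a list: each pass fills its own slot
kept <- list()
for (i in 1:3) {
  kept[[i]] <- i^2
}
length(kept)   # one element per pass
```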

You can establish tst as a list first with:

    tst <- list()

Then, in your code, be explicit that each step is saved as a separate element of the list by adding brackets and an index to tst. Here is a full example, following the way you were doing it.

    # dplyr provides select() and the %>% pipe
    library(dplyr)

    # Example data.frame that could be in datas
    df_1 <- data.frame("not_selected" = rep(0, 5),
                       "phone_number" = rep("1-800", 5),
                       "Subregion"    = rep("earth", 5),
                       "PhoneType"    = rep("flip", 5))
    # Another example data.frame that could be in datas
    df_2 <- data.frame("also_not_selected" = rep(0, 5),
                       "phone_number" = rep("8675309", 5),
                       "Subregion"    = rep("mars", 5),
                       "PhoneType"    = rep("razr", 5))

    # datas is a list of data.frames; we want to pull only specific columns from all of them
    datas <- list(df_1, df_2)

    # Create a list to store the new data.frames in once the columns are selected
    tst <- list()

    # Function for looping through the first 'a' elements of datas
    new.function <- function(a) {

      for(i in 1:a) {

        # Save each result in its own list element instead of overwriting tst
        tst[[i]] <- datas[[i]] %>% select("phone_number", "Subregion", "PhoneType")

      }

      print(tst)
    }

    # Proof of concept for 2 elements
    new.function(2)
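
One thing to be aware of with the example above: assigning to `tst[[i]]` inside a function changes a copy that is local to that function, so the `tst` created outside stays empty and the filled list has to be captured from the function's return value. A minimal sketch of that behaviour, with plain numbers and a made-up function name `fill_list`:

```r
# List created outside the function
tst <- list()

fill_list <- function(n) {
  for (i in seq_len(n)) {
    # This modifies a copy of tst that is local to the function
    tst[[i]] <- i * 10
  }
  # Return the local, filled list to the caller
  tst
}

result <- fill_list(3)

length(tst)      # the list outside the function is still empty
length(result)   # the returned list holds the three values
```

In the same way, `stored <- new.function(2)` should capture the list, since `print` returns its argument invisibly.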
Adam Kemberling