1

I have this following problem:

If I have multiple data frames stored inside a list and I would like to shift the last 2 columns to a position in between the 1st and 2nd column. The index of the 1st and 2nd in all the data frames is always the same but the index of the last 2 columns is always different.

I give and example for two data frames:

df1

c1. c2 c3 c4 c5  c6
1.  4. a.  d. 7. 10
2.  5. b.  e. 8. 11
3.  6. c.  f. 9. 12

Output

df1

c1. c5  c6 c2 c3 c4 
1.  7. 10  4. a.  d. 
2.  8. 11  5. b.  e. 
3.  9. 12  6. c.  f. 

df2

c1. c2 c2_1 c3  c3_1 c4  c5  c6
1.  4.  a1. b5.  d.  h    7. 10
2.  5.  b.  7g   e.  j    8. 11
3.  6.  c.  9r   f.  l    9. 12

Output:

c1. c5  c6  c2 c2_1 c3  c3_1 c4  
1.   7. 10  4.  a1. b5.  d.  h    
2.   8. 11  5.  b.  7g   e.  j    
3.   9. 12  6.  c.  9r   f.  l    

df1 and df2 are stored in list_df

list_df<-list(df1,df2)

I wrote something like this but it does not seem to work:

list_new<-list()
for (i in 1:length(list_df)){
  list_new[[i]]<- function(x) {cbind(x[[i]][,1], x[[i]][,ncol(x[[i]]-2)], x[[i]][,2:ncol(x[[i]])])
}}
Paolo Lorenzini
  • 579
  • 2
  • 15

5 Answers5

1

At first you need to use purrr::map() or lapply() to execute a certain function to each cell of the list. And then by dplyr::select(), Use c(1, ncol(.)-1, ncol(.)) to determine the indices of the first 3 columns and everything() represents the rest.

library(tidyverse)
map(df.list, ~ select(., c(1, ncol(.)-1, ncol(.)), everything()))

$df1
  c1. c5 c6 c2 c3 c4
1   1  7 10  4 a. d.
2   2  8 11  5 b. e.
3   3  9 12  6 c. f.

$df2
  c1. c5 c6 c2 c2_1  c3 c3_1 c4
1   1  7 10  4  a1. b5.   d.  h
2   2  8 11  5   b.  7g   e.  j
3   3  9 12  6   c.  9r   f.  l

Here is a base way. I think it will be the fastest one if you're concerned about the efficiency.

lapply(df.list, function(x) x[c(1, ncol(x)-1, ncol(x), 2:(ncol(x)-2))])

Data

df.list <- list(df1 = structure(list(c1. = c(1, 2, 3), c2 = c(4, 5, 6), 
    c3 = structure(1:3, .Label = c("a.", "b.", "c."), class = "factor"), 
    c4 = structure(1:3, .Label = c("d.", "e.", "f."), class = "factor"), 
    c5 = c(7, 8, 9), c6 = 10:12), class = "data.frame", row.names = c(NA, -3L)),
    df2 = structure(list(c1. = c(1, 2, 3), c2 = c(4, 5, 6), 
    c2_1 = structure(1:3, .Label = c("a1.", "b.", "c."), class = "factor"), 
    c3 = structure(c(3L, 1L, 2L), .Label = c("7g", "9r", "b5."
    ), class = "factor"), c3_1 = structure(1:3, .Label = c("d.", 
    "e.", "f."), class = "factor"), c4 = structure(1:3, .Label = c("h", 
    "j", "l"), class = "factor"), c5 = c(7, 8, 9), c6 = 10:12), class = "data.frame", row.names = c(NA, -3L)))
Darren Tsai
  • 32,117
  • 5
  • 21
  • 51
1

A concise data.table method:

library(data.table)
lapply(list_df, function(x) setcolorder(x, c(1, ncol(x)-1, ncol(x))))

If speed is of concern, below are the benchmarks of current answers:

library(data.table)
library(tidyverse)

list_df <- list(df1 = structure(list(c1. = c(1, 2, 3), c2 = c(4, 5, 6), 
    c3 = structure(1:3, .Label = c("a.", "b.", "c."), class = "factor"), 
    c4 = structure(1:3, .Label = c("d.", "e.", "f."), class = "factor"), 
    c5 = c(7, 8, 9), c6 = 10:12), class = "data.frame", row.names = c(NA, -3L)),
    df2 = structure(list(c1. = c(1, 2, 3), c2 = c(4, 5, 6), 
    c2_1 = structure(1:3, .Label = c("a1.", "b.", "c."), class = "factor"), 
    c3 = structure(c(3L, 1L, 2L), .Label = c("7g", "9r", "b5."
    ), class = "factor"), c3_1 = structure(1:3, .Label = c("d.", 
    "e.", "f."), class = "factor"), c4 = structure(1:3, .Label = c("h", 
    "j", "l"), class = "factor"), c5 = c(7, 8, 9), c6 = 10:12), class = "data.frame", row.names = c(NA, -3L)))

microbenchmark::microbenchmark(
    Khaynes = {lapply(list_df, function(x) setcolorder(x, c(1, ncol(x)-1, ncol(x))))},
    Paweł_Chabros = {list_df %>% map(~select(.x, 1, ncol(.x) - 1, ncol(.x), 4:ncol(.x) - 2))},
    Darren_Tsai = { map(list_df, ~ select(., c(1, ncol(.)-1, ncol(.)), everything()))}
)

# Unit: microseconds
#         expr      min       lq     mean   median        uq      max  neval
#     Khaynes   92.301   100.551  148.872  116.251  134.2015  3165.401   100
# Pawel_Chabros 3001.502 3068.451 3264.070 3104.851 3156.4515 7476.001   100
# Darren_Tsai   2917.100 2981.801 3124.193 3021.651 3061.3010 7383.101   100
Khaynes
  • 1,976
  • 2
  • 15
  • 27
0

Example using dplyr and purrr libraries:

library(dplyr)
library(purrr)

list_df %>%
  map(~select(.x, 1, ncol(.x) - 1, ncol(.x), 4:ncol(.x) - 2))
Paweł Chabros
  • 2,349
  • 1
  • 9
  • 12
0

lengths is able to count the number of data.frame columns in the list. And then use lapply to design the indices you want.

ind <- lapply(lengths(df.list), function(x) c(1, (x-1), x, 2:(x-2)))
Map("[", df.list, ind)

Output:

$df1
  c1. c5 c6 c2 c3 c4
1   1  7 10  4 a. d.
2   2  8 11  5 b. e.
3   3  9 12  6 c. f.

$df2
  c1. c5 c6 c2 c2_1  c3 c3_1 c4
1   1  7 10  4  a1. b5.   d.  h
2   2  8 11  5   b.  7g   e.  j
3   3  9 12  6   c.  9r   f.  l
0

Using the function move from this answer, and Darren's sample data :

map(df.list, move, one_of("c5","c6"), 2)
# $df1
#   c1. c5 c6 c2 c3 c4
# 1   1  7 10  4 a. d.
# 2   2  8 11  5 b. e.
# 3   3  9 12  6 c. f.
# 
# $df2
#   c1. c5 c6 c2 c2_1  c3 c3_1 c4
# 1   1  7 10  4  a1. b5.   d.  h
# 2   2  8 11  5   b.  7g   e.  j
# 3   3  9 12  6   c.  9r   f.  l

the function

move <- function(data, cols, ref, side = c("before","after")){
  if(! requireNamespace("dplyr")) stop("Make sure package 'dplyr' is installed to use function 'move'")
  side <- match.arg(side)
  cols <- rlang::enquo(cols)
  ref  <- rlang::enquo(ref)
  if(side == "before") dplyr::select(data,1:!!ref,-!!ref,-!!cols,!!cols,dplyr::everything()) else
    dplyr::select(data,1:!!ref,-!!cols,!!cols,dplyr::everything())
}
moodymudskipper
  • 46,417
  • 11
  • 121
  • 167