How to split data frame by column names in R?

Question

I have spent 24 hours searching for a solution to what I feel is a trivial problem (though not for an R newbie like me), without success, so please help me out. I have a single data frame that I would like to split into two. Here is what the data looks like:

d1 d2 d3 d4 p1 p2 p3 p4
30 40 20 60 1  3  2  5  
20 50 40 30 3  4  1  5 
40 20 50 30 2  3  1  4

Here is what I want it to look like:

$d
d1 d2 d3 d4
30 40 20 60
20 50 40 30
40 20 50 30 

$p
p1 p2 p3 p4
1  3  2  5 
3  4  1  5
2  3  1  4

I have tried most of the commands and examples online, but they all seem to split data along rows, as in:

split(1:3, 1:2)

How can I indicate, even by using indexes, that I want to split the first four columns from the last four?

moodymudskipper · Answer 1 · 2018-07-12T11:14:45.840

13

Using sapply and startsWith:

sapply(c("d", "p"),
       function(x) df[startsWith(names(df),x)],
       simplify = FALSE)

# $d
# d1 d2 d3 d4
# 1 30 40 20 60
# 2 20 50 40 30
# 3 40 20 50 30
# 
# $p
# p1 p2 p3 p4
# 1  1  3  2  5
# 2  3  4  1  5
# 3  2  3  1  4

A tidyverse translation:

library(tidyverse)
map(set_names(c("d", "p")),~select(df,starts_with(.x)))
# $d
# d1 d2 d3 d4
# 1 30 40 20 60
# 2 20 50 40 30
# 3 40 20 50 30
# 
# $p
# p1 p2 p3 p4
# 1  1  3  2  5
# 2  3  4  1  5
# 3  2  3  1  4

edited Jul 12 '18 at 11:14

answered Jul 12 '18 at 10:39

moodymudskipper


would you know how to pass the `df` in a pipe, `df %>% map(set_names(c("d", "p")),~select(.,starts_with(.x)))` doesn't work as expected. I thought this might though: `df %>% map(~set_names(c("d", "p")) %>% select(., starts_with(.x)))` but doesn't, would you have any suggestions? thanks – user63230 Mar 26 '20 at 15:07

Maybe `df %>% {map(set_names(c("d", "p")), function(.x) select(.,starts_with(.x)))} ` so the dot is clearly the df and not `.x` as is the case with the formula notation – moodymudskipper Mar 26 '20 at 20:01
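Another way to sidestep the clash between the pipe's dot and the formula's `.x` is to wrap the split in a small function that takes the data frame as its first argument. This is only a sketch in base R: `split_by_prefix` is a hypothetical helper name, not something from the thread or from any package, and the two-column-per-group data frame is a shortened stand-in for the question's data.

```r
# Hypothetical helper (not from the thread): data frame first, prefixes second.
split_by_prefix <- function(data, prefixes) {
  sapply(prefixes,
         function(p) data[startsWith(names(data), p)],
         simplify = FALSE)
}

# Shortened stand-in for the question's data
df <- data.frame(d1 = c(30L, 20L, 40L), d2 = c(40L, 50L, 20L),
                 p1 = c(1L, 3L, 2L),    p2 = c(3L, 4L, 3L))

result <- split_by_prefix(df, c("d", "p"))
# With magrittr attached, the same call can sit at the end of a pipe:
# df %>% split_by_prefix(c("d", "p"))
names(result)
```

Because the data frame is an ordinary named argument here, there is no ambiguity about what the dot refers to.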

score 11 · Answer 2 · answered Jul 12 '18 at 06:07


Here is an option with split from base R

split.default(df1, sub('\\d+', '', names(df1)))
#$d
#  d1 d2 d3 d4
#1 30 40 20 60
#2 20 50 40 30
#3 40 20 50 30

#$p
#  p1 p2 p3 p4
#1  1  3  2  5
#2  3  4  1  5
#3  2  3  1  4

data

df1 <- structure(list(d1 = c(30L, 20L, 40L), d2 = c(40L, 50L, 20L), 
    d3 = c(20L, 40L, 50L), d4 = c(60L, 30L, 30L), p1 = c(1L, 
    3L, 2L), p2 = c(3L, 4L, 3L), p3 = c(2L, 1L, 1L), p4 = c(5L, 
    5L, 4L)), class = "data.frame", row.names = c(NA, -3L))

answered Jul 12 '18 at 06:07

akrun



Interesting, so on a data.frame, `split.default()` splits df vertically, but `split.data.frame()` horizontally? – s_baldur Jul 12 '18 at 08:42

It's intended for lists and vectors so it will treat the `data.frame` as a `list`, which translates into splitting horizontally, that's clever, discovery of the day :) – moodymudskipper Jul 12 '18 at 10:45
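The difference discussed in these comments can be seen side by side in a small sketch. It assumes a shortened version of the answer's `df1` (two columns per group) and made-up row labels `"a"` and `"b"`; only base R is used.

```r
# Shortened version of df1 from the answer (an assumption for brevity)
df1 <- data.frame(d1 = c(30L, 20L, 40L), d2 = c(40L, 50L, 20L),
                  p1 = c(1L, 3L, 2L),    p2 = c(3L, 4L, 3L))

# One grouping value per COLUMN: the data frame is treated as a list of columns
grp <- sub("\\d+", "", names(df1))      # "d" "d" "p" "p"
by_col <- split.default(df1, grp)

# One grouping value per ROW: split() dispatches to split.data.frame()
by_row <- split(df1, c("a", "b", "a"))  # row labels are made up

names(by_col)    # groups of columns
nrow(by_row$a)   # rows that carried label "a"
```

With `split.default()` the grouping vector needs one entry per column, while the data frame method expects one entry per row.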

score 3 · Answer 3 · answered Jul 12 '18 at 04:16


In base R you could use grep

ss <- c("d", "p")
lapply(setNames(ss, ss), function(x) df[, grep(x, colnames(df))])
#$d
#  d1 d2 d3 d4
#1 30 40 20 60
#2 20 50 40 30
#3 40 20 50 30
#
#$p
#  p1 p2 p3 p4
#1  1  3  2  5
#2  3  4  1  5
#3  2  3  1  4

Sample data

df <- read.table(text =
    "d1 d2 d3 d4 p1 p2 p3 p4
30 40 20 60 1  3  2  5
20 50 40 30 3  4  1  5
40 20 50 30 2  3  1  4", header = TRUE)

answered Jul 12 '18 at 04:16

Maurits Evers



you don't really need the setNames, but it doesn't hurt – Bertil Baron Jul 12 '18 at 07:14
Some prefer names() instead of colnames() - shorter and slightly faster since colnames() calls names() in the end. – s_baldur Jul 12 '18 at 08:44
@BertilBaron Fair enough; however without `setNames` the resulting `list` will be unnamed; OPs expected output `list` is named. – Maurits Evers Jul 12 '18 at 13:32
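One caveat about the `grep` call in this answer: the pattern matches anywhere in the column name, which is harmless for the question's data but can pick up unrelated columns. The sketch below assumes a hypothetical extra `id` column (not in the question) to show the effect, and uses `drop = FALSE` so that a single matching column still comes back as a data frame.

```r
# Illustrative data; the `id` column is an assumption, not from the question.
df <- data.frame(id = 1:3, d1 = c(30L, 20L, 40L), p1 = c(1L, 3L, 2L))

unanchored <- grep("d", names(df), value = TRUE)   # also picks up "id"
anchored   <- grep("^d", names(df), value = TRUE)  # prefix match only

# drop = FALSE keeps a data frame even when a single column matches
d <- df[, grep("^d", names(df)), drop = FALSE]
```

Anchoring the pattern with `^` restricts the match to names that start with the prefix.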

score 2 · Answer 4 · answered Jul 12 '18 at 04:16

Here is one approach using tidyverse.

library(tidyverse)
df %>% gather(ind, values) %>%
  split(., gsub("[0-9]", "", .$ind)) %>%  # strip digits from the gathered column names to get the group
  map(function(x) {
    x %>% 
      group_by(ind) %>% 
      mutate(id = row_number()) %>% 
      spread(ind, values) %>% 
      select(-1)})

# $d
# # A tibble: 3 x 4
#      d1    d2    d3    d4
#   <int> <int> <int> <int>
# 1    30    40    20    60
# 2    20    50    40    30
# 3    40    20    50    30

# $p
# # A tibble: 3 x 4
#      p1    p2    p3    p4
#   <int> <int> <int> <int>
# 1     1     3     2     5
# 2     3     4     1     5
# 3     2     3     1     4

Data

df <- structure(list(d1 = c(30L, 20L, 40L), d2 = c(40L, 50L, 20L), 
    d3 = c(20L, 40L, 50L), d4 = c(60L, 30L, 30L), p1 = c(1L, 
    3L, 2L), p2 = c(3L, 4L, 3L), p3 = c(2L, 1L, 1L), p4 = c(5L, 
    5L, 4L)), class = "data.frame", row.names = c(NA, -3L))

swiftg · Answer 5 · 2018-07-12T05:24:35.810

0

With indices, this should do it:

d = df[,c(1:4)]
p = df[,c(5:8)]

With names, extend the same concept:

dindices = grep("^d", colnames(df))
pindices = grep("^p", colnames(df))
d = df[,dindices]
p = df[,pindices]
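If a single named list is wanted, as in the question's expected output, the two pieces can be collected like this. This is a minimal sketch that rebuilds the sample data with `read.table`; the name `res` is arbitrary, and `drop = FALSE` is added as a precaution for the case where a prefix matches only one column.

```r
# Sample data from the question
df <- read.table(text = "d1 d2 d3 d4 p1 p2 p3 p4
30 40 20 60 1 3 2 5
20 50 40 30 3 4 1 5
40 20 50 30 2 3 1 4", header = TRUE)

# Collect both subsets in one named list
res <- list(
  d = df[, grep("^d", colnames(df)), drop = FALSE],
  p = df[, grep("^p", colnames(df)), drop = FALSE]
)
str(res, max.level = 1)
```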

edited Jul 12 '18 at 05:24

answered Jul 12 '18 at 05:09

swiftg


score 0 · Answer 6 · answered Apr 14 '21 at 08:09


You can use select from the dplyr package to create two data frames from your source data frame:

library(dplyr)
d <- select(dfsource, d1, d2, d3, d4)
p <- select(dfsource, p1, p2, p3, p4)

I hope this helps!! For me it's ok!

answered Apr 14 '21 at 08:09

La Rosalia

