split dataframe in R by row

Question

I have a long dataframe like this:

  Row  Conc   group
  1     2.5    A
  2     3.0    A
  3     4.6    B
  4     5.0    B
  5     3.2    C
  6     4.2    C
  7     5.3    D
  8     3.4    D

...

The actual data have hundreds of rows. I would like to split the data frame into two pieces: groups A to C in one, and group D in the other. I searched the web and found several solutions, but none of them applies to my case.

How to split a data frame?

For example: Case 1:

x = data.frame(num = 1:26, let = letters, LET = LETTERS)
set.seed(10)
split(x, sample(rep(1:2, 13)))

I don't want to split by an arbitrary number.

Case 2: Split by level/factor

data2 <- data[data$sum_points == 2500, ]

I don't want to split by a single factor either. Sometimes I want to combine many levels together.

Case 3: select by row number

newdf <- mydf[1:3,]

The actual data have hundreds of rows. I don't know the row number. I just know the level I would like to split at.

score 10 · Accepted Answer · answered Oct 29 '12 at 16:41

It sounds like you want two data frames, where one has (A, B, C) in it and one has just D. In that case you could do:

Data1 <- subset(Data, group %in% c("A","B","C"))
Data2 <- subset(Data, group=="D")

Correct me if you were asking something different.
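
A minimal, self-contained sketch of the approach above, assuming the question's sample rows are stored in a data frame named `Data` (the name this answer uses; the values are copied from the question):

```r
# Sample data copied from the question; the name `Data` follows the answer above
Data <- data.frame(
  Row   = 1:8,
  Conc  = c(2.5, 3.0, 4.6, 5.0, 3.2, 4.2, 5.3, 3.4),
  group = c("A", "A", "B", "B", "C", "C", "D", "D")
)

# Rows whose group is A, B, or C
Data1 <- subset(Data, group %in% c("A", "B", "C"))

# Rows whose group is D
Data2 <- subset(Data, group == "D")

nrow(Data1)  # 6
nrow(Data2)  # 2
```

Because the selection is by group value, no row numbers need to be known in advance.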

Señor O


I think maybe `split(dat,dat$group == 'D')` is sufficient. – joran Oct 29 '12 at 16:47
I think maybe it's the same thing. – Señor O Oct 29 '12 at 17:08
It (sort of) achieves the same result, but is more idiomatic, only takes one line, and conveniently returns both pieces in a single data structure. In general, one should prefer using `split`. – joran Oct 29 '12 at 17:29
It returns a list of two data frames, which have to be accessed as `` data$`FALSE` `` rather than referred to directly as data frames. It gets more complicated if you want more than two splits. So it depends on what you're doing. – Señor O Oct 29 '12 at 18:09
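
One possible way to soften the naming issue raised in these comments is to split on a labeled factor instead of a bare logical vector, so that the list elements get readable names. This is only a sketch: `dat`, `part`, `pieces`, and the labels `ABC` and `D` are names made up here for illustration.

```r
# Sample data copied from the question; `dat` is an assumed name
dat <- data.frame(
  Row   = 1:8,
  Conc  = c(2.5, 3.0, 4.6, 5.0, 3.2, 4.2, 5.3, 3.4),
  group = c("A", "A", "B", "B", "C", "C", "D", "D")
)

# Turn the logical test into a factor with descriptive labels
part <- factor(dat$group == "D",
               levels = c(FALSE, TRUE),
               labels = c("ABC", "D"))

# split() names the list elements after the factor levels
pieces <- split(dat, part)

names(pieces)  # "ABC" "D"
pieces$ABC     # rows for groups A, B and C
pieces$D       # rows for group D
```

The same idea extends to more than two pieces by building a factor with more levels.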

score 9 · Answer 2 · answered Mar 11 '19 at 09:43

For those who end up here through internet search engines time after time, the answer to the question in the title is:

x <- data.frame(num = 1:26, let = letters, LET = LETTERS)

split(x, sort(as.numeric(rownames(x))))

This assumes that your data frame has numerically ordered row names. `split(x, rownames(x))` also works, but the pieces come back in character-sorted order ("1", "10", "11", ...) rather than in row order.
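
A small sketch of the ordering difference, using the same `x` as above. The names `by_chr` and `by_num` are made up for illustration. `sort()` is left out here because the row names are already in numeric order.

```r
x <- data.frame(num = 1:26, let = letters, LET = LETTERS)

# Splitting on character row names: groups are ordered as strings
by_chr <- split(x, rownames(x))
head(names(by_chr))  # "1" "10" "11" "12" "13" "14"

# Splitting on numeric row names: groups keep the row order
by_num <- split(x, as.numeric(rownames(x)))
head(names(by_num))  # "1" "2" "3" "4" "5" "6"

# Either way, each list element is a one-row data frame
nrow(by_num[["3"]])  # 1
```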

Mikko


How would one go about storing each of these though? Like if I want a data frame that is `A` then `B` and so on. – Hercislife Dec 21 '20 at 16:34
@Hercislife Use `lapply()` https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/lapply – Mikko Jan 04 '21 at 07:56
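
To illustrate the `lapply()` suggestion: `split()` already returns a named list with one data frame per group, which can be indexed by name or processed piece by piece. A sketch, assuming the question's data is stored in a data frame called `dat` (a name chosen here, as are `by_group` and `means`):

```r
# Sample data copied from the question; `dat` is an assumed name
dat <- data.frame(
  Row   = 1:8,
  Conc  = c(2.5, 3.0, 4.6, 5.0, 3.2, 4.2, 5.3, 3.4),
  group = c("A", "A", "B", "B", "C", "C", "D", "D")
)

# One data frame per group, stored in a named list
by_group <- split(dat, dat$group)
names(by_group)  # "A" "B" "C" "D"

# Pull out a single piece by name
by_group[["A"]]

# Apply a function to every piece, here the mean of Conc
means <- lapply(by_group, function(d) mean(d$Conc))
means$A  # 2.75
```

Keeping the pieces in a list is usually easier to work with than creating one variable per group.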

score 0 · Answer 3 · answered Oct 29 '12 at 17:17

You may consider using the recode() function from the "car" package.

# Load the library and make up some sample data
library(car)
set.seed(1)
dat <- data.frame(Row = 1:100,
                  Conc = runif(100, 0, 10),
                  group = sample(LETTERS[1:10], 100, replace = TRUE))

Currently, dat$group contains the upper case letters A to J. Imagine we wanted the following four groups:

"one" = A, B, C
"two" = D, E, J
"three" = F, I
"four" = G, H

Now, use recode() (note the semicolon and the nested quotes).

recodes <- recode(dat$group, 
                 'c("A", "B", "C") = "one"; 
                  c("D", "E", "J") = "two"; 
                  c("F", "I") = "three"; 
                  c("G", "H") = "four"')
split(dat, recodes)

You don't really need the car package for this if you are comfortable working with the levels directly. It only saves a marginal amount of typing compared with `levels(dat$group)[levels(dat$group) %in% c("A","B","C")] <- "one"`, for example. – Brandon Bertelsen, Oct 29 '12 at 19:17
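
A base R sketch of the same regrouping without car, using the named-list form of `levels<-` on a factor. The variable names `grp` and `pieces` are made up here. The explicit `factor()` call is an assumption to cover R versions where `data.frame()` keeps strings as character vectors.

```r
set.seed(1)
dat <- data.frame(Row = 1:100,
                  Conc = runif(100, 0, 10),
                  group = sample(LETTERS[1:10], 100, replace = TRUE))

# Make sure we are working with a factor
grp <- factor(dat$group)

# Merge the old levels into new ones: name = vector of old levels
levels(grp) <- list(one   = c("A", "B", "C"),
                    two   = c("D", "E", "J"),
                    three = c("F", "I"),
                    four  = c("G", "H"))

pieces <- split(dat, grp)
names(pieces)  # "one" "two" "three" "four"
```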

score 0 · Answer 4 · answered Mar 10 '22 at 17:41

With base R, we can pass `split()` a logical vector that marks the rows we want to separate.

split(df, df$group == "D")

Output

$`FALSE`
  Row Conc group
1   1  2.5     A
2   2  3.0     A
3   3  4.6     B
4   4  5.0     B
5   5  3.2     C
6   6  4.2     C

$`TRUE`
  Row Conc group
7   7  5.3     D
8   8  3.4     D

If you want to split on multiple letters, use `%in%`:

split(df, df$group %in% c("A", "D"))

Another option is to use group_split from dplyr, but you need to create a grouping variable first for the split.

library(dplyr)

df %>% 
  mutate(spl = ifelse(group == "D", 1, 0)) %>% 
  group_split(spl, .keep = FALSE)
