0

I want to Merge certain raw values of data frame into one data frame.

If raw data as shown below,

> set.seed(1)
> Data1 <- data.frame(
+     Value = sample(1:5),
+     Report = sample(c("yes", "no"), 5, replace = TRUE)
+ )
> Data2 <- data.frame(
+     Value = sample(1:5),
+     Report = sample(c("yes", "no"), 5, replace = TRUE)
+ )
> Data3 <- data.frame(
+     Value = sample(1:5),
+     Report = sample(c("yes", "no"), 5, replace = TRUE)
+ )
> 
> subset(Data1)
  Value Report
1     2     no
2     5     no
3     4     no
4     3     no
5     1    yes
> subset(Data2)
  Value Report
1     2    yes
2     1     no
3     3     no
4     4    yes
5     5     no
> subset(Data3)
  Value Report
1     5    yes
2     1    yes
3     2    yes
4     4     no
5     3    yes

I want to make the data frame as follows.

collect rows with 'Yes' in the 'Report' column and save them to each data frame.

dfResult1
Value  Report
1      yes
2      yes

dfResult2
value  report
4      Yes 

dfResult3
value  report
5      Yes
1      Yes
2      Yes

dfResult4
value  report
3      Yes
  • 3
    it would be a lot easier to help if the sample data was better formatted/easier to read. have a look [here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for some guidance about how to do so :) – MichaelChirico Aug 12 '19 at 02:30
  • Also noteworthy: https://stackoverflow.com/editing-help#syntax-highlighting – r2evans Aug 12 '19 at 02:47
  • 1
    Can you explain the logic of how do you create `dfResult1`, `dfResult2` from `df1`, `df2` and `df3` ? – Ronak Shah Aug 12 '19 at 02:48
  • I want to collect the rows with the value of 'yes' for report and make each data frame. – Hyukki Kwon Aug 12 '19 at 03:28
  • 1
    Do you need `df <- rbind(df1, df2, df3)` and then using `dplyr`, `df %>% filter(Report == "yes") %>% group_split(Value)` ? – Ronak Shah Aug 12 '19 at 03:57
  • I've revised the text. Could you help me again? – Hyukki Kwon Aug 12 '19 at 05:04

1 Answers1

0

Here is one way to do it in base R, rbind all the dataframes together, create a row number column, filter the rows with only "yes" values and create a grouping column to split.

df <- rbind(Data1, Data2, Data3)
df$row <- seq_len(nrow(df))
df <- df[df$Report == "yes",]
list_df <- split(df[-3], cumsum(c(TRUE, diff(df$row) > 1)))
list_df

#$`1`
#  Value Report
#5     1    yes
#6     2    yes

#$`2`
#  Value Report
#9     4    yes

#$`3`
#   Value Report
#11     5    yes
#12     1    yes
#13     2    yes

#$`4`
#   Value Report
#15     3    yes

It will return a list of dataframes which can be accessed by using list_df[[1]], list_df[[2]] and so on.

Using same logic, we can use dplyr

library(dplyr)
df %>%
  mutate(row = row_number()) %>%
  filter(Report == "yes") %>%
  group_split(cumsum(c(TRUE, diff(row) > 1)), keep = FALSE)

We can also use rle

df$group <- with(rle(df$Report), rep(seq_along(values), lengths))
df1 <- subset(df, Report == "yes")
split(df1, df1$group)

and using data.table::rleid

df %>%
  mutate(group = data.table::rleid(Report)) %>%
  filter(Report == "yes") %>%
  group_split(group, keep = FALSE)
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213