1

I have a dataset that looks like this:

enter image description here

you can get sample data using code:

df <- structure(list(Var = c("v1", "v2", "v3", "v4", "v5", "v6", "v7", 
"v8", "v9", "v10"), Domain = c("Group1", "Group1", "Group1", 
"Group4", "Group4", "Group4", "Group2", "Group2", "Group3", "Group3"
), Groupvar = c("v1", NA, NA, "v5", NA, NA, NA, NA, "v10", NA
)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))

I would like to build a new variable "Action" to define the action I want to do further using the rule: If there is a !is.na groupvar within same domian, the action=="Merge", if no !is.na groupvar for that domain, then Action=="Append". The dinal data should looks like this:

enter image description here

what should I do? I think I should use lapply( split(df, df$Domain) , ..., I am new to R and still get confused with list and data.Frame. Not sure what I should do after I split the data by domain. What I want to do is sth like egen. Any guidance on what I should do in order to build variable "Action" will be high appreciated. Thanks.

Stataq
  • 2,237
  • 6
  • 14

3 Answers3

3

Try this option. The function ifelse() is vectorized and can wrap any other function as logical tests in order to avoid the addition of functions that work at other levels:

library(dplyr)
#Code
newdf <- df %>% group_by(Domain) %>%
  mutate(Action=ifelse(sum(!is.na(Groupvar))>=1,'Merge','Append'))

Output:

# A tibble: 10 x 4
# Groups:   Domain [4]
   Var   Domain Groupvar Action
   <chr> <chr>  <chr>    <chr> 
 1 v1    Group1 v1       Merge 
 2 v2    Group1 NA       Merge 
 3 v3    Group1 NA       Merge 
 4 v4    Group4 v5       Merge 
 5 v5    Group4 NA       Merge 
 6 v6    Group4 NA       Merge 
 7 v7    Group2 NA       Append
 8 v8    Group2 NA       Append
 9 v9    Group3 v10      Merge 
10 v10   Group3 NA       Merge 
Duck
  • 39,058
  • 13
  • 42
  • 84
  • Never know that we can do `sum(!is.na(Groupvar))>=1`. This one is more like rsum and egen in stata. :) – Stataq Nov 22 '20 at 23:40
2

An option is to do a group by 'Domain' and create the 'Action' with if/else i.e. if all the values in 'Groupvar' are NA then return 'Append' or else 'Merge'

library(dplyr)
df %>%
    group_by(Domain) %>%
    mutate(Action = if(all(is.na(Groupvar))) 'Append' else 'Merge') %>%
    ungroup

-output

# A tibble: 10 x 4
#   Var   Domain Groupvar Action
#   <chr> <chr>  <chr>    <chr> 
# 1 v1    Group1 v1       Merge 
# 2 v2    Group1 <NA>     Merge 
# 3 v3    Group1 <NA>     Merge 
# 4 v4    Group4 v5       Merge 
# 5 v5    Group4 <NA>     Merge 
# 6 v6    Group4 <NA>     Merge 
# 7 v7    Group2 <NA>     Append
# 8 v8    Group2 <NA>     Append
# 9 v9    Group3 v10      Merge 
#10 v10   Group3 <NA>     Merge 

Or can also be written as

df %>%
     mutate(Action = c("Append", "Merge")[1 + 
             (Domain %in% Domain[complete.cases(Groupvar)])])

-output

# A tibble: 10 x 4
#   Var   Domain Groupvar Action
#   <chr> <chr>  <chr>    <chr> 
# 1 v1    Group1 v1       Merge 
# 2 v2    Group1 <NA>     Merge 
# 3 v3    Group1 <NA>     Merge 
# 4 v4    Group4 v5       Merge 
# 5 v5    Group4 <NA>     Merge 
# 6 v6    Group4 <NA>     Merge 
# 7 v7    Group2 <NA>     Append
# 8 v8    Group2 <NA>     Append
# 9 v9    Group3 v10      Merge 
#10 v10   Group3 <NA>     Merge 

If we want to use split, then an option is

df$Action <- unsplit(lapply(split(df$Groupvar, df$Domain), 
function(x) if(all(is.na(x))) rep('Append', length(x)) else
      rep('Merge', length(x))), df$Domain)
df$Action
#[1] "Merge"  "Merge"  "Merge"  "Merge"  "Merge"  "Merge"  "Append" "Append" "Merge"  "Merge" 

Or a more compact option in base R is

c("Append", "Merge")[1 + (df$Domain %in% na.omit(df)$Domain)]
#[1] "Merge"  "Merge"  "Merge"  "Merge"  "Merge"  "Merge"  "Append" "Append" "Merge"  "Merge" 
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 1
    You are right. I work too much with split these days and forget about `group_by`. Thanks so much for the prompt reply. :) – Stataq Nov 22 '20 at 23:05
  • @Stataq Thanks. I updated with `split`. Hope it helps you – akrun Nov 22 '20 at 23:07
  • Thanks so much! It did not allowed me to accept answer within 12 mins. I now accepted. we are working on a project to speed up merge/append process. I just fixed the sample e6. Could you give me some idea on that? The post is https://stackoverflow.com/questions/64914448/how-to-merge-a-list-of-file-that-are-in-the-environment-following-a-mapping-file. If it is not doable, i am fully understand. :) – Stataq Nov 22 '20 at 23:20
0

No need to split the data you can do this in base R with ave :

df$Action <- with(df, ifelse(ave(!is.na(Groupvar), Domain, FUN = any), 
                      'Merge', 'Append'))
df

# A tibble: 10 x 4
#   Var   Domain Groupvar Action
#   <chr> <chr>  <chr>    <chr> 
# 1 v1    Group1 v1       Merge 
# 2 v2    Group1 NA       Merge 
# 3 v3    Group1 NA       Merge 
# 4 v4    Group4 v5       Merge 
# 5 v5    Group4 NA       Merge 
# 6 v6    Group4 NA       Merge 
# 7 v7    Group2 NA       Append
# 8 v8    Group2 NA       Append
# 9 v9    Group3 v10      Merge 
#10 v10   Group3 NA       Merge 
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213