1

I have a data frame as:

   Line    ID                    Name Filter LineFilter                                        Value
1     1 00100             Liquid Milk                                                     E0 9,10,11
2     2 00200            Vinamilk Co.      F          1            E0 9,10,11 E E11 7 O E0 1 E E11 7
3     3 00300       Total VNM Regular      F          2                              INCLUDETHEBELOW
4     4 00400                VNM Fino      F          3                        E0 10 E E1 200 E E5 2
5     5 00500                VNM A,D3      F          4           E0 10 E E1 200 E E3 25,26,27,28,35
6     6 00600           VNM ADM Total      F          3                                 Includebelow
7     7 00700                 VNM ADM      F          6        E0 10 E E1 200 E E3 12,13,14,15,16,17
8     8 00800            VNM ADM Gold      F          6              E0 10 E E1 200 E E3 29,30,31,34
9     9 00900         VNM ADM Gold IQ      F          6              E0 10 E E1 200 E E3 37,38,39,40
10   10 01000          Total VNM 100%      F          2                              INCLUDETHEBELOW
11   11 01100            VNM 100% UHT      F         10 E0 10 E E1 200 E E3 6,8,10,11,20,21,22,23,24
12   12 01200     VNM 100% Pasteurize      F         10                               E0 09 E E1 227
13   13 01300       VNM 100% Skim fat      F         10                    E0 10 E E1 200 E E3 18,19
structure(list(Line = 1:13, ID = c("00100", "00200", "00300", 
"00400", "00500", "00600", "00700", "00800", "00900", "01000", 
"01100", "01200", "01300"), Name = c(" Liquid Milk", "  Vinamilk Co.", 
"   Total VNM Regular", "    VNM Fino", "     VNM A,D3", "    VNM ADM Total", 
"     VNM ADM", "     VNM ADM Gold", "     VNM ADM Gold IQ", 
"   Total VNM 100%", "    VNM 100% UHT", "    VNM 100% Pasteurize", 
"    VNM 100% Skim fat"), Filter = c(" ", "F", "F", "F", "F", 
"F", "F", "F", "F", "F", "F", "F", "F"), LineFilter = c("    ", 
"1", "2", "3", "4", "3", "6", "6", "6", "2", "10", "10", "10"
), Value = c("E0 9,10,11", "E0 9,10,11 E E11 7 O E0 1 E E11 7", 
"INCLUDETHEBELOW", "E0 10 E E1 200 E E5 2", "E0 10 E E1 200 E E3 25,26,27,28,35", 
"Includebelow", "E0 10 E E1 200 E E3 12,13,14,15,16,17", "E0 10 E E1 200 E E3 29,30,31,34", 
"E0 10 E E1 200 E E3 37,38,39,40", "INCLUDETHEBELOW", "E0 10 E E1 200 E E3 6,8,10,11,20,21,22,23,24", 
"E0 09 E E1 227", "E0 10 E E1 200 E E3 18,19")), row.names = c(NA, 
-13L), class = c("tbl_df", "tbl", "data.frame"))

What I want to do with Value column is: if the entry is "Includethebelow", the Value will consist of those with the according LineFilter matching Line, meaning:

  • Value[3] = Value[4]
  • Value[6] = Value[7] + Value[8] + Value[9]
  • Value[10] = Value[11] + Value[12] + Value[13]

Currently, my code is:

library(stringr)
library(dplyr)
library(tidyverse) 

###
mutate(Value = case_when(
    str_detect(Value, regex("below", ignore_case = TRUE)) ~ paste(Value[LineFilter==Line], sep=""),
    TRUE ~ Value
  ))
###

I know that Value[LineFilter==Line] cannot work with this logic under the use of mutate. Then how can I implement this idea?

Thanks for your help.

Flora
  • 11
  • 2
  • 1
    Welcome to SO! Please don't show an image of your data put paste the output of `dput`, see [how to make a great MSE](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Then it's easier to help you, thanks! – starja Feb 15 '22 at 19:12
  • I would make a helper column and use it to sum, like `df %>% mutate(group = cumsum(Value == "Includethebelow") %>% group_by(group) %>% mutate(value2 = if_else(row_number() == 1, [concatenate strings?], NA)` – Jon Spring Feb 15 '22 at 19:26
  • Note that `tidyverse` is a meta package that will load `dplyr`, `stringr` and some others packages too. No need to load them separately. – rdelrossi Feb 15 '22 at 19:59
  • @starja I have included the data instead of the picture @JonSpring ```mutate(group = cumsum(Value == "Includethebelow")``` would return value of 1 for all entries after the first "Includethebelow" – Flora Feb 16 '22 at 01:56
  • Greetings! Please share a reproducible dataset as shown here: https://youtu.be/3EID3P1oisg – Shawn Hemelstrand Feb 19 '22 at 13:57

1 Answers1

0

This will work:

# temporary dataset for joining
data_linefilter <- data %>%
  filter(!str_detect(toupper(Value), "BELOW")) %>%
  mutate(LineFilter = as.integer(LineFilter)) %>%
  group_by(LineFilter) %>%
  summarise(Value_helper_column = paste0(Value, collapse = ","), .groups = "drop")

# join and mutate
data %>%
  left_join(data_linefilter, by = c("Line" = "LineFilter")) %>%
  mutate(Value = if_else(str_detect(toupper(Value), "BELOW"), Value_helper_column, Value)) %>%
  select(-Value_helper_column)

It gives you the desired output, but you will need to create the temporary dataset for joining.

# A tibble: 13 x 6
    Line ID    Name                      Filter LineFilter Value                                                                                                
   <int> <chr> <chr>                     <chr>  <chr>      <chr>                                                                                                
 1     1 00100 " Liquid Milk"            " "    "    "     E0 9,10,11                                                                                           
 2     2 00200 "  Vinamilk Co."          "F"    "1"        E0 9,10,11 E E11 7 O E0 1 E E11 7                                                                    
 3     3 00300 "   Total VNM Regular"    "F"    "2"        E0 10 E E1 200 E E5 2                                                                                
 4     4 00400 "    VNM Fino"            "F"    "3"        E0 10 E E1 200 E E5 2                                                                                
 5     5 00500 "     VNM A,D3"           "F"    "4"        E0 10 E E1 200 E E3 25,26,27,28,35                                                                   
 6     6 00600 "    VNM ADM Total"       "F"    "3"        E0 10 E E1 200 E E3 12,13,14,15,16,17,E0 10 E E1 200 E E3 29,30,31,34,E0 10 E E1 200 E E3 37,38,39,40
 7     7 00700 "     VNM ADM"            "F"    "6"        E0 10 E E1 200 E E3 12,13,14,15,16,17                                                                
 8     8 00800 "     VNM ADM Gold"       "F"    "6"        E0 10 E E1 200 E E3 29,30,31,34                                                                      
 9     9 00900 "     VNM ADM Gold IQ"    "F"    "6"        E0 10 E E1 200 E E3 37,38,39,40                                                                      
10    10 01000 "   Total VNM 100%"       "F"    "2"        E0 10 E E1 200 E E3 6,8,10,11,20,21,22,23,24,E0 09 E E1 227,E0 10 E E1 200 E E3 18,19                
11    11 01100 "    VNM 100% UHT"        "F"    "10"       E0 10 E E1 200 E E3 6,8,10,11,20,21,22,23,24                                                         
12    12 01200 "    VNM 100% Pasteurize" "F"    "10"       E0 09 E E1 227                                                                                       
13    13 01300 "    VNM 100% Skim fat"   "F"    "10"       E0 10 E E1 200 E E3 18,19 
fschier
  • 180
  • 10
  • Thank you so much for the solution! But still I wonder whether it is possible to use mutate with a condition as explained in the question. [ It may be hard for me to interpret the problem clearly since coding is not my major :( ] – Flora Feb 23 '22 at 02:48
  • I dont see how this could work with only a mutate, since you would have to compare values in different rows. Maybe there is a way to incorporate that logic in a custom function and use it in _map(...)_. But in any case that would require quite some more code/time than this solution. – fschier Feb 23 '22 at 12:54