
This is an extension of the following questions: (1) and (2). It was also asked by Mario Reutter in the comments to (2).

library(dplyr)  # library() attaches one package per call; the second argument is not a package name
string <- c("car", "train", 'bike', 'plain')
speed1 <- runif(4, min = 0, max = 10000)
speed2 <- runif(4, min = 0, max = 10000)
n1  <- sample(1:100, 4)
n1_plus  <- sample(1:100, 4)
n1_minus <- sample(1:100, 4)
n2  <- sample(1:100, 4)
df <- data.frame(string, speed1, speed2, n1, n2, n1_plus, n1_minus)

Thanks to akrun's answer I can build the following function:

my_fun <- function(dataf, V1, V2) {
  dataf %>%
    dplyr::mutate("{{V1}}_{{V2}}" := paste0(format({{V1}}, big.mark = ","),
                                            "\n(", format({{V2}}, big.mark = ","), ")"))
}

df <- df %>% my_fun(speed1, n1)

This creates a new variable whose composite name is defined by "{{V1}}_{{V2}}" :=.
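For comparison, here is a minimal sketch of the left-hand-side name injection that already works. dplyr treats the string to the left of := as a glue template and substitutes each {{ }} with the name of the argument. The function name_demo and the toy data are hypothetical.

```r
library(dplyr)

# Hypothetical toy function: the string left of := is glued into a column name
name_demo <- function(dataf, V1, V2) {
  dataf %>%
    mutate("{{V1}}_{{V2}}" := {{ V1 }} + {{ V2 }})
}

out <- name_demo(data.frame(speed1 = c(10, 20), n1 = c(1, 2)), speed1, n1)
names(out)  # "speed1" "n1" "speed1_n1"
```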

However, how can I refer to a composite variable name on the right-hand side of the assignment? For example, I would like to replace format({{V2}}, big.mark = ",") with something like format('{{V2}}_plus', big.mark = ","). I tried the following, which does not work:

my_fun <- function(dataf, V1, V2) {
  dataf %>%
    dplyr::mutate("{{V1}}_{{V2}}_plus" := paste0(format({{V1}}, big.mark = ","),
                                                 "\n(", format('{{V2}}_plus', big.mark = ","), ")"))
}

df <- df %>% my_fun(speed1, n1)
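This attempt fails because only the string to the left of := is treated as a glue template. On the right-hand side, '{{V2}}_plus' is an ordinary character string, so format() returns it unchanged and no column is looked up. The sketch below reproduces this with a hypothetical function rhs_demo and toy data:

```r
library(dplyr)

# Hypothetical toy function reproducing the failing pattern
rhs_demo <- function(dataf, V2) {
  dataf %>%
    mutate(check = format('{{V2}}_plus', big.mark = ","))  # a plain string, not a column reference
}

out <- rhs_demo(data.frame(n1 = 1:2, n1_plus = 3:4), n1)
out$check  # "{{V2}}_plus" "{{V2}}_plus" (the literal text, not the values of n1_plus)
```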

Desired output: I would expect a new column speed1_n1_plus that combines the values from speed1 and n1_plus:

  string   speed1   speed2 n1 n2 n1_plus n1_minus       speed1_n1_plus
1    car 3958.415 1049.172 70 91      25       53 3,958.415\n(25)
2  train 6203.919 8639.160 52 92      14       91 6,203.919\n(14)
3   bike 2966.391 2997.303 35 55      46       61 2,966.391\n(46)
4  plain 2755.266 1627.379 98 66       8       49 2,755.266\n( 8)

I need to do operations on multiple variables with similar names. The variable names are composites of a 'core' name (in this case 'n1', i.e. {{V2}}) plus suffixes and prefixes. I would like to avoid adding an argument for each variable, since each one only adds a suffix to the core name.

I tried !!paste0, as.name(), eval(parse(text = ...)), etc. These may work outside a function, but for me they do not work inside one.

Edited by Ronak Shah; asked by MsGISRocker.
  • Do you have the `_plus` already created in the data? – akrun Aug 31 '21 at 18:46
  • The `_plus` is an example of a suffix that I created in my data before. It could be `_SD`, `_skew`, ... Why? I would not like to change the general structure of everything I did before unless I have to. – MsGISRocker Aug 31 '21 at 18:49
  • i.e. something like `df$speed1_SD <- 100000` – akrun Aug 31 '21 at 18:50
  • Well, it's a vector of numeric values; basically a column in the data frame. – MsGISRocker Aug 31 '21 at 18:51
  • What I meant is that you have a vector object named `n1_plus` in the global env, which is not part of the data. Do you want that object to be created as a column (as it was not part of the 'df')? – akrun Aug 31 '21 at 19:54

2 Answers

my_fun <- function(dataf, V1, V2) {
  dataf %>%
    dplyr::mutate("{{V1}}_{{V2}}_plus" := paste0(
      format({{V1}}, big.mark = ","),
      "\n(",
      # turn the argument name n1 into the symbol n1_plus and inject it with !!
      format(!!rlang::sym(paste0(rlang::as_string(rlang::ensym(V2)), "_plus")), big.mark = ","),
      ")"
    ))
}

Testing:

df %>%
  my_fun(speed1, n1)
 string   speed1    speed2 n1 n2 n1_plus n1_minus  speed1_n1_plus
1    car 4453.441 3336.7287 92 97      28       56 4,453.441\n(28)
2  train 7718.381  638.5120 82 61       9       13 7,718.381\n( 9)
3   bike 4648.093 4267.8390  7 92      83       29 4,648.093\n(83)
4  plain 3815.145  793.6886 18 56      30       46 3,815.145\n(30)
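The key step in this answer is to turn the bare argument n1 into the string "n1", append the suffix, and convert the result back into a symbol that !! injects into the format() call. The sketch below isolates only that step. The helper target_sym and its suffix parameter are hypothetical.

```r
library(rlang)

# Hypothetical helper: bare argument n1 -> symbol n1_plus
target_sym <- function(V2, suffix = "_plus") {
  core <- as_string(ensym(V2))   # "n1"
  sym(paste0(core, suffix))      # the symbol n1_plus
}

s <- target_sym(n1)
expr(format(!!s, big.mark = ","))  # format(n1_plus, big.mark = ",")
```

Passing suffix = "_minus" targets n1_minus in the same way.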
Edited by MsGISRocker; answered by akrun.
  • @MsGISRocker the second option would be more dynamic – akrun Aug 31 '21 at 19:16
  • @MsGISRocker in your update, I see `n1_minus` as well, which was not created in the original data. Is that also part of your expected output? – akrun Aug 31 '21 at 19:51
  • `n1_plus`, `n1_minus`, ... are examples of variables that extend the variable name `n1` by a suffix. They are to be referenced in a function by appending the suffix to the variable name `n1`. – MsGISRocker Aug 31 '21 at 19:57
  • @MsGISRocker Yes, but those objects I find as created in your globalenv as vectors and not as part of 'df'. So, I expect the function to pick those objects from the globalenv and create as columns. Updated the post if that is what you meant – akrun Aug 31 '21 at 19:58
  • @akrun: Well, I want to use these variables (`n1_plus`, `n1_minus`, ...) to do operations within a function, referring to them via `n1` and the suffix, but without a new argument in the function. – MsGISRocker Aug 31 '21 at 20:12
  • @MsGISRocker anyway the update works as per what you showed in the post (unless you do more changes) – akrun Aug 31 '21 at 20:19
  • @akrun: Thanks a lot! It works! It is not the quick and easy solution I was hoping for, as it uses a lot of code and functions to do basically what `"{{V1}}_{{V2}}_plus" :=` does on the left-hand side of the equation. Maybe there is no easy option. It may take me a while to apply it to my problem with multiple variables and suffixes, .... – MsGISRocker Aug 31 '21 at 21:31
  • @MsGISRocker the issue is that the `n1_plus` is not already present in the dataset. So, it needs to be `get` from the global env, then it had to do with formatting on another column – akrun Aug 31 '21 at 21:32
  • I guess I can simplify it by dropping `new_col <- glue::glue("{rlang::ensym(V1)}_{pat}plus")` and replacing `mutate(!!new_col :=` with `mutate("{{V1}}_{{V2}}_plus" :=` – MsGISRocker Aug 31 '21 at 21:34
  • @MsGISRocker yes, if you don't need that column, you can drop it, but still the object is in global env, it needs to be picked up – akrun Aug 31 '21 at 21:35
  • @MsGISRocker i updated with a slightly more compact code – akrun Aug 31 '21 at 21:41
  • @akrun: I'll check tomorrow, as it is kind of bedtime for me now. – MsGISRocker Aug 31 '21 at 22:02
  • @akrun: I am wondering if you might have misunderstood the data. All the data I use is part of the data frame. Actually, I was trying something like `format(.data[[glue::glue("{{col}}_plus")]]`, but nothing yet. – MsGISRocker Aug 31 '21 at 22:04
  • @MsGISRocker In the example you created, `n1_plus` was not part of the data when you created `data.frame(..` – akrun Aug 31 '21 at 22:05
  • @MsGISRocker i.e. your call in creating is `df <- data.frame(string, speed1, speed2, n1, n2)` and where `n1_plus` was not included. If it was included, then you don't need the `.GlobalEnv` and other stuff – akrun Aug 31 '21 at 22:06
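If all columns are part of df, the .data pronoun mentioned in the comments can look up a column by a name built as a string. glue::glue("{{col}}_plus") does not work for this, because in glue a doubled brace is an escaped literal brace, not an argument reference. The name therefore has to be built from the argument first, for example with rlang::as_name(rlang::enquo(V2)). The sketch below handles several suffixes without adding one argument per column. The function my_fun_multi and its suffixes parameter are hypothetical.

```r
library(dplyr)

# Hypothetical helper: for each suffix, combine V1 with the column "<V2><suffix>"
# (e.g. n1_plus, n1_minus) without adding one function argument per column.
my_fun_multi <- function(dataf, V1, V2, suffixes = c("_plus", "_minus")) {
  v1_name <- rlang::as_name(rlang::enquo(V1))  # "speed1"
  v2_name <- rlang::as_name(rlang::enquo(V2))  # "n1"
  for (suffix in suffixes) {
    src_col <- paste0(v2_name, suffix)          # e.g. "n1_plus"
    new_col <- paste0(v1_name, "_", src_col)    # e.g. "speed1_n1_plus"
    dataf <- dataf %>%
      mutate(!!new_col := paste0(format({{ V1 }}, big.mark = ","), "\n(",
                                 format(.data[[src_col]], big.mark = ","), ")"))
  }
  dataf
}

d <- data.frame(speed1 = c(1000, 2000), n1 = 1:2,
                n1_plus = c(25L, 14L), n1_minus = c(53L, 91L))
out <- my_fun_multi(d, speed1, n1)
out$speed1_n1_plus  # "1,000\n(25)" "2,000\n(14)"
```

With rlang 1.0.0 or later, rlang::englue("{{V2}}_plus") should build the same column name using the {{ }} syntax of the left-hand side.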

I agree that it would be helpful to build variable names on the right-hand side of an assignment within mutate. However, you can often avoid the need for this, and work more efficiently, by reshaping your data frame into a longer format.

To me, it seems like speed1 & n1 and speed2 & n2 belong together in pairs. Therefore, you could transform your df from containing 4 rows (one for every vehicle type, i.e., car, train, etc.) to 8 rows (one for every vehicle instance, i.e., car1, car2, etc.).

In your example, it would be easier to construct the data frame in this longer format from the start. Since you may have to work with data in the format you specified, though, let's reshape it (note: this is tedious because some of the information is stored in the variable names and needs to be turned back into individual cells):

library(dplyr)
library(tidyr)  # pivot_longer() and pivot_wider()

df_long = df %>%
  pivot_longer(-string) %>%  # lengthen everything except "string" (very long, but we need the information in the column names)
  mutate(number = gsub("\\D+", "", name),  # keep only the digits, e.g. "n1_plus" -> "1"
         name   = gsub("\\d+", "", name))  # drop the digits, e.g. "n1_plus" -> "n_plus"

# separate speed from everything starting with "n" and widen each part again
df_n    = df_long %>% filter(grepl("^n", name))  %>% pivot_wider(names_from = name)
df_rest = df_long %>% filter(!grepl("^n", name)) %>% pivot_wider(names_from = name)

df_tidy = full_join(df_rest, df_n, by = c("string", "number"))  # join the data frames together
View(df_tidy)  # inspect the result (explicit NAs appear since n2_plus and n2_minus don't exist in your example)
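The name-splitting step relies on two regular expressions. \\D+ removes every non-digit and leaves the number, while \\d+ removes every digit and leaves the base name. Applied to the column names of df:

```r
nms <- c("speed1", "speed2", "n1", "n2", "n1_plus", "n1_minus")

number <- gsub("\\D+", "", nms)  # "1" "2" "1" "2" "1" "1"
name   <- gsub("\\d+", "", nms)  # "speed" "speed" "n" "n" "n_plus" "n_minus"
```

This assumes each name contains a single run of digits. A name such as n1_x2 would yield the number "12".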

Now you can simply do this to get the result you wanted:

df_tidy = df_tidy %>% mutate(result = paste0(format(speed, big.mark=","), "\n(", format(n_plus, big.mark=","), ")"))

Note: It may make sense to choose an even longer format such that n, n_plus, and n_minus are not different columns but are coded in another column n_kind with factor levels "standard", "plus", and "minus". But I cannot judge from your example.
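The sketch below shows that even longer format. It assumes df_tidy has the columns string, number, speed, n, n_plus and n_minus. Here df_tidy is replaced by a hypothetical two-row stand-in with fixed values, and n_value is a hypothetical name for the value column.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for df_tidy (fixed values instead of runif()/sample())
df_tidy <- data.frame(
  string  = c("car", "car"),
  number  = c("1", "2"),
  speed   = c(3958.415, 1049.172),
  n       = c(70, 91),
  n_plus  = c(25, NA),
  n_minus = c(53, NA)
)

# one row per (vehicle instance, kind of n); kind coded as a factor
df_longer <- df_tidy %>%
  pivot_longer(c(n, n_plus, n_minus), names_to = "n_kind", values_to = "n_value") %>%
  mutate(n_kind = factor(n_kind,
                         levels = c("n", "n_plus", "n_minus"),
                         labels = c("standard", "plus", "minus")))

# the original task becomes a filter plus one mutate on fixed column names
df_plus <- df_longer %>%
  filter(n_kind == "plus") %>%
  mutate(result = paste0(format(speed, big.mark = ","), "\n(",
                         format(n_value, big.mark = ","), ")"))
```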

Answered by Mario Reutter.