
See the following reproducible example:

require(tidyverse)

set.seed(1)
reprex_df <- data.frame(
  var1 = sample(1:10),
  var2 = sample(11:20),
  var3 = sample(21:30)
)

I am trying to create a new column of URLs by concatenating the other variables in each row onto the string "https://www.google.com/search?q=", using the following code:

reprex_df %>% mutate(new_col = c(paste("https://www.google.com/search?q=", var1, var2, var3, sep="+")))

Which results in:

https://www.google.com/search?q=+3+13+30

The problem is that this puts a + between https://www.google.com/search?q= and var1, which is not a valid format for the URL. I need no separator between these two strings, like so:

https://www.google.com/search?q=3+13+30

Can I somehow specify a different separator for this part of the concatenation using paste(), or do I have to take a totally different approach? Any ideas?

Braiam
Clayton Glasser
  • Note, [don’t use `require`, use `library`](https://stackoverflow.com/a/51263513/1968). – Konrad Rudolph Sep 20 '18 at 21:39
  • For anyone wondering, this is because require() returns a logical, which is useful for loading a package conditionally, or inside a function that needs to run even if the package is not found. library() is more appropriate for setting up the general environment. – Clayton Glasser Sep 20 '18 at 22:04
  • @KonradRudolph, I'll add to that (though significantly more *objective*): in questions, load packages you need, not a meta package that imports 25 other (sometimes large) packages. Be kind to your answerers, I'm typically looking at things within an already-running R session, and since I do not personally load all of those packages, I don't want to have them in the namespace (oh, the collisions!). It's python's equivalent of `from pkgname import *`, which is both discouraged and (again, objectively) sloppy for namespace management. (Yes ... it's just *my* opinion.) – r2evans Sep 20 '18 at 22:24
  • @r2evans Totally agree, and not just in questions but generally in code. The Tidyverse is amazing. The `tidyverse` (package) is bad. – Konrad Rudolph Sep 20 '18 at 22:40
  • I wonder if one could ever justify `library(*)` (in R) or `from * import *` (in python) to see the massive conflagration of collisions and other problems ... – r2evans Sep 20 '18 at 22:42
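
The difference between `require()` and `library()` described in the comments above can be checked with a minimal base R sketch. The package name `aPackageThatDoesNotExist` is hypothetical and is assumed not to be installed:

```r
pkg <- "aPackageThatDoesNotExist"  # hypothetical name, assumed not installed

# require() returns FALSE (with a warning) when the package is missing,
# so its result can be used in a condition.
loaded <- suppressWarnings(
  require(pkg, character.only = TRUE, quietly = TRUE)
)

# library() signals an error instead, which tryCatch() can capture.
result <- tryCatch(
  {
    library(pkg, character.only = TRUE)
    "attached"
  },
  error = function(e) "error"
)

print(loaded)  # FALSE
print(result)  # "error"
```

In a script that cannot run without the package, the error from `library()` is usually the more helpful behaviour, because it fails at the top rather than later at the first missing function.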

2 Answers


You need another paste

reprex_df %>%
  mutate(new_col = paste0(
    "https://www.google.com/search?q=",
    paste(var1, var2, var3, sep = "+")
  ))
#   var1 var2 var3                                  new_col
#1     3   13   30  https://www.google.com/search?q=3+13+30
#2     4   12   22  https://www.google.com/search?q=4+12+22
#3     5   16   26  https://www.google.com/search?q=5+16+26
# ...
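
To see why the nested call helps, here is a minimal base R sketch on scalar values (the name `prefix` is just for illustration). `paste()` puts `sep` between every argument it receives, so the prefix must be joined in a separate call that uses no separator:

```r
prefix <- "https://www.google.com/search?q="

# sep is placed between every argument, including right after the prefix.
one_call <- paste(prefix, 3, 13, 30, sep = "+")

# The inner paste() builds "3+13+30" first; paste0() then attaches the
# prefix with no separator at all.
nested <- paste0(prefix, paste(3, 13, 30, sep = "+"))

print(one_call)  # "https://www.google.com/search?q=+3+13+30"
print(nested)    # "https://www.google.com/search?q=3+13+30"
```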

If you don't want to type all the variable names var1 to varn, try purrr::invoke (thanks to @thelatemail):

reprex_df %>%
  mutate(new_col = paste0("https://www.google.com/search?q=", 
                          invoke(paste, ., sep = "+")
                          )
         )

Or in base R

url <- "https://www.google.com/search?q=" # optional
transform(reprex_df,
          new_col = paste0(url, do.call(paste, c(reprex_df, sep = "+"))))
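
The `do.call` version pastes every column of the data frame. If only some columns should go into the URL, a minimal sketch along the same lines could subset first. The small data frame `df`, its extra `id` column, and the `query_cols` vector are assumptions made for this illustration:

```r
# Small stand-in for reprex_df with fixed values, plus an extra column
# (`id`, hypothetical) that should not end up in the URL.
df <- data.frame(
  var1 = c(3L, 4L),
  var2 = c(13L, 12L),
  var3 = c(30L, 22L),
  id   = c("a", "b")
)

url <- "https://www.google.com/search?q="
query_cols <- c("var1", "var2", "var3")  # hypothetical: columns to include

# do.call() passes each selected column as a separate argument to paste(),
# so sep = "+" only goes between the column values, not after the prefix.
df$new_col <- paste0(url, do.call(paste, c(df[query_cols], sep = "+")))

print(df$new_col)
```
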
markus
  • Quite sensible. Thank you. – Clayton Glasser Sep 20 '18 at 21:34
  • And if you don't want to type out `var1-varn` you could use `do.call` or the `purrr` `invoke()` wrapper - `reprex_df %>% mutate(new_col = paste0( "https://www.google.com/search?q=", invoke(paste, ., sep="+") ))` – thelatemail Sep 20 '18 at 22:19
  1. paste0: Perhaps the easiest way is to specify the + signs as arguments with paste0 rather than using sep:

    root <- "https://www.google.com/search?q="
    reprex_df %>% 
      mutate(new_col = paste0(root, var1, "+", var2, "+", var3))
    
  2. sprintf: sprintf is another possibility:

    fmt <- "https://www.google.com/search?q=%d+%d+%d"
    reprex_df %>%
      mutate(new_col = sprintf(fmt, var1, var2, var3))
    
  3. sub: Yet another possibility is to use the code in the question but follow it with code to remove the first +:

    root <- "https://www.google.com/search?q="
    reprex_df %>% 
      mutate(new_col = paste(root, var1, var2, var3, sep="+"),
             new_col = sub("\\+", "", new_col))
    
  4. allow extra +: Google ignores the + after the equal sign, so another approach is to just allow the extra plus to exist.

    root <- "https://www.google.com/search?q="
    reprex_df %>% 
      mutate(new_col = paste(root, var1, var2, var3, sep="+"))
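
As a quick check that the first three approaches agree, here is a minimal base R sketch on plain vectors. The fixed values stand in for the first two rows of reprex_df, and the `via_*` and `with_extra` names are made up for this illustration:

```r
root <- "https://www.google.com/search?q="
var1 <- c(3L, 4L)
var2 <- c(13L, 12L)
var3 <- c(30L, 22L)

# 1. paste0 with explicit "+" arguments
via_paste0 <- paste0(root, var1, "+", var2, "+", var3)

# 2. sprintf with an integer format for each variable
via_sprintf <- sprintf(paste0(root, "%d+%d+%d"), var1, var2, var3)

# 4. the code from the question, which leaves an extra "+" after "q="
with_extra <- paste(root, var1, var2, var3, sep = "+")

# 3. sub() removes only the first "+" in each string, i.e. the extra one
via_sub <- sub("\\+", "", with_extra)

print(identical(via_paste0, via_sprintf))  # TRUE
print(identical(via_paste0, via_sub))      # TRUE
print(with_extra[1])  # "https://www.google.com/search?q=+3+13+30"
```

Note that `%d` assumes whole-number values; for character columns `%s` would be needed instead.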
    
G. Grothendieck