1

I would like to wrap each of several hundred space-separated words in quotation marks and separate them with commas. Ideally this would take only one or a few function calls, since replacing them by hand would take too long.

Example:

words <- c("Paris Milan Berlin")

Output should be: "Paris","Milan","Berlin"

I've already tried gsub() and str_extract(), but I did not get the desired result.

zx8754
Maria26
    Hi, now that I read the answers, I think it is not clear whether you would like to have a character **vector** (with three elements in this case) or a single scalar string with the text `"Paris","Milan","Berlin"` – Valeri Voev Dec 02 '19 at 12:38
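To make the distinction raised in this comment concrete, here is a minimal sketch of the two possible readings. The variable names `as_vector` and `as_string` are made up for illustration and do not come from the question.

```r
words <- "Paris Milan Berlin"

# Reading 1: a character vector with one element per word
as_vector <- strsplit(words, " ", fixed = TRUE)[[1]]
length(as_vector)   # 3

# Reading 2: one string that itself contains the quotes and commas
as_string <- paste0('"', as_vector, '"', collapse = ",")
length(as_string)   # 1

# writeLines() shows the raw text of the single string
writeLines(as_string)
# "Paris","Milan","Berlin"
```

The answers below split along the same line: some return the vector, others the single string.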

4 Answers

3

You can use gsub to put " around each word and , between them.

x <- gsub("[[:blank:]]+", ",", gsub('(\\b[[:alnum:]]+\\b)', '"\\1"', words))
x
#[1] "\"Paris\",\"Milan\",\"Berlin\""

noquote(x)
#"Paris","Milan","Berlin"

Or even shorter, as suggested in the comments by @zx8754:

paste0('"', gsub(' ', '","',  words), '"')
GKi
  • Maybe? `gsub(' ','","', words, fixed = TRUE)` – zx8754 Dec 02 '19 at 12:34
  • @zx8754 Oh yes, that's much shorter! Thanks. But the leading and trailing `"` is missing. I add them with `paste`. – GKi Dec 02 '19 at 13:06
  • Thank you a lot. That works but how can I avoid these back slashes in the outcome? – Maria26 Dec 02 '19 at 13:09
  • @Maria26 The backslashes are just how R displays a quote inside a string. You can use `noquote`, `cat` or `writeLines` to display the string without the backslash escapes. See https://stackoverflow.com/q/45362944/10488504 – GKi Dec 02 '19 at 13:52
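As a small sketch of the point made in the last comment: the backslashes appear only when R prints the string and are not stored in it. The example rebuilds the string with the `paste0`/`gsub` call from the answer and checks its length.

```r
words <- "Paris Milan Berlin"
x <- paste0('"', gsub(" ", '","', words, fixed = TRUE), '"')

# print() escapes embedded double quotes, so backslashes show up here
print(x)
# [1] "\"Paris\",\"Milan\",\"Berlin\""

# The stored string has 24 characters; the backslashes are not among them
nchar(x)
# [1] 24
has_backslash <- grepl("\\", x, fixed = TRUE)
has_backslash
# [1] FALSE

# writeLines() (or cat()) writes the raw characters without escapes
writeLines(x)
# "Paris","Milan","Berlin"
```

Writing `x` to a file with `writeLines(x, "some_file.txt")` (a hypothetical file name) would therefore also produce the text without backslashes.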
2

With base R (assuming you always want to split on a single space), it is as simple as:

unlist(strsplit(words, split = " "))
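The single-space assumption matters. As a hedged sketch, with a made-up input `messy` that contains a double space and a tab, splitting on `" "` leaves an empty element and an unsplit pair, whereas splitting on the regular expression `[[:space:]]+` handles runs of whitespace:

```r
# Hypothetical input with irregular whitespace (not from the question)
messy <- "Paris  Milan\tBerlin"

# Splitting on a single literal space
by_space <- unlist(strsplit(messy, split = " "))
by_space
# [1] "Paris"         ""              "Milan\tBerlin"

# Splitting on one or more whitespace characters
by_ws <- unlist(strsplit(messy, split = "[[:space:]]+"))
by_ws
# [1] "Paris"  "Milan"  "Berlin"
```

If the input can start with whitespace, calling `trimws()` on it first avoids a leading empty element.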
Valeri Voev
1

There might be several ways. This is one of them:

library(tokenizers)
words <- c("Paris Milan Berlin")
# Note: tokenize_words() lowercases by default; pass lowercase = FALSE to keep the original case
tokenize_words(words, simplify = TRUE)
# [1] "paris"  "milan"  "berlin"
Zhiqiang Wang
1

In addition to the method of using strsplit (mentioned by @Valeri Voev), another way is to use regmatches() and gregexpr, i.e.,

regmatches(words,gregexpr("[[:alnum:]]+",words))[[1]]

which gives

> regmatches(words,gregexpr("[[:alnum:]]+",words))[[1]]
[1] "Paris"  "Milan"  "Berlin"

To produce a single string instead, the complete code is:

words <- c("Paris Milan Berlin")

r <- regmatches(words,gregexpr("[[:alnum:]]+",words))[[1]]

res1 <- toString(sapply(r, function(v) paste0('"',v,'"')))
# > res1
# [1] "\"Paris\", \"Milan\", \"Berlin\""

res2 <- toString(sapply(r, function(v) paste0("'",v,"'")))
# > res2
# [1] "'Paris', 'Milan', 'Berlin'"
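Since `paste0()` is vectorized, the `sapply()` wrapper is optional. The following sketch shows two equivalent variants. The names `res1_alt` and `res1_dq` are introduced here for illustration, and the `q = FALSE` argument of `dQuote()` is, as far as I know, only available from R 3.6.0 onward.

```r
words <- c("Paris Milan Berlin")
r <- regmatches(words, gregexpr("[[:alnum:]]+", words))[[1]]

# paste0() recycles the quote characters over every element of r
res1_alt <- toString(paste0('"', r, '"'))
res1_alt
# [1] "\"Paris\", \"Milan\", \"Berlin\""

# dQuote() with q = FALSE uses plain ASCII double quotes
res1_dq <- toString(dQuote(r, q = FALSE))
identical(res1_alt, res1_dq)
# [1] TRUE
```

Note that `toString()` joins with a comma followed by a space; `paste0(..., collapse = ",")` gives the comma-only form asked for in the question.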
ThomasIsCoding