
I have a character vector of long names, each consisting of several words joined by dots.

x <- c("Duschekia.fruticosa..Rupr...Pouzar",
       "Betula.nana.L.",
       "Salix.glauca.L.",
       "Salix.jenisseensis..F..Schmidt..Flod.",
       "Vaccinium.minus..Lodd...Worosch")

The names differ in length, but only the first two words of each name matter.

My goal is to shorten each name to at most 7 characters: the first 3 characters of each of the first two words, separated by a dot.

These questions come very close to what I need, but I do not know how to adapt their code to my case: "R How to remove characters from long column names in a data frame" and "how to append names to "column names" of the output data frame in R?".

How can I get output names that look like this?

x <- c("Dus.fru",
       "Bet.nan",
       "Sal.gla",
       "Sal.jen",
       "Vac.min")

Any help would be appreciated.

Jaap
Denis Efimov

3 Answers


You can do the following:

gsub("(\\w{1,3})[^\\.]*\\.(\\w{1,3}).*", "\\1.\\2", x)
# [1] "Dus.fru" "Bet.nan" "Sal.gla" "Sal.jen" "Vac.min"

First we match up to 3 word characters (\\w{1,3}), then skip everything up to the next dot [^\\.]*, match that dot \\., and again match up to 3 word characters (\\w{1,3}). Finally, .* consumes whatever comes after. The replacement \\1.\\2 keeps only the two captured groups (the parts in parentheses), joined by a dot.
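To see how the pattern behaves on names that do not have the expected shape, here is a small sketch. The inputs "Poa.sp" and "Salix" are made-up examples, not taken from the question.

```r
pattern <- "(\\w{1,3})[^\\.]*\\.(\\w{1,3}).*"

# A second word shorter than 3 characters is kept whole, because of {1,3}
short_epithet <- gsub(pattern, "\\1.\\2", "Poa.sp")
short_epithet
# [1] "Poa.sp"

# A name without any dot does not match the pattern, so it comes back unchanged
single_word <- gsub(pattern, "\\1.\\2", "Salix")
single_word
# [1] "Salix"
```

So a one-word name is silently left at full length, which may or may not be what you want.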

kath

Split on dot, substring 3 characters, then paste back together:

sapply(strsplit(x, ".", fixed = TRUE), function(i) {
  paste(substr(i[1], 1, 3), substr(i[2], 1, 3), sep = ".")
})
# [1] "Dus.fru" "Bet.nan" "Sal.gla" "Sal.jen" "Vac.min"
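The fixed = TRUE argument matters here. As a quick illustration (using one of the names from the question), this is roughly what happens with and without it:

```r
# Without fixed = TRUE, "." is a regular expression that matches any character,
# so every character acts as a separator and only empty strings are left
regex_split <- strsplit("Betula.nana.L.", ".")[[1]]
all(regex_split == "")
# [1] TRUE

# With fixed = TRUE, the dot is taken literally
fixed_split <- strsplit("Betula.nana.L.", ".", fixed = TRUE)[[1]]
fixed_split
# [1] "Betula" "nana"   "L"
```

Escaping the dot as "\\." in a regular expression should work equally well; fixed = TRUE just avoids the regex engine altogether.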
zx8754

Here is a less elegant solution than kath's, but one that is a bit easier to read if you are not a regex expert.

# Your data
x <- c("Duschekia.fruticosa..Rupr...Pouzar",
       "Betula.nana.L.",
       "Salix.glauca.L.",
       "Salix.jenisseensis..F..Schmidt..Flod.",
       "Vaccinium.minus..Lodd...Worosch")

# A function that takes three characters from first two words and merges them    
cleaner_fun <- function(ugly_string) {
  words <- strsplit(ugly_string, "\\.")[[1]]
  short_words <- substr(words, 1, 3)
  new_name <- paste(short_words[1:2], collapse = ".")
  return(new_name)
}

# Testing the function (USE.NAMES = FALSE keeps the input strings from being used as names)
sapply(x, cleaner_fun, USE.NAMES = FALSE)
# [1] "Dus.fru" "Bet.nan" "Sal.gla" "Sal.jen" "Vac.min"
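One thing to watch for: if a name has only one word, indexing with [1:2] produces an NA, which paste() turns into the text "NA". Below is a possible guarded variant. The name cleaner_safe and the input "Salix" are hypothetical and only serve as an illustration.

```r
# Original behaviour on a one-word name: the missing second word becomes "NA"
paste(substr("Salix", 1, 3), NA, sep = ".")
# [1] "Sal.NA"

# Hypothetical variant: keep at most the first two non-empty words
cleaner_safe <- function(ugly_string) {
  words <- strsplit(ugly_string, ".", fixed = TRUE)[[1]]
  words <- words[nzchar(words)]  # drop empty pieces left by consecutive dots
  paste(substr(head(words, 2), 1, 3), collapse = ".")
}

one_word  <- cleaner_safe("Salix")
two_words <- cleaner_safe("Salix.jenisseensis..F..Schmidt..Flod.")
one_word
# [1] "Sal"
two_words
# [1] "Sal.jen"
```

To apply it to a whole vector with a guaranteed character result, vapply(x, cleaner_safe, character(1), USE.NAMES = FALSE) is one option.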
Esben Eickhardt