
This is the column that I would like to modify:

"00640+6.2.1.1; 00680+6.2.1.1; 00720+6.2.1.1;"

Desired output:

00640; 00680; 00720

My idea was to replace "+" with a dot and then eliminate all numbers containing dots, as if they were decimals. But this eliminates everything apart from the first 00640. How can I modify it?

tmp <- as.character(tmp)
tmp <- unlist(lapply(strsplit(tmp, split = "\\+"), FUN = paste, collapse = "."))
tmp <- gsub("\\..*", "", tmp)
user3224522

3 Answers


In your example data it looks like we can just remove everything after the plus sign. If that's the case,

tmp <- gsub("\\+.*", "", tmp)

If that's not the case, please provide some more data so we can find a more appropriate solution. Also, these are vectors you're working with, not one string, correct? That appears to be the case but it's unclear from your post. You should read up on how to provide a more complete reprex.
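To illustrate why the shape of the data matters here, the following is a minimal sketch, not part of the original answer. The variable names (`vec`, `one`, and so on) are made up for the example, and the `[^;]*` variant is only one possible way to handle the case where all codes sit in a single string.

```r
# Case 1: one code per element. The pattern is applied to each
# element separately, so every code survives.
vec <- c("00640+6.2.1.1", "00680+6.2.1.1", "00720+6.2.1.1")
vec_clean <- gsub("\\+.*", "", vec)
vec_clean
#[1] "00640" "00680" "00720"

# Case 2: all codes in ONE string. ".*" runs to the end of the string,
# so everything after the first "+" is dropped.
one <- "00640+6.2.1.1; 00680+6.2.1.1; 00720+6.2.1.1;"
one_greedy <- gsub("\\+.*", "", one)
one_greedy
#[1] "00640"

# Stopping the match at the next ";" keeps the other codes.
one_clean <- gsub("\\+[^;]*", "", one)
one_clean <- sub(";[[:space:]]*$", "", one_clean)  # drop the trailing separator
one_clean
#[1] "00640; 00680; 00720"
```

If every row of the column holds several codes separated by ";", the second case is the one that applies.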

cparmstrong
  • You are almost right, your regex is simpler than mine but `+` is a special character and needs to be escaped, `gsub("\\+.*", "", x)`. Anyway, upvote. – Rui Barradas Mar 29 '18 at 16:28
  • but it will then remove everything and leave me only 00640 – user3224522 Mar 29 '18 at 16:32
  • @user3224522 What value of `tmp` have you used? We have treated it as a vector. – MKR Mar 29 '18 at 16:36
  • it is not a vector, otherwise I wouldn't have to do all these tricky operations above... I am working on a column of the data.frame – user3224522 Mar 29 '18 at 16:45
  • `gsub("+.*", "", tmp)` doesn't seem to be working for me. Please check your answer. – MKR Mar 29 '18 at 16:46
  • A column of a data frame is a vector @user3224522. This is why we need a better reprex. In your example data in the original post all you wanted was the first five digits. – cparmstrong Mar 29 '18 at 16:46
  • @seeellayewhy "00640+6.2.1.1; 00680+6.2.1.1; 00720+6.2.1.1;" this string is in one single column, and all of the answers give me 00640 as an output. – user3224522 Mar 29 '18 at 16:53
  • What you are describing is a 1x1 data frame (which is effectively a vector of length one, which is also effectively a scalar) that contains the string. Is that correct? It's not a column but a value, if that's the case. If it's the case then @RuiBarradas answer would be the one for you. Mine is for a vector (or column of a data frame) which is not actually what you have. – cparmstrong Mar 29 '18 at 17:00

There is no need for `strsplit`; `sub` and `paste` alone will do the job.

x <- scan(what = character(), 
          text = "00640+6.2.1.1; 00680+6.2.1.1; 00720+6.2.1.1",
          sep = ";")
x <- trimws(x)

y <- sub("^([[:digit:]]+).*$", "\\1", x)
y
#[1] "00640" "00680" "00720"

paste(y, collapse = "; ")
#[1] "00640; 00680; 00720"

Explanation.

  1. ^: beginning of the string.
  2. ^([[:digit:]]+): one or more digits at the beginning of the string; the parentheses make this the first capture group.
  3. .*$: any characters until the end of the string ($).
  4. \\1: in the replacement, refers back to the text captured by the first group.
Rui Barradas
  • thank you...so if my data is a column with 100 length, how can I modify text part? replace the string with tmp? – user3224522 Mar 29 '18 at 17:06
  • @user3224522 if you mean to replace the string in the `scan` instruction by `tmp`, yes. Give it a try then say something. – Rui Barradas Mar 29 '18 at 19:30
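For a whole column rather than a single string, one possible approach is to wrap the split, extract, and paste steps in a small function and apply it row by row. This is only a sketch under assumptions: the data frame `df`, the column names `codes` and `codes_clean`, the second sample row, and the helper `clean_codes` are all hypothetical and do not come from the question.

```r
# Hypothetical data frame; each row holds several codes in one string.
df <- data.frame(
  codes = c("00640+6.2.1.1; 00680+6.2.1.1; 00720+6.2.1.1;",
            "00010+1.1.1.1; 00020+2.7.1.1;"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: clean one string of ";"-separated entries.
clean_codes <- function(s) {
  parts <- trimws(strsplit(s, ";", fixed = TRUE)[[1]])  # split on ";"
  parts <- parts[nzchar(parts)]                         # drop empty pieces
  ids <- sub("^([[:digit:]]+).*$", "\\1", parts)        # keep leading digits
  paste(ids, collapse = "; ")
}

# Apply the helper to every row; vapply guarantees one string per row.
df$codes_clean <- vapply(df$codes, clean_codes, character(1), USE.NAMES = FALSE)
df$codes_clean
#[1] "00640; 00680; 00720" "00010; 00020"
```

Using `strsplit` here instead of `scan` is a choice made for the sketch, since it is easy to call once per row.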

Another option is to use a lookahead assertion:

v <- c("00640+6.2.1.1", "00680+6.2.1.1", "00720+6.2.1.1")
gsub("^(\\d+)(?=\\+).*","\\1", v, perl = TRUE)
#[1] "00640" "00680" "00720"

Regex Explanation

  1. ^: beginning of string
  2. (\\d+): one or more consecutive digits; the () make this the 1st group
  3. (?=\\+): followed by +
  4. .*: Anything afterwards
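One practical effect of the lookahead is that entries whose leading digits are not immediately followed by `+` are left untouched, whereas a pattern without it trims them anyway. The sketch below shows this on made-up input; the vector `v2` and its second and third entries are invented for the comparison.

```r
# Hypothetical input: only the first entry has digits followed by "+".
v2 <- c("00640+6.2.1.1", "00680", "12a+3")

# With the lookahead, entries that do not match are returned unchanged.
with_lookahead <- gsub("^(\\d+)(?=\\+).*", "\\1", v2, perl = TRUE)
with_lookahead
#[1] "00640" "00680" "12a+3"

# Without it, any leading run of digits is kept and the rest is dropped.
without_lookahead <- sub("^([[:digit:]]+).*$", "\\1", v2)
without_lookahead
#[1] "00640" "00680" "12"
```

Which behaviour is preferable depends on whether malformed entries should be preserved for inspection or trimmed anyway.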
MKR