I am trying to get rid of all the numbers/characters that come AFTER the FIRST hyphen. Here are some examples:

15-103025-01
800-40170-02
68-4974-01

My desired output:

15-
800-
68-

I've read through posts like these:

  1. Using gsub to extract character string before white space in R
  2. truncate string from a certain character in R
  3. Truncating the end of a string in R after a character that can be present zero or more times

But they are not what I'm looking for as the methods mentioned in those will get rid of my hyphen as well (leaving me only the first 2 or 3 numbers).

Here's what I've tried so far:

gsub(pattern = '[0-9]*-$', replacement = "", x = data$id)
grep(pattern = '[0-9]*-', replacement = "", x = data$id)
regexpr(pattern = '[0-9]*-', text = data$id)

but none of these works as I expected.

alwaysaskingquestions

3 Answers

Several ways to achieve this, here is one:

have <- c("15-103025-01", "800-40170-02", "68-4974-01")
want <- sub(pattern = "(^\\d+\\-).*", replacement = "\\1", x = have)

In the regular expression, the parentheses create one group, which matches the start of the string (^) followed by one or more digits (\\d+) and the hyphen (\\-). Outside the group, .* matches whatever characters follow.

In the replacement, \\1 refers to the first (and only) group of the regular expression. Since nothing else is added, everything outside the group is dropped.
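As a quick check of the capture-group approach, here is a minimal sketch. The fourth element ("12345") is not from the question; it is an assumed extra value added only to show that a string with no hyphen does not match and so comes back unchanged.

```r
# The three IDs from the question plus one assumed value without a hyphen
have <- c("15-103025-01", "800-40170-02", "68-4974-01", "12345")

# Group 1 captures the leading digits and the first hyphen;
# .* consumes the rest, and "\\1" puts back only the group
want <- sub("^(\\d+-).*", "\\1", have)

print(want)
# [1] "15-"   "800-"  "68-"   "12345"
```

The hyphen does not need escaping outside a character class, so `-` and `\\-` behave the same here.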

Dominic Comtois

Why not just,

sub('-.*', '-', x)
#[1] "15-"  "800-" "68-"

To do the same with the second hyphen (this pattern anchors on the last hyphen, which is the second one in these IDs):

sub('-([^-]*)$', '-', x)
#[1] "15-103025-" "800-40170-" "68-4974-"
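If an ID could contain more than two hyphens, the last hyphen and the second hyphen are no longer the same thing. A rough sketch of one way to keep everything through the n-th hyphen follows; `keep_through_nth_hyphen` is a hypothetical helper name, and "a-b-c-d" is an assumed test value, neither of which comes from the thread.

```r
# Hypothetical helper: keep everything up to and including the n-th hyphen.
# (?:[^-]*-){n} matches n runs of "non-hyphens followed by a hyphen";
# perl = TRUE is used so the non-capturing group (?: ) is supported.
keep_through_nth_hyphen <- function(x, n) {
  pattern <- sprintf("^((?:[^-]*-){%d}).*", as.integer(n))
  sub(pattern, "\\1", x, perl = TRUE)
}

x <- c("15-103025-01", "800-40170-02", "68-4974-01", "a-b-c-d")

first_res  <- keep_through_nth_hyphen(x, 1)
second_res <- keep_through_nth_hyphen(x, 2)
last_res   <- sub("-[^-]*$", "-", x)  # anchors on the LAST hyphen instead

print(first_res)   # "15-" "800-" "68-" "a-"
print(second_res)  # "15-103025-" "800-40170-" "68-4974-" "a-b-"
print(last_res)    # "15-103025-" "800-40170-" "68-4974-" "a-b-c-"
```

For the IDs in the question, the second and last hyphen coincide, so both approaches agree; they differ only on the four-part value.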
Sotos

An alternative with stringr, assuming the vector is named x:

library(stringr)
str_sub(x,1,str_locate(x,"-")[ ,1])

This part takes a vector of strings and returns the position of the first match of the pattern (here "-") in each string:

str_locate(x,"-")

This returns a matrix of start and end positions, which in this case are the same because "-" is a single character that starts and ends at the same position:

     start end
[1,]     3   3
[2,]     4   4
[3,]     3   3

When we subset this way

str_locate(x,"-")[ ,1]

we get

[1] 3 4 3

Now str_sub extracts a substring from each string, given a start and an end position. So it reads: for every element of x, take the substring that starts at character 1 and ends at the position of the first dash, calculated as shown above.

str_sub(x,1,str_locate(x,"-")[ ,1])
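The same locate-then-substring idea can be sketched in base R, without stringr, using `regexpr`/`gregexpr` with `substr`. This is an analogue of the approach above rather than the stringr code itself, and the variable names (`first`, `second`, `between`) are only illustrative. It also covers the follow-up about the second hyphen.

```r
x <- c("15-103025-01", "800-40170-02", "68-4974-01")

# Position of the first "-" in each string (fixed = TRUE: literal match)
first <- as.integer(regexpr("-", x, fixed = TRUE))
print(first)
# [1] 3 4 3

# Substring from character 1 through the first hyphen
up_to_first <- substr(x, 1, first)
print(up_to_first)
# [1] "15-"  "800-" "68-"

# gregexpr returns ALL match positions; take the second one per string
second <- sapply(gregexpr("-", x, fixed = TRUE), function(m) m[2])

# Start of string through the second hyphen
up_to_second <- substr(x, 1, second)
# Only the part between the first and second hyphen
between <- substr(x, first + 1, second - 1)
print(up_to_second)  # "15-103025-" "800-40170-" "68-4974-"
print(between)       # "103025" "40170" "4974"
```

This assumes every string has at least two hyphens; inputs with fewer would need an explicit check before calling `substr`.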
Tomas H
  • hi Tomas, thank you for trying to help! was wondering if you could explain a bit more of the arguments you put in? for example, what's the two "1" you have in your code? and can this scale to locate the 2nd hyphen too? – alwaysaskingquestions May 29 '16 at 21:22
  • Hi, I edited the answer to explain it. As for the second hyphen, sure, that can be done, but I don't know whether you want the string from the start up to the second dash or the numbers between the first and second dash – Tomas H May 30 '16 at 07:27
  • after reading your explanation, i think i understand it much better now and i think i can scale it to do 1st dash to 2nd dash myself. thank you so much! – alwaysaskingquestions May 30 '16 at 18:09