In a related post, someone asked how to grab from the beginning of a string to the first occurrence of a character. I'd like to extend my own knowledge of regex by asking how to grab from a certain character of the string to the end.

How could I use regex (not strsplit) with gsub to grab from the beginning of the first space to the end of the string?

dob <- c("9/9/43 12:00 AM/PM", "9/17/88 12:00 AM/PM", "11/21/48 12:00 AM/PM")

I tried gsub(".*? ", "", dob), but it grabs from the last space, not the first. I then tried gsub(".{1}? ", "", dob), but it is overly greedy because of the period.

The final result should be the same as:

sapply(lapply(strsplit(dob, "\\s+"), "[", 2:3), paste, collapse=" ")
##[1] "12:00 AM/PM" "12:00 AM/PM" "12:00 AM/PM"

NOTE: R regex is not identical to regex in general

Tyler Rinker

3 Answers

To grab from the beginning of the string to the first space, try:

gsub("^(.*?) .*$", "\\1", dob)
# [1] "9/9/43"   "9/17/88"  "11/21/48"

If you want everything from the first space to the end of the string, try:

gsub("^.*? (.*)$", "\\1", dob)
# [1] "12:00 AM/PM" "12:00 AM/PM" "12:00 AM/PM"
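As a possible alternative to the lazy quantifier, a negated character class should give the same results. This is only a sketch: the variable names `after_first_space` and `before_first_space` are made up for illustration, and `sub` is used in place of `gsub` because a single replacement per string is all that is needed here.

```r
dob <- c("9/9/43 12:00 AM/PM", "9/17/88 12:00 AM/PM", "11/21/48 12:00 AM/PM")

# [^ ]* matches only non-space characters, so the match cannot
# run past the first space, whether the quantifier is greedy or not.
after_first_space <- sub("^[^ ]* ", "", dob)

# The mirror image: drop everything from the first space onward.
before_first_space <- sub(" .*$", "", dob)

print(after_first_space)
# [1] "12:00 AM/PM" "12:00 AM/PM" "12:00 AM/PM"
print(before_first_space)
# [1] "9/9/43"   "9/17/88"  "11/21/48"
```

Because the class itself stops at a space, this form does not depend on how the regex engine handles `*?`.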
juba
    Both Daniel and Juba's answer worked for me but this one is more generalizable in that `gsub("^.* .*? (.*)$", "\\1", dob)` can grab from the second space on. – Tyler Rinker Apr 09 '13 at 14:31
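One way to generalize "from the n-th space on" might be a counted repetition instead of chaining more `.*` pieces. The helper name `drop_fields` below is hypothetical, and the sketch assumes fields are separated by single spaces, as in `dob`.

```r
dob <- c("9/9/43 12:00 AM/PM", "9/17/88 12:00 AM/PM", "11/21/48 12:00 AM/PM")

# drop_fields is a hypothetical helper, not part of base R.
# It removes the first n space-terminated fields from each string.
drop_fields <- function(x, n) {
  # "^([^ ]* ){n}" matches exactly n runs of non-spaces, each followed by a space
  pattern <- sprintf("^([^ ]* ){%d}", n)
  sub(pattern, "", x)
}

print(drop_fields(dob, 1))
# [1] "12:00 AM/PM" "12:00 AM/PM" "12:00 AM/PM"
print(drop_fields(dob, 2))
# [1] "AM/PM" "AM/PM" "AM/PM"
```

If a string contains fewer than `n` spaces, the pattern does not match and the string comes back unchanged.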

You forgot the indicator for the beginning of the string:

gsub("^.*? ", "", dob)

Note the caret at the beginning. Your first attempt wasn't too greedy: gsub replaces every match, so the lazy pattern matched twice (once up to each space) and both matches were removed.
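To make the difference visible, here is a small side-by-side sketch. The variable names are invented for illustration; `sub` is included because it replaces only the first match, which is another way to get the same effect as anchoring.

```r
dob <- c("9/9/43 12:00 AM/PM", "9/17/88 12:00 AM/PM", "11/21/48 12:00 AM/PM")

# Unanchored gsub: the pattern matches once per space, and every match is removed.
all_matches_removed <- gsub(".*? ", "", dob)

# Anchored gsub: only the match that starts at the beginning of the string qualifies.
anchored <- gsub("^.*? ", "", dob)

# sub stops after the first match, so no anchor is needed.
first_only <- sub(".*? ", "", dob)

print(all_matches_removed)
# [1] "AM/PM" "AM/PM" "AM/PM"
print(anchored)
# [1] "12:00 AM/PM" "12:00 AM/PM" "12:00 AM/PM"
print(first_only)
# [1] "12:00 AM/PM" "12:00 AM/PM" "12:00 AM/PM"
```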

Daniel R.

Try the following:

> dob
[1] "9/9/43 12:00 AM/PM"   "9/17/88 12:00 AM/PM"
[3] "11/21/48 12:00 AM/PM"
> gsub("(.*?) (.*$)", "\\2", dob)
[1] "12:00 AM/PM" "12:00 AM/PM" "12:00 AM/PM"
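A note on escaping, since it is easy to trip over: in an R string literal the backreference to the second group has to be written with a doubled backslash. The sketch below (variable names are illustrative only) shows why a single backslash does not do the same thing.

```r
dob <- c("9/9/43 12:00 AM/PM", "9/17/88 12:00 AM/PM", "11/21/48 12:00 AM/PM")

# "\\2" is two characters, a backslash and a 2, which gsub reads as
# "the text captured by the second group".
backref_length <- nchar("\\2")

# "\2" is an octal escape for a single control character, not a backreference.
octal_length <- nchar("\2")

second_group <- gsub("(.*?) (.*$)", "\\2", dob)

print(backref_length)
# [1] 2
print(octal_length)
# [1] 1
print(second_group)
# [1] "12:00 AM/PM" "12:00 AM/PM" "12:00 AM/PM"
```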

CHP