
I have a string:

string <- "Shakira - Wolf - 02.Hips don't lie.mp3"

I want only the first part, i.e. the name of the artist. I use a regex like this:

library(stringi)
stri_extract_all_regex(string, "^.*?-")

The output is "Shakira -", but I don't want the " -". How can I write a regex that takes only the substring before the first " -"?

David Arenburg
jjankowiak
  • Starred answer works: http://stackoverflow.com/questions/2013124/regex-matching-up-to-the-first-occurrence-of-a-character – Vlo Dec 10 '14 at 21:08
  • Nothing works the way I want, maybe because I have several "-" in my string. – jjankowiak Dec 10 '14 at 21:24

4 Answers


I think you just need this (no external packages required):

sub(" -.*", "", string)
## [1] "Shakira"

Explanation

This matches " -" and everything after it up to the end of the string, and replaces it with nothing. Because sub() replaces only the first (leftmost) match, that leaves you with everything before the first " -".
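A minimal sketch of the same call applied to several file names at once. The `files` vector and its second, multi-word artist are hypothetical examples, not from the question; they show that `sub()` is vectorised and that only the first " -" matters, even when the name contains spaces.

```r
# Hypothetical file names following the "Artist - Album - Track" pattern
files <- c("Shakira - Wolf - 02.Hips don't lie.mp3",
           "Shakira and Friends - Wolf - 02.Hips don't lie.mp3")

# The regex engine finds the leftmost " -"; ".*" then consumes the rest of
# the string, and sub() replaces that single match with ""
artists <- sub(" -.*", "", files)
artists  # c("Shakira", "Shakira and Friends")
```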


If you insist on the stringi package (for speed), you could use stri_extract_first with this simple regex (note that it only works here because the artist name is a single word made of ASCII letters):

stri_extract_first(string, regex = "[A-Za-z]+")
## [1] "Shakira"
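`[A-Za-z]+` stops at the first character that is not an ASCII letter, so a multi-word artist (the second, hypothetical file name below) comes back truncated. A sketch of an extraction that keeps everything before the first " -" uses a lazy `.*?` followed by the lookahead `(?= -)`. It is written with base R's `regexpr(..., perl = TRUE)` so it runs without stringi; stringi's ICU engine also supports lookaheads, so the same pattern should carry over to `stri_extract_first(string, regex = ...)`, though that variant is an assumption not shown here.

```r
files <- c("Shakira - Wolf - 02.Hips don't lie.mp3",
           "Shakira and Friends - Wolf - 02.Hips don't lie.mp3")  # 2nd is hypothetical

# "[A-Za-z]+" stops at the first space, so multi-word names are cut short
first_word <- regmatches(files, regexpr("[A-Za-z]+", files))

# "^.*?" matches as little as possible from the start; "(?= -)" requires the
# next characters to be " -" without including them in the match
artists <- regmatches(files, regexpr("^.*?(?= -)", files, perl = TRUE))

first_word  # c("Shakira", "Shakira")
artists     # c("Shakira", "Shakira and Friends")
```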
David Arenburg

The negated character-class method succeeds:

> stri_extract_all_regex(string, "^[^-]+")
[[1]]
[1] "Shakira "
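The match above keeps the space before " -". A minimal base R sketch (same negated class, no stringi) trims it with `trimws()`, available since R 3.2.0. The second input, a hypothetical hyphenated artist name, shows the limitation of stopping at the first hyphen rather than at the first " -".

```r
string <- "Shakira - Wolf - 02.Hips don't lie.mp3"

# "^[^-]+" takes every leading character that is not a hyphen,
# including the space just before " -"; trimws() drops that space
artist <- trimws(regmatches(string, regexpr("^[^-]+", string)))
artist  # "Shakira"

# Hypothetical hyphenated name: the class stops at the first "-" of any kind
hyphenated <- "Jay-Z - Album - 01.Track.mp3"
cut_short <- trimws(regmatches(hyphenated, regexpr("^[^-]+", hyphenated)))
cut_short  # "Jay"
```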

Challenged by The Other David, I'm now attempting to keep only the spaces that sit between alphabetic characters while still using that function, which means doing it with a "positive" selection strategy:

string <- "Shakira and Friends - Wolf - 02.Hips don't lie.mp3"
stri_extract_all_regex(string, "^[[:alpha:]]+( *[[:alpha:]])*")
[[1]]
[1] "Shakira and Friends"
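Because `[[:alpha:]]` is a POSIX class, the same "positive" pattern can also be run through base R's `regexpr()`, as sketched below. Any non-letter inside a name ends the match, which the hypothetical "Guns N' Roses" file name illustrates.

```r
files <- c("Shakira and Friends - Wolf - 02.Hips don't lie.mp3",
           "Guns N' Roses - Album - 01.Track.mp3")  # second name is hypothetical

# One or more letters, then any number of (optional spaces + one letter),
# so a space is only kept when a letter follows it
pattern <- "^[[:alpha:]]+( *[[:alpha:]])*"
artists <- regmatches(files, regexpr(pattern, files))
artists  # c("Shakira and Friends", "Guns N")
```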
IRTFM

How about using strsplit?

strsplit(string, split = " -")[[1]][1]
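`[[1]][1]` only looks at the first input string. A sketch for a whole vector of file names (the `files` vector is hypothetical) takes the first piece of every split with `vapply()`; `fixed = TRUE` treats " -" as a literal string rather than a regex.

```r
files <- c("Shakira - Wolf - 02.Hips don't lie.mp3",
           "Shakira and Friends - Wolf - 02.Hips don't lie.mp3")

# strsplit() returns a list with one character vector per input string;
# `[` with index 1 picks the first piece (the artist) from each
pieces <- strsplit(files, split = " -", fixed = TRUE)
artists <- vapply(pieces, `[`, character(1), 1)
artists  # c("Shakira", "Shakira and Friends")
```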
Kara Woo

Using rex may make this type of task a little simpler.

string <- "Shakira - Wolf - 02.Hips don't lie.mp3"

library(rex)
re_matches(string,
  rex(capture(zero_or_more(any, type='lazy')), spaces, "-"))$'1'

#> [1] "Shakira"
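rex builds an ordinary regular expression containing a capture group. A rough base R counterpart, hand-written here rather than copied from the pattern rex actually generates, uses `regexec()` (its `perl` argument needs R 3.3.0 or later) and takes the first captured group.

```r
string <- "Shakira - Wolf - 02.Hips don't lie.mp3"

# "(.*?)" lazily captures the artist; "\\s+-" requires whitespace, then "-"
m <- regmatches(string, regexec("^(.*?)\\s+-", string, perl = TRUE))
# Each list element holds the full match followed by the capture groups
artist <- m[[1]][2]
artist  # "Shakira"
```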
Jim