
I have written the following line of code for "How many NAMEs end with something other than a vowel (aeiou)?"

> v
 [1] "Javon 0502 adrknqgpnbvbbxhhflnr"    
 [2] "Ariel 2312 dmqgkgpqipthwqaqwmjz"    
 [3] "Gunnar 5355 wddpbwdkgslaodvsdxps"   
 [4] "Marcos 4354 eoajpprooplrnfjhngll"   
 [5] "Juliana 1245 zogwfbgcfckkfnrrcgob"  
 [6] "Rolando 0505 nshlkeomtgsmwfmyouma"  
 [7] "Brayan 2322 gsqwwufgacspultfyogu"   
 [8] "Nola 0011 xvhvqppxzcjiyxghhzhy"     
 [9] "Gabriela 4501 dscxbkujflwowgohuzdk" 
[10] "Nikolai 0053 wjfftdaxsvwbjptbktao"  
[11] "Haylee 2301 sruhaqrggjiesrautogk"   
[12] "Kaia 3354 vfmocbpuavocrsviwdyd"     
[13] "Adalyn 1313 pddhqfqkyfngcyuuuooe"   
[14] "Rashad 5004 dkmvrcblsizfoiwzkpfx"   
[15] "Ariana 5105 zlfyhmvjxuqkcxbksxkb"   
[16] "Alexander 3323 kcrpuwqzgdfrogbjzmvr"
[17] "Maurice 1114 gbhwwdafadlggwsezsqj"  
[18] "Austin 3324 ofxaqpvfdobdewcbwiwg"   
[19] "Lacey 1050 lwbgaudzxbiwrxtsohbt"    
[20] "Julissa 5511 jshwvizbllnoaqgerdby"  
[21] "Fernanda 3535 qgakhgddxramvconxdoj" 
[22] "Natasha 5053 ejxchrbcagkhavvzdpte"  
[23] "Adam 3252 ruilxibrihxpnsxyorkx"     
[24] "Felix 5020 laotmpdjbfwdyfcjfixh"  

grep(pattern="^[aeiou]$",x=v)

The problem is that it looks at the last word in each line and finds the ones that don't end with a vowel, rather than looking at just the first word in the line. How can I use regex to specify that?

P.S.: Every line has the form "NAME SCORES WORD"; we only care about NAMEs not ending in a vowel.

Mona Jalal
  • Please use something like `dput(v)` next time. Will make it much easier for people to test. See **[Reproducible Examples](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)**. – BrodieG Mar 27 '14 at 21:13

3 Answers


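You're using ^ incorrectly: outside the []s it anchors the match to the start of the line, and only inside the []s does it negate the character class. Put it inside the []s to exclude vowels, and match from the beginning of each line so that the excluded character is the last letter of the first run of alphabetic characters (the NAME):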

grep(pattern="^[a-zA-Z]+[^aeiou] ",x=v)
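
Since the question asks "how many", the matches can also be counted. A minimal sketch, assuming `v` is a character vector like the one in the question (only three of its elements are reproduced here, and the names `pattern`, `idx` and `n_match` are illustrative):

```r
# Three elements taken from the question's vector
v <- c("Javon 0502 adrknqgpnbvbbxhhflnr",
       "Nola 0011 xvhvqppxzcjiyxghhzhy",
       "Felix 5020 laotmpdjbfwdyfcjfixh")

# Leading letters, a final non-vowel, then the space that ends NAME
pattern <- "^[a-zA-Z]+[^aeiou] "

idx <- grep(pattern, v)            # positions of the matching elements
n_match <- sum(grepl(pattern, v))  # number of matching elements
```

Here `grep` returns the positions of the matching elements, while summing the logical vector from `grepl` gives the count directly.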
Robert Krzyzanowski
  • It doesn't give the right answer. Every line has the form "NAME SCORES WORD"; we only care about NAMEs not ending in a vowel. – Mona Jalal Mar 27 '14 at 20:36
  • It seems to work for me: `v <- c( "Javon 0502 adrknqgpnbvbbxhhflnr", "Ariel 2312 dmqgkgpqipthwqaqwmjz", "Nikolai 0053 wjfftdaxsvwbjptbktaz"); grep(pattern="^[a-zA-Z]+[^aeiou] ",x=v)` and I see `1 2` as expected. – Robert Krzyzanowski Mar 27 '14 at 20:41

Change your regex to grepl("[^aeiou]$", v)

Consider the following example:

> v <- c("Javon 0502 adrknqgpnbvbbxhhflnr",
         "Ariel 2312 dmqgkgpqipthwqaqwmjz",
         "Gunnar 5355 wddpbwdkgslaodvsdxps",
         "Marcos 4354 eoajpprooplrnfjhngll",
         "Juliana 1245 zogwfbgcfckkfnrrcgob",  
         "Rolando 0505 nshlkeomtgsmwfmyouma",  
         "Brayan 2322 gsqwwufgacspultfyogu")  
> grepl("[^aeiou]$", v)
[1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE

EDIT (update)

To get what you need (updated question) try:

> ind <- grepl("[^aeiou]$", v)
> sapply(strsplit(v[ind]," "), "[", 1)
[1] "Javon"   "Ariel"   "Gunnar"  "Marcos"  "Juliana"
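
Note that `"[^aeiou]$"` applied to the whole string tests the last character of WORD, which is why a name such as Juliana can still appear in the output above. One possible way to test NAME instead is to split first and match afterwards. This is only a sketch on four elements of the question's vector; `name` and `consonant_names` are illustrative variable names:

```r
v <- c("Javon 0502 adrknqgpnbvbbxhhflnr",
       "Juliana 1245 zogwfbgcfckkfnrrcgob",
       "Rolando 0505 nshlkeomtgsmwfmyouma",
       "Brayan 2322 gsqwwufgacspultfyogu")

# NAME is the first space-separated field of each element
name <- sapply(strsplit(v, " "), "[", 1)

# Keep the names whose last character is not a lowercase vowel
consonant_names <- name[grepl("[^aeiou]$", name)]
```

With the match applied to `name` rather than to `v`, the `$` anchor now refers to the end of NAME.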
Jilber Urbina

You might want the `invert` argument of `grep`. With `invert = TRUE`, `grep` returns the elements that do NOT match the pattern. So if you grep for names ending in a vowel (a vowel followed by a space) and use the `invert` argument, you get the names that do not end in vowels.

NOTE: I have v set up as a data frame, not a vector here.

  > sapply(v, function(x){
      gsub("\\s[0-9].*", "", grep("[aeiou]\\s", x, value = TRUE, invert = TRUE))
    })
      V1         
 [1,] "Javon"    
 [2,] "Ariel"    
 [3,] "Gunnar"   
 [4,] "Marcos"   
 [5,] "Brayan"   
 [6,] "Adalyn"   
 [7,] "Rashad"   
 [8,] "Alexander"
 [9,] "Austin"   
[10,] "Lacey"    
[11,] "Adam"     
[12,] "Felix"  
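
If `v` is a plain character vector rather than a data frame, the same idea should work without `sapply`. A small sketch under that assumption, using three elements from the question (`kept` and `names_only` are illustrative names):

```r
v <- c("Javon 0502 adrknqgpnbvbbxhhflnr",
       "Nola 0011 xvhvqppxzcjiyxghhzhy",
       "Adam 3252 ruilxibrihxpnsxyorkx")

# Elements that do NOT contain a vowel followed by whitespace
kept <- grep("[aeiou]\\s", v, value = TRUE, invert = TRUE)

# Drop everything from the first whitespace onward, leaving NAME
names_only <- sub("\\s.*", "", kept)
```

This relies on the "NAME SCORES WORD" layout: the only other space follows a digit, so a vowel followed by a space can only be the end of NAME.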
Rich Scriven
  • Can you please explain how you got just the NAME? And how can I get WORD using regex? Like this: `v2=sapply(v, function(x){ gsub("\\*.s[0-9]", "", grep("[aeiou]\\s", x, value = TRUE, invert = TRUE)) })` ? – Mona Jalal Mar 28 '14 at 01:59
  • Getting only the name: it's the space. Each name is followed by a space, while WORD ends at a letter with no space after it. `[aeiou]\\s` means "any vowel followed by a whitespace character". Then the `gsub` wrapper says "starting at the first space that is followed by a digit, replace everything through the end of the string with `""`". – Rich Scriven Mar 28 '14 at 02:03
  • Get only WORD with `sapply(v, function(x){ gsub(".*[0-9]\\s", "", grep("[aeiou]\\s", x, value = TRUE, invert = TRUE)) })`. The `.*[0-9]\\s` in `gsub` matches everything up to and including the space after the last digit, so only WORD is left. – Rich Scriven Mar 28 '14 at 02:07
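
The two substitutions from these comments can be compared side by side. A minimal sketch on two elements of the question's vector, assuming a plain character vector (`name_part` and `word_part` are illustrative names):

```r
v <- c("Javon 0502 adrknqgpnbvbbxhhflnr",
       "Adam 3252 ruilxibrihxpnsxyorkx")

# Remove from the whitespace before the first digit to the end: NAME remains
name_part <- sub("\\s[0-9].*", "", v)

# Remove everything through the whitespace after the last digit: WORD remains
word_part <- sub(".*[0-9]\\s", "", v)
```

Both patterns lean on the digits in SCORES as a landmark, so they would need adjusting if NAME or WORD could contain digits.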