Use strsplit to get the last character in R

Question

I have a file of baby names that I am reading in, and I am trying to get the last character of each name. For example, the file looks like this:

Name      Sex 
Anna      F
Michael   M
David     M
Sarah     F

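I read this in using:

# header = TRUE so the first row supplies the column names (Name, Sex);
# read.table() splits on whitespace, which matches the file shown above
sourcenames = read.table("babynames.txt", header = TRUE)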

Ultimately, I want my result to look like this:

Name     Last Initial   Sex
Michael  l              M
Sarah    h              F

I've managed to split each name into separate characters:

sourceout = strsplit(as.character(sourcenames$Name),'')

But now I'm stuck on how to get the last letter: in the case of Michael, how do I get 'l'? I thought tail() might work, but it's returning the last few records, not the last character of each Name element.
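To make the problem concrete, here is a small sketch of what `strsplit` returns, with the sample names typed in directly (an assumption; in the question they come from `sourcenames$Name`). The result is a list holding one vector of letters per name, so `tail()` applied to the whole list picks list elements (records) rather than letters.

```r
# Sample names typed in directly instead of being read from babynames.txt
name_vec <- c("Anna", "Michael", "David", "Sarah")
parts <- strsplit(name_vec, "")

# strsplit returns a list: one character vector of single letters per name
str(parts)

# tail() on the whole list returns list elements (up to 6 by default), not letters
length(tail(parts))

# tail() on a single element returns that name's last letter
tail(parts[[2]], 1)
```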

Any help or advice is greatly appreciated.

Thanks :)

There is an answer for that here: http://stackoverflow.com/questions/77434/how-to-access-the-last-value-in-a-vector — Karolis Koncevičius, Oct 16 '14 at 20:56
Have you seen [this](http://stackoverflow.com/questions/7963898/extracting-the-last-n-characters-from-a-string-in-r) suggestion? — kferris10, Oct 16 '14 at 21:03
http://stackoverflow.com/questions/7963898/extracting-the-last-n-characters-from-a-string-in-r — GSee, Oct 17 '14 at 11:55

score 14 · Accepted Answer · answered Oct 16 '14 at 21:23


For your `strsplit` method to work, you can use `tail` with `sapply`:

df$LastInit <- sapply(strsplit(as.character(df$Name), ""), tail, 1)
df
#      Name Sex LastInit
# 1    Anna   F        a
# 2 Michael   M        l
# 3   David   M        d
# 4   Sarah   F        h

Alternatively, you can use `substring`:

with(df, substring(Name, nchar(Name)))
# [1] "a" "l" "d" "h"
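`substring(text, first, last = 1000000L)` reads from `first` to the end when `last` is left at its default, so starting at position `nchar(Name)` yields only the final character. A minimal sketch on a hypothetical copy of the question's data; it converts `Name` to character first, assuming the column may have been read in as a factor (the `read.csv`/`read.table` default before R 4.0.0), since `nchar()` does not accept factors.

```r
# Hypothetical data frame mirroring the question; stringsAsFactors = TRUE
# mimics the pre-R-4.0.0 default for read.csv()/read.table()
df <- data.frame(Name = c("Anna", "Michael", "David", "Sarah"),
                 Sex  = c("F", "M", "M", "F"),
                 stringsAsFactors = TRUE)

nm <- as.character(df$Name)   # work on the text, not the factor codes

# Start at the final position; `last` defaults to 1000000L, so only
# the last character of each name is returned
df$LastInit <- substring(nm, nchar(nm))
df
```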


Rich Scriven


Thanks, it works a treat. Can I ask how the tail part works inside the sapply, i.e. which parameters of sapply() am I passing tail and the value 1 into? I'm really new to R, so apologies if this is a silly question. – CodeLearner Oct 16 '14 at 21:36

Sure. You're applying `tail` to each element of the `strsplit` list in turn. `sapply(X, FUN, ...)` passes any extra arguments after `FUN` on to it, so the `1` becomes the `n` argument of `tail`, telling it how many elements to take from the end of each vector. The default is six, which is probably what you were getting – Rich Scriven Oct 16 '14 at 21:38
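The argument passing described in this comment can be checked directly: `sapply(X, FUN, ...)` hands everything in `...` to each `FUN` call, so the two forms below should be equivalent. The short input vector is made up for illustration.

```r
lst <- strsplit(c("Anna", "Michael"), "")

# The trailing 1 is forwarded to every call, i.e. tail(v, 1) for each vector v
a <- sapply(lst, tail, 1)

# The same thing written out with an explicit anonymous function
b <- sapply(lst, function(v) tail(v, n = 1))
identical(a, b)

# Without the 1, tail() falls back to n = 6 and returns six letters
tail(lst[[2]])
```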

score 7 · Answer 2 · edited Jan 04 '15 at 11:02

Try `stri_sub` from the stringi package:

library(stringi)
x <- c("Ala", "Sarah", "Meg")
stri_sub(x, from = -1, to = -1)
# [1] "a" "h" "g"

This function extracts the substring between the `from` and `to` indices. Negative indices count characters from the end of the string, so `from = -1, to = -1` selects the substring running from the last character to the last character, i.e. just the last character :)
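Base R's `substr()` does not count negative positions from the end, but the same idea can be sketched by converting with `nchar()`. `last_chars` below is a hypothetical helper for illustration, not a stringi or base R function.

```r
# Hypothetical helper: the last k characters of each string, counting from the end
last_chars <- function(x, k = 1) {
  x <- as.character(x)
  n <- nchar(x)
  # start k characters before the end, but never before position 1
  substr(x, pmax(n - k + 1, 1), n)
}

last_chars(c("Ala", "Sarah", "Meg"))
last_chars(c("Ala", "Sarah", "Meg"), k = 2)
```

With `k = 1` this mirrors `stri_sub(x, -1, -1)`; strings shorter than `k` are returned whole.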

Why use stringi? Just look at these benchmarks :)

library(microbenchmark)
library(stringr)   # needed for str_extract()
x <- sample(x, 1000, replace = TRUE)
microbenchmark(stri_sub(x, -1), str_extract(x, "[a-z]{1}$"), gsub(".*(.)$", "\\1", x),
               sapply(strsplit(as.character(x), ""), tail, 1), substring(x, nchar(x)))

Unit: microseconds
                                           expr       min         lq     median         uq       max neval
                                stri_sub(x, -1)    56.378    63.4295    80.6325    85.4170   139.158   100
                    str_extract(x, "[a-z]{1}$")   718.579   764.4660   821.6320   863.5485  1128.715   100
                     gsub(".*(.)$", "\\\\1", x)   478.676   493.4250   509.9275   533.8135   673.233   100
 sapply(strsplit(as.character(x), ""), tail, 1) 12165.470 13188.6430 14215.1970 14771.4800 21723.832   100
                         substring(x, nchar(x))   133.857   135.9355   141.2770   147.1830   283.153   100
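Timings only mean something if the expressions return the same values. A base-R-only equivalence check on the same kind of input (the stringi and stringr calls are left out so it runs without extra packages, and `set.seed` is an addition for reproducibility):

```r
# Same kind of input as the benchmark, made reproducible
set.seed(1)
x <- sample(c("Ala", "Sarah", "Meg"), 1000, replace = TRUE)

res_gsub      <- gsub(".*(.)$", "\\1", x)
res_sapply    <- sapply(strsplit(as.character(x), ""), tail, 1)
res_substring <- substring(x, nchar(x))

# All three base R candidates should agree element by element
identical(res_gsub, res_sapply)
identical(res_gsub, res_substring)
```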

score 2 · Answer 3 · answered Oct 16 '14 at 21:37

Here is another option using data.table (for relatively clean syntax) and stringr (easier grammar).

library(data.table); library(stringr)

df = read.table(text="Name      Sex 
Anna      F
Michael   M
David     M
Sarah     F", header=T)
setDT(df) # convert to data.table

df[, "Last Initial" := str_extract(Name, "[a-z]{1}$") ][]

          Name Sex Last Initial
    1:    Anna   F            a
    2: Michael   M            l
    3:   David   M            d
    4:   Sarah   F            h
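`str_extract` returns `NA` wherever the pattern does not match, so with `[a-z]{1}$` (equivalent to `[a-z]$`) a name ending in anything other than a lowercase letter gets `NA`. A base R sketch of that behaviour using `regexpr()` instead of stringr; the sample strings are made up:

```r
x <- c("Anna", "Ed2", "Jo!")

# Position of a single lowercase letter at the end; -1 where there is no match
pos <- as.integer(regexpr("[a-z]$", x))
last_lower <- substring(x, pos, pos)

# Mirror str_extract(), which gives NA when the pattern does not match
last_lower[pos == -1L] <- NA
last_lower
```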

score 2 · Answer 4 · answered Jan 04 '15 at 10:57


A one-liner in base R:

x <- c("abc","123","Male")
regmatches(x,regexpr(".$", x))
## [1] "c" "3" "e"
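One thing to watch with this one-liner: `regmatches()` returns only the successful matches, so a string where `.$` fails (an empty string) is dropped and the result can be shorter than the input. A sketch that keeps the output aligned with the input, using made-up strings:

```r
x <- c("abc", "", "Male")
m <- regexpr(".$", x)

# Only the successful matches come back, so "" is silently dropped
regmatches(x, m)

# Keep results aligned with x by filling the non-matches with NA
out <- rep(NA_character_, length(x))
out[m != -1] <- regmatches(x, m)
out
```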


eipi10 · Answer 5 · 2014-10-16T20:58:41.850


You can do it with a regular expression and `gsub`:

sourcenames$last.letter = gsub(".*(.)$", "\\1", sourcenames$Name)

sourcenames

     Name Sex last.letter
1    Anna   F           a
2 Michael   M           l
3   David   M           d
4   Sarah   F           h
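How the pattern works, sketched on the question's names: the greedy `.*` takes everything up to the final character, `(.)$` captures that last character, and `\\1` replaces the whole match with the capture. Because the match has to run to the end of the string it can only occur once, so `sub()` gives the same result:

```r
x <- c("Anna", "Michael", "David", "Sarah")

# ".*"   greedily consumes everything except the final character
# "(.)$" captures that final character as group 1
# "\\1"  replaces the whole match with the captured group
gsub(".*(.)$", "\\1", x)
sub(".*(.)$", "\\1", x)

# An empty string has no character to capture and is returned unchanged
sub(".*(.)$", "\\1", "")
```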


answered Oct 16 '14 at 20:53


OK, use this. The `regex` approach blows the `substr` method out of the water: 2x the speed :-( `unlist(Map(function(x) substring(x, nchar(x)), sourcenames$Name))` – Vlo Oct 16 '14 at 21:03
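Because `substring()` and `nchar()` are already vectorised, the `Map()` wrapper in this comment should not change the result, only add one call per name. A quick equivalence check on made-up input:

```r
x <- c("Anna", "Michael", "David", "Sarah")

# Version from the comment: one substring() call per name via Map()
via_map <- unlist(Map(function(s) substring(s, nchar(s)), x))

# substring() and nchar() already work on whole vectors, so one call covers all names
vectorised <- substring(x, nchar(x))

# Map() names its output after the input strings; drop the names to compare
identical(unname(via_map), vectorised)
```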

score 1 · Answer 6 · answered Jan 06 '21 at 12:14


You can try this one: the `str_sub()` function in the stringr package will help you.

library(dplyr)
library(stringr)
library(babynames)
babynames %>%
  select(name,sex) %>%
  mutate(last_letter = str_sub(name,-1,-1)) %>%
  head()


pomatomus


Ömer An · Answer 7 · 2017-03-08T02:09:55.167


A dplyr approach:

library(dplyr)
sourcenames %>%
  rowwise() %>%
  mutate("Last Initial" = strsplit(as.character(Name), '') %>% unlist() %>% .[length(.)])
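The `.[length(.)]` step indexes the last element of the split vector. The same idea without `rowwise()`, sketched with base R's `vapply()`, which enforces exactly one string per name; the input vector is made up, and the `NA` fallback for an empty split is an assumption of this sketch rather than part of the answer:

```r
# Hypothetical input; in the question this would be sourcenames$Name
x <- c("Anna", "Michael", "David", "Sarah")

last_char <- vapply(
  strsplit(as.character(x), ""),
  # each element holds the letters of one name; keep the last one,
  # falling back to NA if a split ever comes back empty
  function(ch) if (length(ch) > 0) ch[length(ch)] else NA_character_,
  character(1)   # vapply checks that every result is a single string
)
last_char
```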


answered Mar 08 '17 at 02:02

