Find out single word names

Question

I have a Name column and names are like this:

Preety .. 
Sudalai Rajkumar S. 
Parvathy M. S. 
Navaraj Ranjan Arthur

I want to get which of these are single-word names, like in this case Preety.

I tried removing the "." and " " characters, then taking the difference between the resulting length and the original string length.

But it's not giving me the desired output. Please help.

NBData3$namewodot <- gsub(" .","",NBData3$Client.Name)
NBData3$namewoblank <- gsub(" ","",NBData3$namewodot)
wordlength <- NBData3$namelengthchar-nchar(as.character(NBData3$namewoblank))

Possible duplicate: https://stackoverflow.com/questions/8920145/count-the-number-of-all-words-in-a-string — cardinal40, Aug 26 '19 at 16:50

score 1 · Answer 1 · answered Aug 26 '19 at 17:03

This seems to work for your example

names <- c("Preety ..",
           "Sudalai Rajkumar S.",
           "Parvathy M. S.",
           "Navaraj Ranjan Arthur")

names[sapply(strsplit(gsub(".", "", names, fixed = TRUE), " ", fixed = TRUE), function(x) length(x) == 1)]

[1] "Preety .."
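The `fixed` argument is what separates this from the attempt in the question. A minimal sketch, assuming the same four sample names, of how the question's pattern `" ."` behaves as a regular expression compared with a literal dot (the variable names `regex_result`, `fixed_result`, and `escaped_result` are only illustrative):

```r
names <- c("Preety ..",
           "Sudalai Rajkumar S.",
           "Parvathy M. S.",
           "Navaraj Ranjan Arthur")

# As a regular expression, " ." means "a space followed by ANY character",
# so it also swallows the first letter of every later word.
regex_result <- gsub(" .", "", names)

# With fixed = TRUE the dot is taken literally; escaping it as "\\." is equivalent.
fixed_result <- gsub(".", "", names, fixed = TRUE)
escaped_result <- gsub("\\.", "", names)

print(regex_result)
print(fixed_result)
```

Because the regex version deletes letters along with the spaces, any length difference computed from it no longer reflects the number of words.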

Fino


There is a function called `lengths`; instead of `sapply`, use `lengths(strsplit(gsub("\\W+$","",names)," "))` – Onyambu Aug 26 '19 at 17:40
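Spelled out, the `lengths` suggestion in the comment above might look like the following sketch. It assumes the same sample vector; `word_counts` and `single_word` are illustrative names, and the comparison with 1 is added here to turn the counts into a filter.

```r
names <- c("Preety ..",
           "Sudalai Rajkumar S.",
           "Parvathy M. S.",
           "Navaraj Ranjan Arthur")

# Strip trailing non-word characters, split on single spaces,
# and count the pieces of each name in one vectorised call.
word_counts <- lengths(strsplit(gsub("\\W+$", "", names), " "))

# Keep the original strings whose cleaned form has exactly one piece.
single_word <- names[word_counts == 1]
print(single_word)
```

Splitting on a single space counts empty pieces if a name contains consecutive spaces, so this sketch assumes words are separated by exactly one space.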

score 1 · Answer 2 · edited Aug 26 '19 at 17:43

This may be a bit roundabout, but here is a text mining approach. There are definitely more streamlined ways, but I thought some of the concepts in here might also be useful.

# define the data frame

df <- data.frame(Name = c("Preety ..",
                          "Sudalai Rajkumar S.",
                          "Parvathy M. S.",
                          "Navaraj Ranjan Arthur"),
                 stringsAsFactors = FALSE)

library(tidyverse)
library(tidytext)



# break each name out by words. remove all the periods

df_token <- df %>%
  rowid_to_column(var = "name_id") %>%
  mutate(Name = str_remove_all(Name, pattern = "\\.")) %>%
  unnest_tokens(name_split, Name, to_lower = FALSE)

# find the lines with only one word

df_token %>%
  group_by(name_id) %>%
  summarize(count = n()) %>%
  filter(count == 1) %>%
  left_join(df_token) %>%
  pull(name_split)

[1] "Preety"

I see the edit, but I do not like referencing `tidyverse`. It loads a bunch that I don't necessarily need, and I like to know where functions are coming from. Preference, I guess. — , Aug 26 '19 at 17:46

score 1 · Accepted Answer · answered Aug 26 '19 at 17:07

You could use str_count() from stringr inside an ifelse() statement to check for one-word names, after first removing the dots from the names with gsub().

library(stringr) 

NBData3$namewodot <- gsub("\\.", "", NBData3$Client.Name)
NBData3$oneword <- ifelse(str_count(NBData3$namewodot , '\\w+') == 1, TRUE, FALSE)


#               Client.Name         namewodot oneword 
# 1             Preety ..               Preety   TRUE
# 2   Sudalai Rajkumar S.    Sudalai Rajkumar S FALSE
# 3        Parvathy M. S.          Parvathy M S FALSE
# 4 Navaraj Ranjan Arthur Navaraj Ranjan Arthur FALSE

David Jorquera


You do not need `ifelse`; the comparison `==` already gives you TRUE or FALSE. Thus just `str_count(names,"\\w+") == 1` is enough – Onyambu Aug 26 '19 at 17:44
Thanks @David, this is exactly what I needed. – Ambarish Chatterjee Sep 12 '19 at 16:42
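As a sketch of the simplification raised in the comments, the comparison can be assigned to the column directly. The data frame below is rebuilt from the question's sample rows and is an assumption about its shape; `NBData3` and `Client.Name` are the asker's names.

```r
library(stringr)

# Sample data rebuilt from the question (assumed structure).
NBData3 <- data.frame(Client.Name = c("Preety ..",
                                      "Sudalai Rajkumar S.",
                                      "Parvathy M. S.",
                                      "Navaraj Ranjan Arthur"),
                      stringsAsFactors = FALSE)

# str_count() returns the number of "\\w+" matches per string;
# comparing with 1 already yields a logical vector, so ifelse() is not needed.
# Dots are not word characters, so they do not need to be removed first.
NBData3$oneword <- str_count(NBData3$Client.Name, "\\w+") == 1

print(NBData3)
```

Note that `\\w+` treats hyphens and apostrophes as separators, so a name such as "O'Brien" would count as two matches under this pattern.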

score 1 · Answer 4 · answered Aug 26 '19 at 17:55

In base R you could use grep:

grep("^\\S+$", gsub("\\W+$", "", names), value=T)
[1] "Preety"

If you need the names as originally given, subset the original vector with `[`:

names[grep("^\\S+$", gsub("\\W+$", "", names))]

[1] "Preety .."
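If a TRUE/FALSE flag per name is more convenient than the matching values (for example, to store as a data frame column), `grepl` can replace `grep`. A minimal sketch with the same sample vector; `cleaned` and `is_single` are illustrative names:

```r
names <- c("Preety ..",
           "Sudalai Rajkumar S.",
           "Parvathy M. S.",
           "Navaraj Ranjan Arthur")

# Drop trailing non-word characters, e.g. " .." after "Preety".
cleaned <- gsub("\\W+$", "", names)

# A single-word name has no whitespace left between start and end.
is_single <- grepl("^\\S+$", cleaned)

print(is_single)
print(names[is_single])
```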

Onyambu
