Suppose our data frame is as follows:
(1, Mr. John, 20000) (2, Mr. Leo, 50000) (3, Miss Anne, 30000) (4, Mrs. Gerald, 35000)
I want to extract only the titles (Mr., Miss, Mrs.) from the 'names' column and store them in a vector. How can I do this?
Does this help?
> df <- data.frame(id = c(1,2,3,4), name = c('Mr. John', 'Mr. Leo', 'Miss Anne', 'Mrs. Gerald'), sal = c(20000, 50000, 30000, 35000), stringsAsFactors = FALSE)
> df
  id        name   sal
1  1    Mr. John 20000
2  2     Mr. Leo 50000
3  3   Miss Anne 30000
4  4 Mrs. Gerald 35000
> vec <- gsub('(^M.+)\\s([A-Za-z].+)', '\\1', df$name)
> vec
[1] "Mr." "Mr." "Miss" "Mrs."
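If the title is always the first space-separated word, the pattern can be simpler. The sketch below rebuilds the same data frame and assumes every name has the form "title surname"; the variable name titles is only for illustration. It deletes everything from the first whitespace character onward instead of capturing groups.

```r
# Same data as above; stringsAsFactors = FALSE keeps 'name' as character
df <- data.frame(id = c(1, 2, 3, 4),
                 name = c("Mr. John", "Mr. Leo", "Miss Anne", "Mrs. Gerald"),
                 sal = c(20000, 50000, 30000, 35000),
                 stringsAsFactors = FALSE)

# Remove everything from the first whitespace to the end of each string,
# leaving only the leading word (assumed to be the title)
titles <- sub("\\s.*$", "", df$name)
print(titles)
```

Unlike the pattern above, this does not require the title to start with "M", so it would also pick up something like "Dr.".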
An alternative approach using separate() from tidyr (and the data frame created by Karthik):
library(tidyr)
vec <- separate(df, name, sep = " ", into = "title", extra = "drop")$title
Here df is your data frame and name is whatever name you have for your names column. You use sep to decide how to split the string up, and into lets you choose the name of your new column (if you were keeping it as a column). extra controls what happens to the leftover pieces: with "drop" the surname is discarded silently, whereas the default would discard it with a warning. The $title at the end pulls the newly created column out as a plain character vector (indexing with [2] instead would return a one-column data frame, not a vector).
If you wanted to separate the names column into two columns (i.e. title and surname) and keep them inside your data frame, you could do:
df2 <- separate(df, name, sep = " ", into = c("title", "surname"))
You could do the following. I don't know whether sapply could be an option here as well. Your data frame is also not properly defined in the question, so I'm assuming df looks like this:
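df=data.frame(id=c(1,2,3,4), name=c("Mr. John","Mr. Leo","Miss Anne", "Mrs. Gerald"),
value=c(20000,50000,30000,35000), stringsAsFactors=FALSE)
# split each name on spaces; strsplit needs character input, not a factor
splitted_name=strsplit(df$name," ")
# collect the first piece (the title) of each split name
a=character(0)
for (i in splitted_name) a=append(a,i[1])
print(a)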
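On the sapply question: the loop can probably be replaced by an apply-style call. This is a minimal sketch under the same assumed data frame; the intermediate name parts is just for illustration. It uses vapply, which behaves like sapply here but also checks that each result is a single string.

```r
# Assumed data frame, with names kept as character rather than factor
df <- data.frame(id = c(1, 2, 3, 4),
                 name = c("Mr. John", "Mr. Leo", "Miss Anne", "Mrs. Gerald"),
                 value = c(20000, 50000, 30000, 35000),
                 stringsAsFactors = FALSE)

# Split each name on a literal space; the result is a list of character vectors
parts <- strsplit(df$name, " ", fixed = TRUE)

# Take the first piece of every split name; character(1) declares that
# each call must return exactly one string
a <- vapply(parts, function(p) p[1], character(1))
print(a)
```

Writing sapply(parts, function(p) p[1]) should give the same vector for this data; vapply is only the stricter variant, and it avoids growing the vector with append on every iteration.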