0

I have a dataframe of email addresses I need to split up by address and domain. I found tidyr and its separate command, but when I run separate, I either add a dataframe to my dataframe, called "new_var," or it prints out the correctly separated data into the console.

I need the separated data to be added as new columns to my existing dataframe.

I am using something like

separate(email_data, EMAIL_ADDRESS, into=c("address","domain"), sep="@", remove=FALSE)

I need the result to add two columns to my 'email_data' DF, one named address, and one named domain.

I looked through here and elsewhere, and I tried using paste( instead of c(, but that didn't do it.

Any help is appreciated.

Thank you !

Adam_S
  • Are you assigning the resulting data frame back to a variable? Also, you need to supply some anonymized sample data so [your example is reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610), even if just something like `email_df <- data.frame(email = 'name@domain.com', stringsAsFactors = FALSE)` – alistaire May 14 '18 at 19:01

3 Answers

1

The two answers supplied were helpful (and appreciated), but neither got me exactly what I needed, which is partially my fault. All I really need is the domain portion of the email address.

I was able to extract it from the email_address field and give it its own column with the following:

email_data$domain1 <- substring(email_data$EMAIL_ADDRESS, 
regexpr("@", email_data$EMAIL_ADDRESS) + 1)

substring(text, first, last)
text = the EMAIL_ADDRESS field
first = the position one character after the @ symbol
last = omitted, because I want everything after the @ symbol
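
As a minimal sketch of the same idea on made-up data (the two sample addresses and the extra address1 column are assumptions for illustration, not part of my real data), the position of the @ can be stored once and reused to pull out both halves:

```r
# Hypothetical sample data; only the column name EMAIL_ADDRESS comes from the question
email_data <- data.frame(
  EMAIL_ADDRESS = c("pk@sss.com", "qwert@tyuu.com"),
  stringsAsFactors = FALSE
)

# regexpr() gives the position of the first "@" in each string
at_pos <- regexpr("@", email_data$EMAIL_ADDRESS, fixed = TRUE)

# Everything after the "@" is the domain
email_data$domain1 <- substring(email_data$EMAIL_ADDRESS, at_pos + 1)

# Everything before the "@" is the address (hypothetical extra column)
email_data$address1 <- substring(email_data$EMAIL_ADDRESS, 1, at_pos - 1)

email_data
```

Note that regexpr() returns -1 for a string with no @, so rows like that would probably need to be checked separately.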
Adam_S
0

Here is an example from an earlier machine learning problem:

merc1 <- merc %>% separate(category_name,into=c("cn1","cn2","cn3"),sep="/",extra="drop")

Is your input column character?
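
Applied to the data in the question, the same pattern might look like the sketch below. The sample addresses are made up; the point is only that separate() returns a new data frame, so the result has to be assigned back to email_data for the new columns to stick:

```r
library(tidyr)

# Hypothetical sample data standing in for the real email_data
email_data <- data.frame(
  EMAIL_ADDRESS = c("pk@sss.com", "qwert@tyuu.com"),
  stringsAsFactors = FALSE
)

# separate() does not modify its input; assign the result back
email_data <- separate(
  email_data, EMAIL_ADDRESS,
  into = c("address", "domain"),
  sep = "@",
  remove = FALSE  # keep the original EMAIL_ADDRESS column
)

names(email_data)
```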

Peter

Peter Hahn
0

You can use the code below:

library(stringr)    
email_data <- str_split_fixed(email_data$EMAIL_ADDRESS, "@", 2)
colnames(email_data) <- c("Address","Domain")

I have tested this and this will work.

Edit: Adding an example

Name <- c('testname', 'testname1234')
EMAIL_ADDRESS <- c('pk@sss.com', 'qwert@tyuu.com')
Init_frame <- data.frame(Name,EMAIL_ADDRESS )
Init_frame

email_data <- data.frame(EMAIL_ADDRESS)
library(stringr)
email_data <- str_split_fixed(email_data$EMAIL_ADDRESS, "@", 2)
colnames(email_data) <- c("Address","Domain")
email_data

Init_frame <- data.frame (Name,email_data)
Init_frame
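
If the data frame has many other columns, a variation that avoids rebuilding it might be to assign the two pieces straight onto the existing frame. This is only a sketch on the same made-up data; the intermediate name parts is just for illustration:

```r
library(stringr)

# Same hypothetical sample data as in the example
Init_frame <- data.frame(
  Name = c("testname", "testname1234"),
  EMAIL_ADDRESS = c("pk@sss.com", "qwert@tyuu.com"),
  stringsAsFactors = FALSE
)

# str_split_fixed() returns a character matrix with one column per piece
parts <- str_split_fixed(Init_frame$EMAIL_ADDRESS, "@", 2)

# Add each matrix column to the existing data frame as a new column
Init_frame$Address <- parts[, 1]
Init_frame$Domain  <- parts[, 2]

Init_frame
```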
Prany
  • Part of the way there, yes, thank you. This is one variable in a df with many. I need it to look at the email address field, and add two new columns to the DF, one is the address, one is the domain. – Adam_S May 14 '18 at 16:51
  • 1
    In this case, put the email addresses in a temporary df, split them, and add the result to the main dataframe. For example: `Name <- c('testname', 'testname1234'); EMAIL_ADDRESS <- c('pk@sss.com', 'qwert@tyuu.com'); Init_frame <- data.frame(Name, EMAIL_ADDRESS); Init_frame; email_data <- data.frame(EMAIL_ADDRESS); library(stringr); email_data <- str_split_fixed(email_data$EMAIL_ADDRESS, "@", 2); colnames(email_data) <- c("Address","Domain"); email_data; Init_frame <- data.frame(Name, email_data); Init_frame` – Prany May 14 '18 at 17:01