I have a dataset called 'names' as shown below. The 'expected.entry.in.this.col' column is currently empty, but below I have shown how it should look. How can I write the logic to fill it in?

Basically, I think I'll need to loop through every row and, for each row, use an 'if' condition to check the format and then fill in 'expected.entry.in.this.col' accordingly. How would I go about doing this? (I'm a bit unfamiliar with R syntax for these kinds of tasks.)
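In R a row-by-row loop is often unnecessary, because `paste()`, `substr()` and `tolower()` work on whole vectors, so one call covers every row at once. A minimal sketch comparing the two styles for a single format (the two-row data here is made up for illustration):

```r
first <- c("John", "Michael")
last  <- c("Smith", "Johnson")

# loop style: build the result one element at a time
loop_result <- character(length(first))
for (i in seq_along(first)) {
  loop_result[i] <- tolower(paste(first[i], last[i], sep = "."))
}

# vectorized style: a single call handles every element
vec_result <- tolower(paste(first, last, sep = "."))

identical(loop_result, vec_result)  # TRUE
vec_result                          # "john.smith" "michael.johnson"
```

Both produce the same vector; the vectorized form is shorter and is the usual R idiom.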

names

[Image: screenshot of the `names` data frame, not available]

EDIT: the expected entry in row 3 is a mistake and should read williams.harry.

halfer
Programmer
  • Please read [How to make a great reproducible example in R?](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – M-- Jul 06 '17 at 13:24
  • @Masoud Thanks, I'll have a read. Haha, I think I'm misunderstanding R: there doesn't seem to be a lot of logical/sequential programming, and it seems one line of code is enough to alter every field in a dataset. – Programmer Jul 06 '17 at 13:27
  • Are all your "format" cases shown here? – MBnnn Jul 06 '17 at 13:28
  • Mostly, yes. R is vectorized, which encourages one-liners. The link I provided won't answer your question, but it will help you improve it. – M-- Jul 06 '17 at 13:28
  • @MBnnn Considering that 'expected.entry.in.this.col' will be used to create email addresses, I guess everything should be lowercase, not that it matters. – Programmer Jul 06 '17 at 13:29
  • This question is crying out for `regex`. I wish you had let people with regex knowledge take a look at it before accepting an "answer". – M-- Jul 06 '17 at 14:31
  • @Masoud I'm more than happy to look at more answers! Someone below told me to always accept an answer when it solves the problem, so I listened to them. – Programmer Jul 06 '17 at 14:49
  • @novice Has it really solved your problem? You would need to write at least 6 different `ifelse` blocks. But if it did solve your problem, then yes, accept the answer that did so. – M-- Jul 06 '17 at 15:12
  • @Masoud It has; the program is now complete and it works! I ended up writing 10 `ifelse` blocks, though, haha. – Programmer Jul 06 '17 at 15:55
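The comments suggest that a regex could replace a long chain of `ifelse` blocks. One possible sketch, assuming every format code is lowercase and built only from the tokens `first`, `last`, `f` (first initial) and `l` (last initial) joined by punctuation: split each format into tokens with a regular expression, then substitute each token. The helper name `build_address` is hypothetical, not from any answer.

```r
# hypothetical helper: turn one (first, last, format) triple into an address
build_address <- function(first, last, fmt) {
  # tokens: the words "first"/"last", the initials "f"/"l", or runs of separators
  tokens <- regmatches(fmt, gregexpr("first|last|f|l|[^a-z]+", fmt, perl = TRUE))[[1]]
  parts <- vapply(tokens, function(tok) switch(tok,
                    first = first,
                    last  = last,
                    f     = substr(first, 1, 1),
                    l     = substr(last, 1, 1),
                    tok),              # separators such as "." or "_" are kept as-is
                  character(1))
  tolower(paste(parts, collapse = ""))
}

first <- c("John", "Michael", "Harry", "Stephen", "Simon", "Rachael", "Paul")
last  <- c("Smith", "Johnson", "Williams", "Jones", "Adams", "Moore", "Taylor")
fmt   <- c("first.last", "firstlast", "last.first", "f.last", "flast", "f_last", "f_last")

# apply the helper to each row's values
expected <- mapply(build_address, first, last, fmt, USE.NAMES = FALSE)
expected
```

With this approach a new code such as `l.first` needs no extra branch, as long as it uses the same tokens. Uppercase codes like the `F.L.` used in the first answer below would not be recognized by this pattern.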

3 Answers

Try something like this:

df <- data.frame(first = c("Kevin", "Megan"), last = c("Spacey", "Fox"),
                 format = c("f.last", "F.L."))

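# start with an empty column, then fill it in one format at a time
df$new <- NA
# "f.last": first initial + "." + last name, lowercased
df$new <- ifelse(df$format == "f.last",
                 tolower(paste0(substr(df$first,1,1),".",df$last)),
                 df$new)
# "F.L.": first initial + "." + last initial, case kept
# (rows matching neither condition keep their current value)
df$new <- ifelse(df$format == "F.L.",
                 paste0(substr(df$first,1,1),".", substr(df$last,1,1)),
                 df$new)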

df

  first   last format      new
1 Kevin Spacey f.last k.spacey
2 Megan    Fox   F.L.      M.F
minem

I've done it like this; I hope you'll get the logic! Tell me if it's what you want.

first = c('John','Michael',"Harry","Stephen","Simon",'Rachael',"Paul")
last = c("smith","Johnson","Williams","Jones","Adams","Moore","Taylor")
format = c("first.last","firstlast","last.first","f.last","flast","f_last","f_last")

# stringsAsFactors = FALSE keeps the three columns as character, so no conversion is needed
names = data.frame(first, last, format, stringsAsFactors = FALSE)

library(stringr)

# loop over every row; dim(names)[1] is the number of rows
for (i in 1:dim(names)[1]){
  if (names[i,"format"] == "first.last"){
    names[i,"new_var"] = paste(tolower(names[i,"first"]),tolower(names[i,"last"]), sep = '.')
  }else if (names[i,"format"] == "firstlast"){
    names[i,"new_var"] = paste(tolower(names[i,"first"]),tolower(names[i,"last"]), sep = '')
  }else if (names[i,"format"] == "last.first"){
    names[i,"new_var"] = paste(tolower(names[i,"last"]),tolower(names[i,"first"]), sep = '.')
  }else if (names[i,"format"] == "f.last"){
    # str_sub(x, 1, 1) takes the first letter, i.e. the initial
    names[i,"new_var"] = paste(tolower(str_sub(names[i,"first"],1,1)),tolower(names[i,"last"]),sep=".")
  }else if (names[i,"format"] == "flast"){
    names[i,"new_var"] = paste(tolower(str_sub(names[i,"first"],1,1)),tolower(names[i,"last"]),sep="")
  }else{
    # every other format (here only "f_last"): initial + "_" + last name
    names[i,"new_var"] = paste(tolower(str_sub(names[i,"first"],1,1)),tolower(names[i,"last"]),sep="_")
  }
}

names

    first     last     format        new_var
1    John    smith first.last     john.smith
2 Michael  Johnson  firstlast michaeljohnson
3   Harry Williams last.first williams.harry
4 Stephen    Jones     f.last        s.jones
5   Simon    Adams      flast         sadams
6 Rachael    Moore     f_last        r_moore
7    Paul   Taylor     f_last       p_taylor
MBnnn
  • That's awesome, man, thanks so much for your effort! The only thing is, could you please show me how this would be repeated for, say, hundreds of entries? The 7 I showed above are just samples. Thanks again, legend! – Programmer Jul 06 '17 at 13:43
  • Yes, I know that R doesn't need loops, and your solution is better! I always have the bad reflex of writing a loop and forgetting that with a simple "$" it's 10x better. – MBnnn Jul 06 '17 at 13:45
  • @novice Check the solution by Mārtiņš Miglinieks. But if you want to apply my solution, it will work with however many rows you have: the loop goes over every row, and dim(data)[1] is the number of rows in your data. – MBnnn Jul 06 '17 at 13:47
  • Thanks to both of you, both are great solutions! – Programmer Jul 06 '17 at 13:52
  • Who wants the tick? You both are great! – Programmer Jul 06 '17 at 13:53
  • @novice You decide; that is the point of this site. But definitely give it to someone, because I hate it when someone hasn't ticked/accepted the answer that solved their problem. – minem Jul 06 '17 at 14:02
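On the question of hundreds of rows: the loop in the answer above already visits every row, but the same if/else chain can also be written with logical indexing, so each format is filled in for all matching rows in one step. A minimal sketch, assuming the six format codes in the sample are the only ones; the function name `make_new_var` is hypothetical, and the 700-row data frame is simply the sample repeated 100 times. Unlike the loop's final `else`, an unrecognized format is left as `NA` here instead of being treated as `f_last`.

```r
# hypothetical helper: vectorized version of the if/else chain
make_new_var <- function(first, last, format) {
  f   <- tolower(first)
  l   <- tolower(last)
  fi  <- substr(f, 1, 1)                      # first initial
  out <- rep(NA_character_, length(format))   # unknown formats stay NA

  # each pair of lines fills every row that uses that format
  sel <- format == "first.last"
  out[sel] <- paste(f, l, sep = ".")[sel]
  sel <- format == "firstlast"
  out[sel] <- paste0(f, l)[sel]
  sel <- format == "last.first"
  out[sel] <- paste(l, f, sep = ".")[sel]
  sel <- format == "f.last"
  out[sel] <- paste(fi, l, sep = ".")[sel]
  sel <- format == "flast"
  out[sel] <- paste0(fi, l)[sel]
  sel <- format == "f_last"
  out[sel] <- paste(fi, l, sep = "_")[sel]
  out
}

sample_df <- data.frame(
  first  = c("John", "Michael", "Harry", "Stephen", "Simon", "Rachael", "Paul"),
  last   = c("Smith", "Johnson", "Williams", "Jones", "Adams", "Moore", "Taylor"),
  format = c("first.last", "firstlast", "last.first", "f.last", "flast", "f_last", "f_last"),
  stringsAsFactors = FALSE
)
big_df <- sample_df[rep(seq_len(nrow(sample_df)), 100), ]   # 700 rows
big_df$new_var <- make_new_var(big_df$first, big_df$last, big_df$format)
head(big_df$new_var, 7)
```

The number of rows makes no difference to the code; each format costs one vectorized `paste()` over the whole column.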

This is a solution with a "lookup table" and without `if`s:

mydf <- data.frame(
  first  = c("John", "Michael", "Harry", "Stephen", "Simon", "Rachael", "Paul"),
  last   = c("Smith", "Johnson", "Williams", "Jones", "Adams", "Moore", "Taylor"),
  format = c("first.last", "firstlast", "last.first", "f.last", "flast", "f_last", "f_last"),
  expected = "",
  stringsAsFactors = FALSE
)

# formats in which the first name (or its initial) comes before the last name
firstList <- c("first.last", "firstlast", "f.last", "flast", "f_last")
isFirst <- mydf$format %in% firstList

# if the format is in firstList, put the first name first
mydf$expected[isFirst] <- paste0(mydf$first[isFirst], ".", mydf$last[isFirst])
# otherwise put the last name first
mydf$expected[!isFirst] <- paste0(mydf$last[!isFirst], ".", mydf$first[!isFirst])
# note: this only settles the order of the two names; initials and
# separators (e.g. "flast", "f_last") are not handled yet
KoenV
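The answer above only decides which name comes first. A fuller version of the lookup-table idea, sketched here under the assumption that the six format codes from the question are the only ones, keys a named list of small builder functions by format code and applies each function to all rows that use it. The names `formatters` and `build_expected` are hypothetical.

```r
# lookup table: format code -> function building the address from (first, last)
formatters <- list(
  "first.last" = function(f, l) paste(f, l, sep = "."),
  "firstlast"  = function(f, l) paste0(f, l),
  "last.first" = function(f, l) paste(l, f, sep = "."),
  "f.last"     = function(f, l) paste(substr(f, 1, 1), l, sep = "."),
  "flast"      = function(f, l) paste0(substr(f, 1, 1), l),
  "f_last"     = function(f, l) paste(substr(f, 1, 1), l, sep = "_")
)

build_expected <- function(first, last, format) {
  first <- tolower(first)
  last  <- tolower(last)
  out   <- character(length(format))
  for (fmt in unique(format)) {
    rule <- formatters[[fmt]]                 # NULL if the code is not in the table
    if (is.null(rule)) stop("no rule for format: ", fmt)
    idx <- format == fmt
    out[idx] <- rule(first[idx], last[idx])   # vectorized over all rows with this format
  }
  out
}

mydf <- data.frame(
  first  = c("John", "Michael", "Harry", "Stephen", "Simon", "Rachael", "Paul"),
  last   = c("Smith", "Johnson", "Williams", "Jones", "Adams", "Moore", "Taylor"),
  format = c("first.last", "firstlast", "last.first", "f.last", "flast", "f_last", "f_last"),
  stringsAsFactors = FALSE
)
mydf$expected <- build_expected(mydf$first, mydf$last, mydf$format)
mydf
```

Supporting a new format then means adding one entry to `formatters`, and a row with an unknown code raises an error instead of silently receiving a wrong value.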