
I'm new to R and trying to figure something out. I need to build a function that adds a new column to the beginning of a data set. The new column is a concatenation of the values in other columns that the user specifies.

Imagine this is the data set named myDataSet:

col_1    col_2    col_3    col_4
bat      red      1        a
cow      orange   2        b
dog      green    3        c

The user could use the function like so:

addPrimaryKey(myDataSet, cols=c(1,3,4))

to get the result of a new data set with columns 1, 3 and 4 concatenated into a column called ID and added to the beginning, like so:

ID        col_1    col_2    col_3    col_4
bat1a     bat      red      1        a
cow2b     cow      orange   2        b
dog3c     dog      green    3        c

This is the script I have been working on, but I have been staring at it so long that I think I have made a few mistakes. I can't figure out how to get the column numbers from the arguments into the paste function properly.

addPrimaryKey <- function(df, cols=NULL){

  newVector = rep(NA, length(cols)) ##initialize vector to length of columns

  colsN <- as.numeric(cols)

  df <- cbind(ID=paste(
    for(i in 1:length(colsN)){
      holder <- df[colsN[i]]
      holder
    }
  , sep=""), df) ##concatenate the selected columns and add as ID column to df
df
}

Any help would be greatly appreciated. Thanks so much

A5C1D2H2I1M1N2O1R2T1

3 Answers


paste0 works fine, with some help from do.call (mydf here is the example data set from the question):

do.call(paste0, mydf[c(1, 3, 4)])
# [1] "bat1a" "cow2b" "dog3c"

Your function, thus, can be something like:

addPrimaryKey <- function(inDF, cols) {
  cbind(ID = do.call(paste0, inDF[cols]),
        inDF)
}
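For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the example data from the question and runs the function on it. The data frame construction is my own reconstruction of the table in the question (I am assuming col_3 is integer and the other columns are character).

```r
# Example data reconstructed from the question; stringsAsFactors = FALSE
# keeps the text columns as character in R versions before 4.0.
myDataSet <- data.frame(
  col_1 = c("bat", "cow", "dog"),
  col_2 = c("red", "orange", "green"),
  col_3 = 1:3,
  col_4 = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Same function as above: paste the chosen columns together row by row
# and bind the result on as the first column, named ID.
addPrimaryKey <- function(inDF, cols) {
  cbind(ID = do.call(paste0, inDF[cols]),
        inDF)
}

result <- addPrimaryKey(myDataSet, cols = c(1, 3, 4))
print(result)
```

Because the columns are selected with inDF[cols], cols can be either positions, as here, or column names such as c("col_1", "col_3", "col_4").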

You may also want to look at interaction:

interaction(mydf[c(1, 3, 4)], drop=TRUE)
# [1] bat.1.a cow.2.b dog.3.c
# Levels: bat.1.a cow.2.b dog.3.c
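If you want interaction to give the same dot-free key as paste0, a small sketch is below. It relies on the sep argument of interaction and converts the resulting factor back to character; the mydf construction is my own reconstruction of the question's table.

```r
# Example data reconstructed from the question.
mydf <- data.frame(
  col_1 = c("bat", "cow", "dog"),
  col_2 = c("red", "orange", "green"),
  col_3 = 1:3,
  col_4 = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# sep = "" removes the "." between the parts; as.character() turns the
# factor into a plain character vector, one value per row.
ids <- as.character(interaction(mydf[c(1, 3, 4)], sep = "", drop = TRUE))
print(ids)
```

Keeping the result as a factor (skipping as.character) may be preferable if you want to group or tabulate by the key afterwards.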
A5C1D2H2I1M1N2O1R2T1
  • This works perfectly, thanks so much. I am looking at do.call on the internet but still can't understand how it works here with paste. Could you possibly explain why this worked? – Crayon Constantinople Feb 10 '14 at 16:36
  • @CrayonConstantinople, `mydf[c(1, 3, 4)]` is effectively a list of three vectors, since data frames are basically lists. `do.call(paste0, mydf[c(1, 3, 4)])` is equivalent to `paste0(mydf[, 1], mydf[, 3], mydf[, 4])`: each element of the list becomes an argument to `paste0`. – BrodieG Feb 10 '14 at 17:33
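The equivalence described in the comment above can be checked directly. This is only a sketch on the question's example data (reconstructed by me). The last step is an extra precaution of my own, not part of the answer: do.call passes list names through as argument names, so a column that happened to be called collapse would be treated as paste0's collapse argument unless the names are dropped first.

```r
# Example data reconstructed from the question.
mydf <- data.frame(
  col_1 = c("bat", "cow", "dog"),
  col_2 = c("red", "orange", "green"),
  col_3 = 1:3,
  col_4 = c("a", "b", "c"),
  stringsAsFactors = FALSE
)
cols <- c(1, 3, 4)

# do.call hands each element of the list to paste0 as a separate argument...
via_do_call <- do.call(paste0, mydf[cols])

# ...which is the same as writing the call out by hand.
by_hand <- paste0(mydf[[1]], mydf[[3]], mydf[[4]])
print(identical(via_do_call, by_hand))

# Optional precaution (my assumption about what you might want): drop the
# column names so none of them can be mistaken for a named argument.
safe <- do.call(paste0, unname(as.list(mydf[cols])))
print(safe)
```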

This should do the trick:

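addPrimaryKey <- function(df, cols) {

  # drop = FALSE keeps a data frame even when only one column is selected,
  # so apply() still has rows to iterate over
  ID <- apply(df[, cols, drop = FALSE], 1, function(x) paste(x, collapse = ""))

  # bind the key on as the first column, named ID as in the question
  cbind(ID = ID, df)

}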

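If you want to keep the cols=NULL default from your original function, add a check for it at the top, for example returning df unchanged when cols is NULL.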
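A minimal sketch of that NULL handling is below. Returning the data unchanged when no columns are given is my assumption about the desired behaviour; you might prefer to stop with an error instead. I have also swapped apply for do.call(paste0, ...) here, because apply first converts the data frame to a character matrix, which can pad numeric values of different widths with spaces (for example 1 and 10).

```r
addPrimaryKey <- function(df, cols = NULL) {
  # Assumption: with no columns specified, hand the data back untouched.
  if (is.null(cols) || length(cols) == 0) {
    return(df)
  }
  # Paste the selected columns together row by row; unname() stops column
  # names from being read as paste0 arguments.
  key <- do.call(paste0, unname(as.list(df[cols])))
  cbind(ID = key, df)
}

# Small made-up data set with numbers of different widths.
example <- data.frame(a = c("x", "y"), b = c(1, 10), stringsAsFactors = FALSE)

with_key <- addPrimaryKey(example, cols = c(1, 2))
no_key <- addPrimaryKey(example)
print(with_key)
```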


Two other options for combining columns are dplyr::mutate() and tidyr::unite():

library(dplyr)

df %>%
  mutate(new_col = paste0(col_1, col_3, col_4)) %>% 
  select(new_col, everything()) # to order the column names with the new column first


library(tidyr)

df %>% 
  unite(new_col, c(col_1, col_3, col_4), sep = '', remove = FALSE)

The default in tidyr::unite() is remove = TRUE, which drops the columns that were combined and leaves the new column in their place. Setting remove = FALSE keeps the originals.
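The snippets above hard-code the column names, whereas the question asks for a function that takes column positions. One way to wrap unite() for that is sketched below. The function name addPrimaryKeyTidy is hypothetical, the example data is my reconstruction of the question's table, and the sketch assumes the tidyr and tidyselect packages are installed.

```r
library(tidyr)

# Hypothetical wrapper: cols can be column positions or column names.
addPrimaryKeyTidy <- function(df, cols) {
  # all_of() selects exactly the columns given in the cols vector
  out <- unite(df, "ID", tidyselect::all_of(cols), sep = "", remove = FALSE)
  # Move ID to the front regardless of where unite() placed it
  out[c("ID", setdiff(names(out), "ID"))]
}

# Example data reconstructed from the question.
myDataSet <- data.frame(
  col_1 = c("bat", "cow", "dog"),
  col_2 = c("red", "orange", "green"),
  col_3 = 1:3,
  col_4 = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

keyed <- addPrimaryKeyTidy(myDataSet, cols = c(1, 3, 4))
print(keyed)
```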

sbha