3

I am working with a data frame that looks like the following which I need to transpose by group based on the common Id:

testDF = data.frame(c("Id", "1", "1", "2", "2"), c("Item", 'Milk','Eggs','Bacon', "Bread"))
testDF

#>Id                               Item
#>1                                Milk
#>1                                Eggs
#>2                                Bacon
#>2                                Bread

newDT <- dcast(testDF, Id ~ Item, value.var = "Item")
View(newDT)

I need the output to look like the following (excluding the header row and the Id column altogether):

Milk, Eggs (Id 1)
Bacon, Bread (Id 2)

I am receiving the error 'Error: value.var (Item) not found in input'. Can you please tell me what I am doing wrong?

Thanks, Matt

Matt Danna
  • 51
  • 6
  • For a list, use `split(testDF$Item, testDF$Id)`. This is a pretty flexible object to work with. For a less flexible object as a data.frame maybe this post: [http://stackoverflow.com/questions/16596515/aggregating-by-unique-identifier-and-concatenating-related-values-into-a-string](http://stackoverflow.com/questions/16596515/aggregating-by-unique-identifier-and-concatenating-related-values-into-a-string). – lmo Apr 04 '17 at 13:38
  • Did you mean: testDF = data.frame(Id = c("1", "1", "2", "2"), Item = ('Milk','Eggs','Bacon', "Bread")) ? – maller Apr 04 '17 at 13:42
  • When I use the code: result <- aggregate(Item ~ Id, data = testDF, paste, collapse = ",") I am receiving the error "Error in eval(expr, envir, enclos) : object 'Item' not found". Is there a reason it's not recognizing "Item"? – Matt Danna Apr 04 '17 at 13:45
  • If one of the answers below solved your problem please accept it by clicking the tick mark on its left. – JanLauGe Jul 26 '17 at 08:12

4 Answers4

3

Some good answers above, however I think this should be listed as an option as well:

df %>%
  group_by(Id) %>%
  # Create string listing all items in given Id, separated by comma
  summarise(Items = str_c(Item, collapse = ', '))

Returns:

# A tibble: 2 × 2
      Id         Items
  <fctr>         <chr>
       1    Milk, Eggs
       2  Bacon, Bread
JanLauGe
  • 2,297
  • 2
  • 16
  • 40
  • This looks good, except that you're using a non-base R function (`str_c`) without any need. You can simply use `Items = toString(Item)` – talat Apr 04 '17 at 14:10
  • The `stringr` version with `str_c` is faster: `lleters <- rep(letters, times = 1000000); baseFun <- system.time(str_c(lleters)); stringFun <- system.time(toString(lleters)); baseFun; stringFun` – JanLauGe Apr 04 '17 at 15:38
2
testDF = data.frame(Id =  c("1", "1", "2", "2"), 
                Item = c('Milk','Eggs','Bacon', "Bread"))
testDF

z <- aggregate(list(Item = testDF$Item), list(ID = testDF$Id),
               function(x) paste(x, collapse = ','))

z

   ID        Item
1  1   Milk,Eggs
2  2 Bacon,Bread
Mouad_Seridi
  • 2,666
  • 15
  • 27
  • This does transpose the data but I think I was unclear in my original post. Each row that has the same Id as the previous row should go in its own column rather than into the same column aggregated by a comma. Does this make sense and can you help? – Matt Danna Apr 04 '17 at 15:02
  • the problem with that is if you have ID's with different numbers of associated items the structure of the "data.frame" breaks down and you will need a "list' object. in this particular example it would work only because the ID's have the same number of associated Items, here is the code z <- aggregate(list(Item = testDF$Item), list(ID = testDF$Id), function(x) unique(x)) – Mouad_Seridi Apr 04 '17 at 15:28
  • Thanks. That code is giving me the following results: 1, c(3,2) on row 1 and 2, 1 on row 2, each in separate columns. – Matt Danna Apr 04 '17 at 15:38
0

With group_by and summarise functions from dplyr you have:

library("dplyr")

testDF %>% 
group_by(Id) %>% 
summarise(Items=paste0(Item,collapse=","))

testDF
#Source: local data frame [2 x 2]
#Groups: Id [2]
#
#      Id       Items
#  (fctr)       (chr)
#1      1   Milk,Eggs
#2      2 Bacon,Bread
Silence Dogood
  • 3,587
  • 1
  • 13
  • 17
0

since I didn't see a Green checkmark, figured I'd take a stab at it because I wrote a function for this exact problem.

library(dplyr)

transp <- function(input,uniq_var,compare_var,transposed_column_names = 'measurement'){
  if(class(input[,uniq_var]) == "factor"){
    input[uniq_var] = sapply(input[uniq_var],as.character)
  }

  #' input is the dataframe/data.table that you want to perform the operation on, uniq_var is the variable that you are groupying by, compare_var is the variable that is being measured in each of the groups, and transposed_colum_names is just an optional string for the user to call each of their columns (will be concatenated with an observation number, i.e. if you input 'distance', it will name the observations  'distance_1','distance_2','distance_3'...ect.)
  list_df <- input %>% group_by(input[,uniq_var]) %>% do(newcol = t(.[compare_var]))
  # it gets us the aggregates we want, BUT all of our columns are stored in a list 
  # instead of in separate columns.... so we need to create a new dataframe with the dimensions 
  # rows = the number of unique values that we are "grouping" by, noted here by uniq_var and the number of columns will be 
  # the maximum number of observations that are assigned to one of those groups.

  # so first we will create the skeleton of the matrix, and then use a user defined function 
  # to fill it with the correct values 
  new_df <- matrix(rep(NA,(max(count(input,input[,uniq_var])[,2])*dim(list_df)[1])),nrow = dim(list_df)[1])
  new_df <- data.frame(new_df)
  new_df <- cbind(list_df[,1],new_df)
  # i am writing a function inside of a function becuase for loops can take a while 
  # when doing operaitons on multiple columns of a dataframe
  func2 <- function(input,thing = new_df){

    # here, we have a slightly easier case when we have the maximum number of children 
    # assigned to a household.
    # we subtract 1 from the number of columns because the first column holds the value of the 
    # unique value we are looking at, so we don't count it 

    if(length(input[2][[1]])==dim(thing)[2]-1){
      # we set the row corresponding to the specific unique value specified in our list_df of aggregated values
      # equal to the de-aggregated values, so that you have a column for each value like in PROC Transpose. 
      thing[which(thing[,1]==input[1]),2:ncol(thing)]= input[2][[1]]

      #new_df[which(new_df[,1]==input[1]),2:ncol(new_df)]= input[2][,1][[1]][[1]]
    }else{
      thing[which(thing[,1]==input[1]),2:(1+length(input[2][[1]]))]= input[2][[1]]
    }
    # if you're wondering why I have to use so many []'s it's because our list_df has 1 column 
    # of unique identifiers and the other column is actually a column of dataframes
    # each of which only has 1 row and 1 column, and that element is a list of the transposed values 
    # that we want to add to our new dataframe 
    # so essentially the first bracket 

    return(thing[which(thing[,1]==input[1]),])
  }

  quarter_final_output <- apply(list_df,1,func2)
  semi_final_output <- data.frame(matrix(unlist(quarter_final_output),nrow = length(quarter_final_output),byrow = T))
  #return(apply(list_df,1,func2))
  # this essentially names the columns according to the column names that a user would typically specify 
  # in a proc transpose. 
  name_trans <- function(trans_var=transposed_column_names,uniq_var = uniq_var,df){
    #print(trans_var)
    colnames(df)[1] = colnames(input[uniq_var])
    colnames(df)[2:length(colnames(df))] = c(paste0(trans_var,seq(1,(length(colnames(df))-1),1)))
    return(df)

  }
  final_output <- name_trans(transposed_column_names,uniq_var,semi_final_output)
  return(final_output)

}

In your case, you'd apply it like this:

transp(testDF,uniq_var = 'Id',compare_var = "Item")

If you want to download it from my github https://github.com/seanpili/R_PROC_TRANSPOSE