I have data in the following format:

     Id        Duration  Name
    (Numeric)  (Factor)  (Factor)

     1          2         x
     1          3         y
     1          1         z
     2          1         x
     2          2         x

I want to iterate over the 'Id' field and, for each unique Id, create an array from the 'Name' field of the form (x,y,z). The order of the elements is important.

The expected output would look something like a map

     1 : (x,y,z)
     2 : (x,x)

I'm using a nested for loop to iterate over the length of unique(Id), but I feel I'm defeating the purpose of using R.

I'm a little rusty on the apply family of functions. I looked at this and specifically this, but the challenge in using lapply is that the columns have different data types.

Please let me know if someone can suggest a better alternative to a for loop.

Thanks in advance.

hbabbar

2 Answers

We can use dplyr, as the OP's initial dataset appears to be of class tbl_df.

library(dplyr)
df1 %>%
     group_by(Id) %>%
     summarise(val = toString(Name))
#     Id     val
#   (int)   (chr)
#1     1 x, y, z
#2     2    x, x

data

df1 <- structure(list(Id = c(1L, 1L, 1L, 2L, 2L),
    Duration = structure(c(2L, 3L, 1L, 1L, 2L), .Label = c("1", "2", "3"), class = "factor"),
    Name = structure(c(1L, 2L, 3L, 1L, 1L), .Label = c("x", "y", "z"), class = "factor")),
    .Names = c("Id", "Duration", "Name"), row.names = c(NA, -5L),
    class = c("tbl_df", "tbl", "data.frame"))
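
If a map of vectors is wanted rather than one pasted string per Id, a base R sketch along the following lines may be enough. It assumes the rows are already in the desired order and that the columns are named `Id` and `Name` as in the example; `name_map` is a name chosen here for illustration, not something from the question.

```r
# Rebuild the example data (Duration and Name are factors, as in the question)
df1 <- data.frame(
  Id       = c(1L, 1L, 1L, 2L, 2L),
  Duration = factor(c(2, 3, 1, 1, 2)),
  Name     = factor(c("x", "y", "z", "x", "x"))
)

# split() groups the Name values by Id and keeps the row order within each group;
# as.character() converts the factor to plain strings first
name_map <- split(as.character(df1$Name), df1$Id)

name_map[["1"]]  # "x" "y" "z"
name_map[["2"]]  # "x" "x"
```

The result is a named list keyed by the Id (as a string), which is the closest base R equivalent of the map described in the question.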
akrun
  • I ran the code in the following form: `df1 %>% group_by(df1$id) %>% summarise(val = toString(df1$Name))` and I get the following error: **Error: invalid subscript type 'double'** – hbabbar Jan 04 '16 at 12:49
  • @hbabbar I am not getting any error. I updated the answer with the dataset used. Can you update your post with the `dput` output of the example shown? I am using `dplyr_0.4.3` on `R 3.2.3` – akrun Jan 04 '16 at 12:53
  • In case your data is not pre-ordered, you will want to insert an arrange(Name) before group_by() in the above code. – Gopala Jan 04 '16 at 13:21
  • @akrun: Maybe the sample data I showed doesn't capture the missing values in my original dataset or its exact format. My data is pre-ordered, and I still get the same error. – hbabbar Jan 04 '16 at 18:30
  • @hbabbar Can you update the post with the `dput` of a small example that gives the error. It would be easier to fix in that way. – akrun Jan 05 '16 at 02:45
I suggest using the data.table package:

library(data.table)

dt <- as.data.table(df)
out <- dt[, list(res = paste(Name, collapse = ',')), by = Id]
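
The call above pastes each group into a single string. If the individual values need to stay separate, one possible variant is to return a list column instead. This is only a sketch: it rebuilds a small table inline, and `name_map` is a hypothetical name introduced here.

```r
library(data.table)

# Small stand-in for the question's data (Name is a factor, as in the question)
dt <- data.table(
  Id   = c(1L, 1L, 1L, 2L, 2L),
  Name = factor(c("x", "y", "z", "x", "x"))
)

# Wrapping the group's values in list() makes `res` a list column,
# so each Id keeps a character vector instead of one pasted string
out <- dt[, .(res = list(as.character(Name))), by = Id]

# Optional: turn the list column into a named list keyed by Id
name_map <- setNames(out$res, out$Id)
name_map[["1"]]  # "x" "y" "z"
```

Row order within each Id is kept as it appears in `dt`, so sort the table beforehand if the order matters and the data is not already ordered.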
danas.zuokas