reshape a data frame with repeating rows

Question

I have a data frame that looks like this:

master_bill_no  category
SBA5100008  CONDOMS
SBA5100008  HAND CREAM
SBA5100009  PREGNANCY TESTS
SBA5100010  MULTI VITAMINS & MIN
SBA5100010  CALCIUM PREPARATIONS
SBA5100010  VITAMINS
SBA5100010  BETABLOCKERS

a reproducible example is given below:

structure(list(master_bill_no = c("SBA5100008", "SBA5100008", 
"SBA5100009", "SBA5100010", "SBA5100010", "SBA5100010", "SBA5100010"
), category = c("CONDOMS", "HAND CREAM", "PREGNANCY TESTS", "MULTI VITAMINS & MIN", 
"CALCIUM PREPARATIONS", "VITAMINS", "BETABLOCKERS")), .Names = c("master_bill_no", 
"category"), class = "data.frame", row.names = c(NA, -7L))

For each unique master bill no, i am trying to reshape the column category to a wide one.

For example, the desired output would be:

master_bill_no  category
SBA5100008  CONDOMS,HAND CREAM
SBA5100009  PREGNANCY TESTS
SBA5100010  MULTI VITAMINS & MIN,CALCIUM PREPARATIONS,CALCIUM PREPARATIONS,BETABLOCKERS

I used the base reshape formula, and it just deletes the category column.

reshape(df, idvar = "master_bill_no", timevar = "category", direction = "wide")

I tried aggregate function:

aggregate(df, master_bill_no, FUN = paste(category, sep = ","))

This returns a error message "object category not found"

I am sure the reason for this is reshape is looking for values to fill which is missing. Can someone help please?

score 0 · Accepted Answer · answered Apr 10 '16 at 08:39

0

imho - best to use base functions like aggregate: the proper syntax should be:

aggregate(df$category, by=list(df$master_bill_no), FUN = paste)
         ( the field ,    list of 'group by'     , the fun to operate on field )

>df
  master_bill_no             category
1     SBA5100008              CONDOMS
2     SBA5100008           HAND CREAM
3     SBA5100009      PREGNANCY TESTS
4     SBA5100010 MULTI VITAMINS & MIN
5     SBA5100010 CALCIUM PREPARATIONS
6     SBA5100010             VITAMINS
7     SBA5100010         BETABLOCKERS


> aggregate(df$category, by=list(df$master_bill_no), FUN = paste)
     Group.1                                                                  x
1 SBA5100008                                                CONDOMS, HAND CREAM
2 SBA5100009                                                    PREGNANCY TESTS
3 SBA5100010 MULTI VITAMINS & MIN, CALCIUM PREPARATIONS, VITAMINS, BETABLOCKERS

answered Apr 10 '16 at 08:39

Zahiro Mor

1,708
1
16
30

Many thanks, this works. Thank you for the detailing too. I have been using aggregate function wrongly I guess. But wherever I am calculating the, say, sum or mean, the function returns the desired output. So I stuck to my version, which didn't work now. Thank you again. – Apricot Apr 10 '16 at 08:43
1

yes, aggregate function has a somewhat awkward syntax but so powerful and versatile - I tend to forgive its awkwardness. by the way - a very nice function for similar operations is 'summaryBy' from the package doBy - I recommend you give it a try -maybe it's api will be easier for you.... – Zahiro Mor Apr 10 '16 at 08:46
Thank you for the suggestion...will surely give it a try. – Apricot Apr 10 '16 at 08:49

reshape a data frame with repeating rows

1 Answers1