0

I'm trying to build a pivot table from this data frame below. "VisitID" is the unique ID for a user who came to visit a website, "PageName" is the page they visited, and "Order" is the sequence of the page they visited. For example, the first row of this data frame means "user 001 visited Homepage, which is the 1st page he/she visted".

    VisitID            PageName             Order
      001              Homepage               1
      001              ContactUs              2
      001              News                   3
      002              Homepage               1
      002              Careers                2
      002              News                   3

The desired output should cast "VisitID" as rows and "Order" as columns, and fill the table with the "PageName":

                 1           2          3
      001     Homepage    ContactUs    News                                             
      002     Homepage    Careers      News      

I've thought about using reshape::cast to do the task, but I believe it only works when you give it an aggregated function. I might be wrong though. Thanks in advance for anyone who can offer help.

gogolaygo
  • 199
  • 1
  • 12

1 Answers1

1

You don't need to aggregate. As long as there's only one row for each combination of columns in the casting formula, you'll get the value of value.var inserted in the output.

library(reshape2)

dcast(mydata, VisitID ~ Order, value.var="PageName")

Here's an example:

# Fake data
dat = data.frame(group1=rep(LETTERS[c(1,1:3)],each=2), group2=rep(letters[c(1,1:3)]),
                 values=1:8)
dat
  group1 group2 values
1      A      a      1
2      A      a      2
3      A      b      3
4      A      c      4
5      B      a      5
6      B      a      6
7      C      b      7
8      C      c      8

Note that rows 1 and 2 have the same values of the group columns, as do rows 5 and 6. As a result, dcast aggregates by counting the number of values in each cell.

dcast(dat, group1 ~ group2, value.var="values")
Aggregation function missing: defaulting to length
  group1 a b c
1      A 2 1 1
2      B 2 0 0
3      C 0 1 1

Now lets remove rows 1 and 5 to get rid of the duplicated group combinations. Since there's now only one value per cell, dcast returns the actual value, rather than a count of the number of values.

dcast(dat[-c(1,5),], group1 ~ group2, value.var="values")
  group1  a  b  c
1      A  2  3  4
2      B  6 NA NA
3      C NA  7  8
eipi10
  • 91,525
  • 24
  • 209
  • 285
  • I tried this code before but it threw an error: "Aggregation function missing: defaulting to length" – gogolaygo Feb 24 '16 at 23:56
  • 2
    That's a warning, not an error. It means you have more than one value in at least one of the cells. You can have a maximum of one value per cell if you want to avoid aggregation. `defaulting to length` just means that when you don't specify an aggregation function and there is more than one value in at least one cell, `dcast` will use the `length` function (which counts the number of values in each cell) as the aggregation function. – eipi10 Feb 24 '16 at 23:57