
I have a file in which clickstreams are stored in CSV format. The data looks like this:

Row 1. User1 - Click1
Row 2. User1 - Click2
Row 3. User1 - Click3
Row 4. User2 - Click1
Row 5. User3 - Click1
Row 6. User3 - Click2
and so on

Is there a function in R that reshapes the data into the following form?

Row 1. User1 - Click1 - Click2 - Click3
Row 2. User2 - Click1
Row 3. User3 - Click1 - Click2

Thanks

  • What your data looks like is not very useful. We need to know the exact data structure. Please read [this FAQ](http://stackoverflow.com/a/5963610/1412059). You should also show some of your own efforts at solving this. – Roland Jul 29 '15 at 11:38
  • Alright Roland. Thanks. Will take care of that. – Vaibhav Srivastava Jul 29 '15 at 11:48
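For reference, the requested shape (one row per user, clicks joined in order) can be sketched in base R with aggregate() and paste(). This is a minimal sketch, not one of the answers below; it assumes the CSV has already been read into a data frame with hypothetical columns user and click, and that each user's rows are in click order.

```r
# Toy clickstream (hypothetical column names `user` and `click`),
# already in time order within each user
clicks <- data.frame(
  user  = c("User1", "User1", "User1", "User2", "User3", "User3"),
  click = c("Click1", "Click2", "Click3", "Click1", "Click1", "Click2"),
  stringsAsFactors = FALSE
)

# Collapse each user's clicks into one " - "-separated string;
# aggregate() splits by user and keeps the row order within each group
paths <- aggregate(click ~ user, data = clicks, FUN = paste, collapse = " - ")
print(paths)
```

The result has one row per user, and the number of clicks per user can vary freely because everything ends up in a single text column.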

3 Answers

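library(reshape2)
# Toy data: three users (A, B, C), each with clicks 1 to 3
df <- data.frame(user = rep(LETTERS[1:3], each = 3), click = rep(1:3, times = 3))
# Long format: one row per user / variable / value
dfmelt <- melt(df, id = "user")
# Wide format: one column per variable_value combination
dfcast <- dcast(dfmelt, user ~ variable + value)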

Here's the toy data:

> df
  user click
1    A     1
2    A     2
3    A     3
4    B     1
5    B     2
6    B     3
7    C     1
8    C     2
9    C     3

Here's the result:

> dfcast
  user click_1 click_2 click_3
1    A       1       2       3
2    B       1       2       3
3    C       1       2       3

You can also do this in one line, but you won't get the nice column names:

> dcast(df, user ~ click)
  user 1 2 3
1    A 1 2 3
2    B 1 2 3
3    C 1 2 3
ulfelder
  • Thanks ulfelder. The issue in this case is that I can't fix the number of clicks at 3, because the number of clicks varies by user. – Vaibhav Srivastava Jul 29 '15 at 12:42
  • The number of clicks doesn't have to be constant across users for this to work. If the numbers are uneven, `dcast()` will put NAs in the extras. So if user A has n clicks and user B has n - 2, you'll get NAs in the last two columns for user B. In other words, it will do the same thing that `splitstackshape` does under those conditions. – ulfelder Jul 29 '15 at 12:46
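If the click values are arbitrary labels (page names, say) rather than 1, 2, 3, ..., pivoting on the label itself gives one column per distinct label. A sketch that pivots on each click's position within its user's stream instead, assuming reshape2 is installed and using a hypothetical helper column step:

```r
library(reshape2)

# Uneven clickstream: User1 has 3 clicks, User2 has 1, User3 has 2
df <- data.frame(
  user  = c("User1", "User1", "User1", "User2", "User3", "User3"),
  click = c("Click1", "Click2", "Click3", "Click1", "Click1", "Click2"),
  stringsAsFactors = FALSE
)

# Number each user's clicks 1, 2, 3, ... in row order
# (`step` is a hypothetical helper column used only as the column key)
df$step <- ave(seq_along(df$user), df$user, FUN = seq_along)

# One row per user, one column per click position; users with fewer
# clicks get NA in the trailing columns
wide <- dcast(df, user ~ step, value.var = "click")
print(wide)
```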

This is one option:

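library(splitstackshape)  # cSplit()
library(data.table)       # setDT()
# One comma-separated string of clicks per user (V3), then split it into columns V1_1, V1_2, ...
cSplit(setDT(df)[, toString(V4), by = 'V3'], 'V1', ',')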

#      V3    V1_1    V1_2    V1_3
#1: User1 -Click1 -Click2 -Click3
#2: User2 -Click1      NA      NA
#3: User3 -Click1 -Click2      NA

Data (define this before running the code above):

df = structure(list(V1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "Row", class = "factor"), 
    V2 = c(1, 2, 3, 4, 5, 6), V3 = structure(c(1L, 1L, 1L, 2L, 
    3L, 3L), .Label = c("User1", "User2", "User3"), class = "factor"), 
    V4 = structure(c(1L, 2L, 3L, 1L, 1L, 2L), .Label = c("-Click1", 
    "-Click2", "-Click3"), class = "factor")), .Names = c("V1", 
"V2", "V3", "V4"), class = "data.frame", row.names = c(NA, -6L
))
Veerendra Gadekar
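If splitstackshape is not available, a similar NA-padded layout can be sketched in base R with split() and vapply(). This is a different technique from the answer's, shown only as a sketch; it assumes the V3 (user) and V4 (click) columns from the data above, redefined here as plain character vectors, and it returns a character matrix with one row per user rather than a data.table.

```r
# Same toy data as above, reduced to the user (V3) and click (V4) columns
df <- data.frame(
  V3 = c("User1", "User1", "User1", "User2", "User3", "User3"),
  V4 = c("-Click1", "-Click2", "-Click3", "-Click1", "-Click1", "-Click2"),
  stringsAsFactors = FALSE
)

# Split clicks into one vector per user (order within each user is kept)
by_user <- split(df$V4, df$V3)
width <- max(lengths(by_user))

# Pad each user's vector with NA up to the longest length, then stack as rows
padded <- t(vapply(by_user,
                   function(x) c(x, rep(NA_character_, width - length(x))),
                   character(width)))
print(padded)
```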

Given this data frame, you can use base R's reshape() function:

   user   click
1 User1 -Click1
2 User1 -Click2
3 User1 -Click3
4 User2 -Click1
5 User3 -Click1
6 User3 -Click2

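# Copy the click labels into a value column so reshape() has something to spread
df$n <- df$click
# One row per user, one n.<click> column per distinct click label
reshape(df, idvar = "user", timevar = "click", direction = "wide")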

Output:

   user n.-Click1 n.-Click2 n.-Click3
1 User1   -Click1   -Click2   -Click3
4 User2   -Click1      <NA>      <NA>
5 User3   -Click1   -Click2      <NA>
mpalanco
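A variant of the same reshape() call uses each click's position within its user's stream as timevar, so the columns are numbered click.1, click.2, ... rather than named after click labels. A minimal sketch; pos is a hypothetical helper column:

```r
df <- data.frame(
  user  = c("User1", "User1", "User1", "User2", "User3", "User3"),
  click = c("-Click1", "-Click2", "-Click3", "-Click1", "-Click1", "-Click2"),
  stringsAsFactors = FALSE
)

# Position of each click within its user's stream (hypothetical column `pos`)
df$pos <- ave(seq_along(df$user), df$user, FUN = seq_along)

# Columns are now click positions (click.1, click.2, ...) instead of click labels
wide <- reshape(df, idvar = "user", timevar = "pos", direction = "wide")
print(wide)
```

With this data, User2 gets NA in click.2 and click.3, and User3 gets NA in click.3.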