
I'm new to R programming.

I have two lists: one contains usernames, and the other contains the pages visited by each user.

user: AAA BBB CCC DDD

records:

page 1  AAA  
page 2  BBB  
page 3  AAA  
page 4  BBB   
page 1  BBB    
page 4  AAA  

I need to gather all pages visited by each user.

Output required:

Pages visited by AAA: page 1, page 3, page 4
Pages visited by BBB: page 2, page 4, page 1

I am trying to store the pages visited by each user in a matrix. For instance, the columns of row 1 should contain the pages viewed by user 1, and so on.
Please look at my code below:

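    # example data (as given in the comments below)
    users   <- c("AAA", "BBB", "CCC", "DDD")
    records <- c("page 1\tAAA", "page 2\tBBB", "page 3\tAAA",
                 "page 4\tBBB", "page 1\tBBB", "page 4\tAAA")

    k <- 0
    out <- matrix(NA, nrow = 100, ncol = 50)  # my final output matrix
    for (i in users) {
      k <- k + 1
      p <- 0
      for (j in records) {
        x <- strsplit(j, "\t")
        if (x[[1]][2] == i) {  # gather all pages visited by the same user
          p <- p + 1
          out[k, p] <- x[[1]][1]
        }
      }
      # here I need to remove unused columns in row k
    }
    out <- out[1:k, ]  # remove unused rows in the matrix
    print(out)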

Output I get:

page1 page3 page4 NA NA NA .... NA   
page2 page4 page1 NA NA NA .... NA

Final Matrix required:

page1 page3 page4     
page2 page4 page1  
Sotos
AJOY
  • Can you give a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) of your two lists? If you bind the two lists into a data frame, then a simple aggregate (`aggregate(pages ~ user, df, toString)`) should do the job – Sotos Jun 16 '17 at 15:38
  • Do all users visit the same number of pages? If not, a matrix won't work, as you would need a different number of columns for each user. – psychOle Jun 16 '17 at 15:41
  • @herbaman The number of pages viewed by each user is different. Now I understand my mistake: a matrix cannot be used, because it has a fixed number of rows and columns. Can you suggest an alternative way to get my expected result? – AJOY Jun 16 '17 at 15:48
  • I have been working on this solution, but I agree with @herbaman: a matrix has a fixed number of columns, and that will affect the final output. You are getting NA printed because your matrix's cells are initialized with NA. Possible ways to produce the final output: a) initialize the matrix with empty strings, b) set the na.print argument of print() so that NA values are shown as, e.g., an empty string, or c) combine a) or b) with trimming the matrix columns to the maximum number of pages viewed by a single user – Juan Pablo Paz Grau Jun 16 '17 at 15:52
  • You can work with a list of lists, `aggregate()` as mentioned by @Sotos, dplyr's `group_by()`, or `data.table()`. I second @Sotos that you should provide a reproducible example of your data. – psychOle Jun 16 '17 at 16:00
  • @herbaman, I have been using these sets to work on the provided code: users <- c("AAA","BBB","CCC","DDD") records <- c('page 1\tAAA' ,'page 2\tBBB' ,'page 3\tAAA' ,'page 4\tBBB' ,'page 1\tBBB' ,'page 4\tAAA' ) – Juan Pablo Paz Grau Jun 16 '17 at 16:02
  • @JuanPabloPazGrau I will try trimming the columns based on the maximum number of pages viewed by a user – AJOY Jun 16 '17 at 16:02
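Following the suggestion above to use a list instead of a matrix, here is a minimal base-R sketch using `split()`, where each user's element can have a different length. It assumes every record has the form page<TAB>user, as in the sample data from the comments; the names `parts`, `pages`, `who` and `visited` are only illustrative.

```r
# Sample data, as given in the comments above
users   <- c("AAA", "BBB", "CCC", "DDD")
records <- c("page 1\tAAA", "page 2\tBBB", "page 3\tAAA",
             "page 4\tBBB", "page 1\tBBB", "page 4\tAAA")

# Split each record at the tab: first field = page, second field = user
parts <- strsplit(records, "\t", fixed = TRUE)
pages <- vapply(parts, `[`, character(1), 1)
who   <- vapply(parts, `[`, character(1), 2)

# Group pages by user; using the users as factor levels keeps users
# with no visits (CCC, DDD) as empty vectors, in the original order
visited <- split(pages, factor(who, levels = users))

# One output line per user
for (u in names(visited)) {
  cat("Pages visited by ", u, ": ",
      paste(visited[[u]], collapse = ", "), "\n", sep = "")
}
```

`split()` keeps the pages in the order they appear in `records`, so no sorting step is needed.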

2 Answers


This would do the trick:

    # users and records as defined in the comments on the question
    k <- 0
    out <- matrix(NA, nrow = 100, ncol = 50)  # final output matrix

    # Initialize the max count of columns (pages per user)
    maxr <- 0

    for (i in users) {
      k <- k + 1
      p <- 0

      for (j in records) {
        x <- strsplit(j, "\t")
        if (x[[1]][2] == i) {  # gather all pages visited by the same user
          p <- p + 1
          out[k, p] <- x[[1]][1]

          # If we have a greater p, p will be the new maxr
          if (p > maxr) {
            maxr <- p
          }
        }
      }
    }

    # Trim the matrix to the used rows and columns
    # (drop = FALSE keeps a matrix even if only one row or column is left)
    out <- out[1:k, 1:maxr, drop = FALSE]

    # Print NA as an empty string
    print(out, na.print = "")

Hope this solution helps.

Regards,
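The same fill-then-trim idea can be sketched without the `k`/`p` counters: first collect a ragged list of pages per user, then pad every element with NA up to the longest length and stack the results as rows. This is a minimal base-R sketch, assuming every record has the form page<TAB>user; the names `parts`, `pages`, `who`, `visited` and `padded` are only illustrative.

```r
# Sample data, as given in the comments on the question
users   <- c("AAA", "BBB", "CCC", "DDD")
records <- c("page 1\tAAA", "page 2\tBBB", "page 3\tAAA",
             "page 4\tBBB", "page 1\tBBB", "page 4\tAAA")

# Split each record at the tab: first field = page, second field = user
parts <- strsplit(records, "\t", fixed = TRUE)
pages <- vapply(parts, `[`, character(1), 1)
who   <- vapply(parts, `[`, character(1), 2)

# Ragged list of pages per user (CCC and DDD get empty vectors)
visited <- split(pages, factor(who, levels = users))

# Number of columns needed = most pages visited by a single user
maxr <- max(lengths(visited))

# Pad each vector with NA up to maxr, then stack them as matrix rows;
# rbind() uses the list names (the users) as row names
padded <- lapply(visited, function(v) { length(v) <- maxr; v })
out <- do.call(rbind, padded)

print(out, na.print = "")
```

`do.call(rbind, ...)` is used instead of `sapply()` so that the result stays an n-by-maxr matrix even when `maxr` is 1.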


Taking `records` (from the comments on the question) as input:

df <- as.data.frame(do.call(rbind, strsplit(gsub('\t', ' ', records), ' ')),
                    stringsAsFactors = FALSE)

aggregate(V2 ~ V3, df, toString)
#   V3      V2
#1 AAA 1, 3, 4
#2 BBB 2, 4, 1
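To get exactly the "Pages visited by AAA: ..." lines asked for in the question, the aggregated result can be pasted into sentences. This is a small sketch reusing the `df` construction above; the extra `page` column and the names `agg` and `report` are additions for illustration. Users with no records (CCC and DDD) do not appear in the `aggregate()` result.

```r
records <- c("page 1\tAAA", "page 2\tBBB", "page 3\tAAA",
             "page 4\tBBB", "page 1\tBBB", "page 4\tAAA")

# Same data frame as above: V1 = "page", V2 = page number, V3 = user
df <- as.data.frame(do.call(rbind, strsplit(gsub("\t", " ", records), " ")),
                    stringsAsFactors = FALSE)

# Rebuild the page labels, then collapse them per user with toString()
df$page <- paste("page", df$V2)
agg <- aggregate(page ~ V3, df, toString)

# One sentence per user
report <- paste0("Pages visited by ", agg$V3, ": ", agg$page)
writeLines(report)
```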

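If you want a matrix (this works here because AAA and BBB each visited the same number of pages; with unequal counts, `aggregate()` returns a list column instead), then: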

m1 <- aggregate(V2 ~ V3, df, matrix)

m1[,-1]
#     [,1] [,2] [,3]
#[1,] "1"  "3"  "4" 
#[2,] "2"  "4"  "1" 

Or, if you really want 'page' in front of the numbers:

matrix(paste0('page', m1[,-1]), nrow = nrow(m1))
#     [,1]    [,2]    [,3]   
#[1,] "page1" "page3" "page4"
#[2,] "page2" "page4" "page1"
Sotos