
To create some plots, I've already summarized my data using the following approach, which includes all the needed information.

# Load Data
RawDataSet <- read.csv("http://pastebin.com/raw/VP6cF31A", sep=";")
# Load packages
library(plyr)
library(dplyr)
library(tidyr)
library(ggplot2)
library(reshape2)

# summarising the data
new.df <- RawDataSet %>% 
  group_by(UserEmail,location,context) %>% 
  tally() %>%
  mutate(n2 = n * c(1,-1)[(location=="NOT_WITHIN")+1L]) %>%
  group_by(UserEmail,location) %>%
  mutate(p = c(1,-1)[(location=="NOT_WITHIN")+1L] * n/sum(n))

With some other analysis I've identified distinct user groups. Since I would like to plot my data, it would be great to have a plot visualizing my data in the right order. The order is based on the UserEmail and is defined by the following:

order <- c("28","27","25","23","22","21","20","16","12","10","9","8","5","4","2","1","29","19","17","15","14","13","7","3","30","26","24","18","11","6")

Checking the type of new.df with typeof(new.df) reports that it is a list. I've already tried approaches such as order_by and with_order, but so far I have not managed to order new.df according to my order vector. The ordering could, of course, also be done as part of the summarising step. Is there any way to do this?

schlomm
    Just `dplyr::arrange`. `typeof` a data.frame is a list (which it technically is); `class` tells you if it's actually a `data.frame`. – alistaire Feb 10 '16 at 18:13
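The comment above can be illustrated with a minimal sketch. The data frame `toy` and the vector `user_order` are hypothetical stand-ins for new.df and the order vector from the question, not the real data. The `ungroup()` call is a precaution, on the assumption that a grouped data frame might otherwise be sorted within its groups in some dplyr versions.

```r
library(dplyr)

# Hypothetical stand-in for new.df (not the real data)
toy <- data.frame(UserEmail = c(1L, 27L, 28L, 27L),
                  n         = c(5L, 4L, 16L, 1L))

typeof(toy)  # storage type: a data frame is stored as a list
class(toy)   # class attribute: this is what identifies a data.frame

# Hypothetical desired order of users
user_order <- c(28L, 27L, 1L)

# match() gives each row the position of its UserEmail in user_order;
# arrange() then sorts the rows by that position
sorted <- toy %>%
  ungroup() %>%
  arrange(match(UserEmail, user_order))

sorted$UserEmail
```

Users that do not appear in `user_order` get NA from `match()`, so they should end up at the bottom of the result.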

1 Answer


I couldn't bring myself to create a vector named order, out of respect for the R function of that name. Instead, use match to construct an index and pass that index to the order function as the basis for sorting:

user.order <- as.integer(c("28","27","25","23","22","21","20","16","12","10","9","8","5","4","2","1",
                           "29","19","17","15","14","13","7","3","30","26","24","18","11","6"))
sorted.df <- new.df[order(match(new.df$UserEmail, user.order)), ]
head(sorted.df)
#---------------
Source: local data frame [6 x 6]
Groups: UserEmail, location [4]

  UserEmail   location   context     n    n2          p
      (int)     (fctr)    (fctr) (int) (dbl)      (dbl)
1        28 NOT_WITHIN Clicked A    16   -16 -0.8421053
2        28 NOT_WITHIN Clicked B     3    -3 -0.1578947
3        28     WITHIN Clicked A     2     2  1.0000000
4        27 NOT_WITHIN Clicked A     4    -4 -0.8000000
5        27 NOT_WITHIN Clicked B     1    -1 -0.2000000
6        27     WITHIN Clicked A     1     1  1.0000000
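Since the question mentions plotting, one possible alternative to the match-based index is to store the desired order in the column itself as factor levels. The sketch below uses base R only; `toy` and `user_order` are hypothetical stand-ins, not the data from the question.

```r
# Hypothetical stand-in for new.df (not the real data)
toy <- data.frame(UserEmail = c(1L, 27L, 28L, 27L),
                  n         = c(5L, 4L, 16L, 1L))

# Hypothetical desired order of users, given as strings like the original vector
user_order <- c("28", "27", "1")

# Convert UserEmail to a factor whose levels follow the desired order
toy$UserEmail <- factor(toy$UserEmail, levels = user_order)

# order() on a factor sorts by level position, not by numeric value
sorted <- toy[order(toy$UserEmail), ]

as.character(sorted$UserEmail)
levels(sorted$UserEmail)
```

ggplot2 lays out discrete axes in factor-level order, so a column prepared this way should carry the ordering into a plot without relying on the row order of the data frame.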

(I didn't load plyr or reshape2, since at least one of those packages has a nasty habit of interacting poorly with the dplyr functions.)
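One common source of such interactions is name masking: plyr and dplyr both export functions such as summarise and mutate, and whichever package is attached last wins. A hedged sketch of one way to sidestep this is to call the dplyr versions with an explicit namespace prefix. The data frame `toy` is hypothetical, and the example does not load plyr at all.

```r
library(dplyr)

# Hypothetical toy data (not the data from the question)
toy <- data.frame(g = c("a", "a", "b"),
                  x = c(1, 2, 3))

# The dplyr:: prefix pins each call to dplyr,
# regardless of what other packages are attached later
res <- toy %>%
  dplyr::group_by(g) %>%
  dplyr::summarise(total = sum(x))

res$total
```

If plyr were attached after dplyr, an unprefixed summarise() would resolve to the plyr version, which ignores dplyr grouping and returns a single row.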

IRTFM
  • Thanks :) Worked like a charm. Unfortunately I've got into another problem, which somehow relates to this question, but which is in regard to ggplot.... http://stackoverflow.com/questions/35324848/reorder-data-in-ggplot-after-successfully-reorder-underlying-data – schlomm Feb 10 '16 at 20:08