
I have data like this:

> head(df)
                  Date IsWin
20 2014-07-13 00:00:00  True
21 2014-08-01 00:00:00  True
22 2014-08-05 00:00:00 False
23 2014-06-28 00:00:00  True
24 2014-05-31 00:00:00  True
25 2014-06-06 00:00:00  True

I would like to group by Date and sum by IsWin (which should be a factor of 1 or -1).

I have read through "How to group a data.frame by date?", but it doesn't really deal with factors, so I don't know how to apply it.

Ultimately, I would like to pass the grouped and summed data to a bar chart to show the number of wins or losses, something like the chart in "ggplot2 and a Stacked Bar Chart with Negative Values".

The following outputs a table that is quite helpful for seeing what I want; however, I would like to turn it into a bar chart for better visuals:

> table(df[,1],df[,2])

                      False True
  2014-05-25 00:00:00     1    0
  2014-05-29 00:00:00     1    0
  2014-05-30 00:00:00     2    0
  2014-05-31 00:00:00     0    1
  2014-06-06 00:00:00     0    1
  2014-06-13 00:00:00     1    0
  2014-06-14 00:00:00     0    1
  2014-06-18 00:00:00     1    0
  2014-06-19 00:00:00     0    1
  2014-06-23 00:00:00     1    0
  2014-06-24 00:00:00     1    0
  2014-06-25 00:00:00     1    0
  2014-06-27 00:00:00     0    1
  2014-06-28 00:00:00     1    2
  2014-07-02 00:00:00     1    0
  2014-07-11 00:00:00     1    0
  2014-07-13 00:00:00     0    2
  2014-07-31 00:00:00     0    1
  2014-08-01 00:00:00     0    1
  2014-08-05 00:00:00     1    0
  2014-08-07 00:00:00     1    0
  2014-08-12 00:00:00     0    1

Here is my actual structure:

df <- structure(list(Date = c("2014-07-13 00:00:00", "2014-08-01 00:00:00", 
"2014-08-05 00:00:00", "2014-06-28 00:00:00", "2014-05-31 00:00:00", 
"2014-06-06 00:00:00", "2014-06-14 00:00:00", "2014-05-25 00:00:00", 
"2014-06-24 00:00:00", "2014-06-28 00:00:00", "2014-05-30 00:00:00", 
"2014-06-18 00:00:00", "2014-07-02 00:00:00", "2014-07-11 00:00:00", 
"2014-05-29 00:00:00", "2014-06-19 00:00:00", "2014-07-31 00:00:00", 
"2014-06-27 00:00:00", "2014-06-23 00:00:00", "2014-05-30 00:00:00", 
"2014-07-13 00:00:00", "2014-08-12 00:00:00", "2014-06-13 00:00:00", 
"2014-06-25 00:00:00", "2014-06-28 00:00:00", "2014-08-07 00:00:00"
), IsWin = structure(c(2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L
), .Label = c("False", "True"), class = "factor")), .Names = c("Date", 
"IsWin"), row.names = 20:45, class = "data.frame")
user1477388

2 Answers


Try:

ddf2 = data.frame(with(df, table(Date, IsWin)))

ggplot(ddf2)+
    geom_bar(aes(x=Date, y=Freq, fill=IsWin), stat='identity', position='dodge')+
    theme(axis.text.x=element_text(angle=45, size=10, hjust=1, vjust=1))


EDIT: For negative bars:

ddf2$new = ifelse(ddf2$IsWin=='True', 1,-1)

ggplot(ddf2)+
    geom_bar(data=ddf2[ddf2$new>0,], aes(x=Date, y=Freq*new, fill=IsWin), stat='identity')+
    geom_bar(data=ddf2[ddf2$new<0,], aes(x=Date, y=Freq*new, fill=IsWin), stat='identity')+
    theme(axis.text.x=element_text(angle=45, size=10, hjust=1, vjust=1))
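As a minimal base R sketch of the same idea, the signed value can be stored in its own column before plotting, so that losses count below zero and wins above. The data frame `df_small` and the column name `Signed` are hypothetical and used only for illustration; only `Date` and `IsWin` come from the question.

```r
# Hypothetical miniature of the question's data; only the column names
# Date and IsWin come from the post.
df_small <- data.frame(
  Date  = c("2014-06-28 00:00:00", "2014-06-28 00:00:00",
            "2014-06-28 00:00:00", "2014-07-13 00:00:00"),
  IsWin = factor(c("True", "False", "True", "True"),
                 levels = c("False", "True"))
)

# Count rows per Date/IsWin combination, as in the answer above.
counts <- as.data.frame(table(Date = df_small$Date, IsWin = df_small$IsWin))

# Losses get a negative sign, wins keep a positive one.
counts$Signed <- ifelse(counts$IsWin == "True", counts$Freq, -counts$Freq)

# table() turns Date into a factor; convert it back to real dates.
counts$Date <- as.Date(as.character(counts$Date))

print(counts)
```

The `Signed` column could then be mapped to `y` in place of `Freq*new`; for 2014-06-28 it holds 2 for the wins and -1 for the loss.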


rnso
  • Thanks, but the output is different from what I was expecting. For instance, look at 2014-06-28. I would like for the red to go to negative one, and the blue to go to 2 since the score for that day was 2 wins and one loss. Basically, I would like the losses to be multiplied by negative one, so that they display more obviously as losses (sub zero values). Also, I added `df$Date <- as.Date(df$Date)` so that the dates display as dates. Can you tell me how to multiply the losses by negative 1? – user1477388 Aug 18 '14 at 13:05
  • Note: I have tried `df$IsWin <- factor(df$IsWin, levels=c(-1,1))` but it creates an empty chart. – user1477388 Aug 18 '14 at 13:45
  • Please see my EDIT in the answer above. – rnso Aug 18 '14 at 15:32

How about this? You can use group_by() from the dplyr package to group your data by date and win status, then summarise (count) how many TRUE and FALSE values exist for each date. Using the resulting data frame, you can create a stacked bar chart.

library(dplyr)
library(ggplot2)

### Create a sample data set
dates <- rep(c("2014-08-01", "2014-08-02"), each = 10, times = 1)
win <- rep(c("TRUE", "FALSE", "FALSE", "TRUE", "TRUE"), each = 1, times = 4)

foo <- data.frame(cbind(dates, win))
foo$dates <- as.character(foo$dates)

ana <- foo %>%
         group_by(dates, win) %>%
         summarize(count = n())

# ana
# Source: local data frame [4 x 3]
# Groups: date

#        dates   win count
# 1 2014-08-01 FALSE     4
# 2 2014-08-01  TRUE     6
# 3 2014-08-02 FALSE     4
# 4 2014-08-02  TRUE     6

bob <- ggplot(ana, aes(x=dates, y=count, fill=win)) +
         geom_bar(stat="identity") +
         scale_y_continuous(breaks = seq(0,10,by = 1))

UPDATED OPTION

After seeing the comments, I came up with this idea. It changes two things. One is to make the count negative when the win condition is FALSE. The other is a new ggplot call. I am sure there are better ways of doing this, but I would like to contribute the idea here.

ana <- foo %>%
    group_by(dates, win) %>%
    summarize(count = n())

# If there is FALSE in ith row in the win column, make the value of ith row in the
# count column negative. If you can avoid a loop and achieve the same goal, that
# may be the best option. But, I do not have any ideas in my mind yet.

# seq_len() is used instead of 1:nrow(ana) so the loop is skipped when ana is empty.
for (i in seq_len(nrow(ana))) {
    if (ana$win[[i]] == "FALSE") {
        ana$count[[i]] <- -abs(ana$count[[i]])
    }
}

bob <- ggplot(data=ana, aes(x=dates, y=count, fill=win)) +
       geom_bar(stat="identity", position=position_dodge())
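The loop above can probably be avoided with a vectorised `ifelse()`. The following is only a sketch under the assumption that the counts look like `ana`; the data frame `ana_small` is hypothetical and built by hand so the snippet runs without dplyr.

```r
# Hypothetical counts shaped like `ana` above (dates, win, count).
ana_small <- data.frame(
  dates = c("2014-08-01", "2014-08-01", "2014-08-02", "2014-08-02"),
  win   = c("FALSE", "TRUE", "FALSE", "TRUE"),
  count = c(4L, 6L, 4L, 6L),
  stringsAsFactors = FALSE
)

# Vectorised replacement for the for loop: flip the sign of count
# wherever win is "FALSE" and leave the other rows unchanged.
ana_small$count <- ifelse(ana_small$win == "FALSE",
                          -abs(ana_small$count),
                          ana_small$count)

print(ana_small)
```

The same expression should also work inside `mutate()` in a dplyr pipeline, which would keep the whole transformation in one chain.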

Does this fulfil your requirements?

jazzurro
  • Thanks; but it seems like the dates overlap on the x axis. – user1477388 Aug 18 '14 at 12:59
  • I had set up dates as Date objects, but that messed up the x axis, so I changed dates to character here. On my machine the issue is gone. I hope this works for you. – jazzurro Aug 18 '14 at 13:11
  • Thanks for the update. Do you know how to multiply the false wins by negative one? I would like for the losses to show as negative values and the wins to show as positive values to make my visualization more comprehensive. – user1477388 Aug 18 '14 at 13:18
  • I just updated my suggestion. Does this work for you? – jazzurro Aug 18 '14 at 15:13
  • Thanks for updating and commenting the code. The other answer uses a little simpler syntax `ddf2$new = ifelse(ddf2$IsWin=='True', 1,-1)` so will go with that one; but thanks so much :) – user1477388 Aug 18 '14 at 16:42
  • All good. If you just need to draw a figure, there is no need to use dplyr here. The other approach will do. – jazzurro Aug 19 '14 at 01:58