2

I have a bunch of data in a table (imported from csv) in the following format:

date        classes         score
9/1/11       french          34
9/1/11       english         34
9/1/11       french          34
9/1/11       spanish         34
9/2/11       french          34
9/2/11       english         34
9/3/11       spanish         34
9/3/11       spanish         34
9/5/11       spanish         34
9/5/11       english         34
9/5/11       french          34
9/5/11       english         34

Ignore the score column, it's not important.

I need a tally of the total number of students taking English or Spanish or french class based on date, ie. I need to first group it by date and then divide each day into further blocks based on language and plot it as a stacked bar chart so it looks like the following. Each bar represents a date and each cross section of a bar represents a single language.

I've figured out how to do this once I get the data in a matrix form where each row represents a date and every column an attribute (or language). So I assuming the data is in that form in a csv:

ie           french      english       spanish
9/1/11       2           1             1
9/2/11       1           1             0          
9/3/11       0           0             2
9/5/11       1           2             1

then I can do:

directory<-"C:\\test\\language.csv"
ourdata6<-read.csv(directory)

language<-as.matrix(ourdata6)

barchart(prop.table(language), horizontal=FALSE, auto.key = list(space='right',cex=.5,border=T,points=F, lines=F,lwd=5,text=c('french','spanish','enligsh'),cex=.6), main = list(label="Distribution of classes 10",cex=2.5),  ylab = list(", cex=1.7),xlab.top=list("testing",cex=1.2))

The challenge is to get the data from the original format into the format I need.

I tried

a<-count(language, c("date", "classes"))

where it gives me the counts sorted by both but its in a vertical form

ie
9/1/11       french           2             
9/1/11       english          1                       
9/1/11       spanish          1            
etc...

I need to pivot this so it becomes a single row per date. Also if some of these might be zero so I need placeholders for them ie. the first column must correspond to french, the second must correspond to english for my current setup to work.

Any ideas on how to do this or if my approach with matrix + prop.table is even correct? Are there any simpler ways of doing this?

Jaap
  • 81,064
  • 34
  • 182
  • 193
curfewed
  • 79
  • 1
  • 6

2 Answers2

5

Supposing your data is in a dataframe called df, you can do that with the help of the dplyr and tidyr packages:

library(dplyr)
library(tidyr)

wide <- df %>% select(date,classes) %>%
  group_by(date,classes) %>%
  summarise(n=n()) %>%            # as @akrun said, you can also use tally()
  spread(classes, n, fill=0)

Using the example data you provided, this results in the following dataframe:

  date english french spanish
9/1/11       1      2       1
9/2/11       1      1       0
9/3/11       0      0       2
9/5/11       2      1       1

Now you can make a lattice plot with:

barchart(date ~ english + french + spanish, data=wide, stack = TRUE,
         main = list(label="Distribution of language classes",cex=1.6),
         xlab = list("Number of classes", cex=1.1),
         ylab = list("Date", cex=1.1),
         auto.key = list(space='right',cex=1.2,text=c('Enligsh','French','Spanish')))

which gives the following plot: enter image description here


EDIT: Instead of using lattice-plots, you can also use ggplot2, which is (at least in my opinion) easier to understand. An example:

# convert the wide dataframe to a long one
long <- wide %>% gather(class, n, -date)

# load ggplot2
library(ggplot2)

# create the plot
ggplot(long, aes(date, n, fill=class)) +
  geom_bar(stat="identity", position="stack") +
  coord_flip() +
  theme_bw() +
  theme(axis.title=element_blank(), axis.text=element_text(size=12))

which gives: enter image description here

Jaap
  • 81,064
  • 34
  • 182
  • 193
  • 1
    +1, or `df %>% group_by(date, classes) %>% tally() %>% spread(classes, n, fill=0)` – akrun Sep 19 '14 at 14:56
  • 1
    @akrun Thanx for pointing to the `tally` function. Learned again something new today :-) – Jaap Sep 19 '14 at 14:59
  • Hi thanks, where is table name in that command – curfewed Sep 19 '14 at 16:19
  • @curfewed in which command? Without specifying, it's hard to answer that for me ... – Jaap Sep 19 '14 at 16:22
  • HI jaap in reality I have tons of columns so specifying spanish + french + english +.. is not efficient. That's why I tried the prop.table approach. Your wide is working perfectly, now prop.table(wide) doesn't work unless wide is a matrix, I did wide2<-as.matrix(wide) but now all the numbers have " " around them. any idea? – curfewed Sep 19 '14 at 16:59
  • so after wide, I did barchart(prop.table(wide), horizontal=FALSE, auto.key = list(space='right',cex=.5,border=T,points=F, lines=F,lwd=5,text=c('french','spanish','enligsh'),cex=.6), main = list(label="Distribution of classes 10",cex=2.5), ylab = list(", cex=1.7),xlab.top=list("testing",cex=1.2)) and it's complaing Error in FUN(X[[1L]], ...) : only defined on a data frame with all numeric variables – curfewed Sep 19 '14 at 17:01
  • @curfewed in that case it's probably better/easier to use `ggplot2`, I added an example to my answer – Jaap Sep 19 '14 at 18:00
  • Hi jaap it is otherwise perfect except for the date issue, order of the date right now is by month end, so first row is 9/1/11, second row is 9/1/2012, third is 9/1/2013, then 10/1/11, 10/1/2012, 10/1/2013, ... I need 9/1/2011, 9/2/2011, 9/3/2011..sequentially. any idea – curfewed Sep 22 '14 at 13:24
  • @curfewed as you have an answer to your [other question](http://stackoverflow.com/questions/25975783/reshaped-date-from-month-end-to-chronological) now, is this also solved? – Jaap Sep 22 '14 at 17:13
4

I hope I'm not missing anything, but it looks to me like you're just looking for table:

table(df[c("date", "classes")])
#         classes
# date     english french spanish
#   9/1/11       1      2       1
#   9/2/11       1      1       0
#   9/3/11       0      0       2
#   9/5/11       2      1       1

The result is a table (which is also a matrix) so you can use your barchart command as you wish.

Here's what I get--looks like you need to work on your legend :-)

enter image description here

The code used was:

language <- table(df[c("date", "classes")])

barchart(prop.table(language), 
         horizontal = FALSE, 
         auto.key = list(space = 'right',
                         cex = .5, border = T, points = F, 
                         lines = F, lwd = 5, 
                         text = c('french','spanish','enligsh'),
                         cex = .6), 
         main = list(label = "Distribution of classes 10", cex = 2.5),
         ylab = list("", cex = 1.7), 
         xlab.top = list("testing", cex = 1.2))
A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485
  • thanks this works but the only problem is the order of the date right now it's order by month end, so first row is 9/1/11, second row is 9/1/2012, third is 9/1/2013 etc – curfewed Sep 19 '14 at 16:39
  • 1
    @curfewed, well, it would help if you used actual dates instead of strings, or if you were using strings, make them ordered factors. Those are irrelevant to your question, which was about how to reshape your data for use with lattice. – A5C1D2H2I1M1N2O1R2T1 Sep 19 '14 at 16:46