Specs: R 3.2.4, Windows 7 Enterprise SP1 (32-bit)
I'm trying to do a boxplot on a subset of a data frame, grouped by a particular level. However, I'm obviously doing something wrong, because it's plotting for all the levels in the original frame, not the subset.
We have an online banking platform that does "real time" communications with about 500 client institutions, and we're seeing some slow response times for some clients. I'm trying to use R to visualize the data in different ways to look for a pattern.
My data frame is a 1-hour snapshot of message response times across all institutions during a particularly busy morning. This snapshot is generated from a database query and saved to a .csv file on the file system:
rt=read.csv("\\path\\to\\csv\\file",header=TRUE)
The structure of the data frame is message sequence #, network id, institution id, date, message class, and elapsed time for the message. Network id refers to the specific communications interface (we have about 28-30 active interfaces).
I've created a subset of that snapshot by picking institutions that belong to a particular network:
rt.network=subset(rt,rt$Network==41)
At this point, rt.network should only contain observations for 4 institutions:
levels(factor(rt.network$Institution))
[1] "INST1" "INST2" "INST3" "INST4"
So far so good. Now I want to see a box plot of the elapsed times for each of those institutions, so I do the following:
boxplot(Elapsed~Institution,data=rt.network,outline=FALSE)
I expect to see results for only those institutions in the subset frame; however, R plots results for all ~500 institutions, where all but 4 are empty and those 4 are uselessly skinny (I don't have an easy way to share the image, sorry; just imagine a box plot whose X axis has 500 entries, with 4 boxes that are one or two pixels wide).
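Since I can't share the real data or the image, here is a small sketch that should reproduce the same symptom on a smaller scale. Everything in it is made up for illustration: the values, the five institution names, and the extra network ids. It also assumes that Institution was read in as a factor, which I believe is what read.csv does with text columns by default in this version of R.

```r
# Synthetic stand-in for the real snapshot; all values are invented.
set.seed(1)
rt <- data.frame(
  Network     = rep(c(41, 41, 7, 7, 12), each = 4),
  Institution = rep(c("INST1", "INST2", "INST3", "INST4", "INST5"), each = 4),
  Elapsed     = runif(20, min = 0.1, max = 2),
  stringsAsFactors = TRUE  # assumption: mimics read.csv turning text into factors
)

# Same kind of subsetting as above
rt.network <- subset(rt, Network == 41)

# Re-wrapping in factor() lists only the institutions present in the subset...
levels(factor(rt.network$Institution))

# ...while the column itself can be inspected directly for comparison
levels(rt.network$Institution)
nlevels(rt.network$Institution)

# The plot gets one slot on the x-axis per level of the column
boxplot(Elapsed ~ Institution, data = rt.network, outline = FALSE)
```

With this toy frame the plot has the same shape of problem as my real one: a couple of boxes with data, plus empty slots for institutions that are not in the subset.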
The Question - why is R generating plots for institutions not contained within the subset data frame? What have I done wrong in the boxplot or subset commands?
Needless to say, I'm confused; I don't understand why those empty levels are showing up in the plot at all.
If necessary, I can filter the results I want from the database and reload; I just thought it would be nice to load all the data once, and do the filtering/subsetting within R.