
I have a dataframe of time reports. The relevant columns are (with example data):

  • Person: A, B, C, ... (character type)
  • Hours: 2.5, 1, 6, ... (numeric type)
  • yearMonth: 2016-02, 2017-11, 2014-09, ... (character type)

I use this to do a bar plot on all the data:

```r
ggplot(data = time_reports, aes(x = time_reports$yearMonth, y = time_reports$Hours)) +
  geom_col()
```

The plot is plausible, given a standard workweek and the number of team members who reported time in each month (the team grew during this period, and several employees did not start reporting time until a few months after joining the team):

[Image: bar chart showing the sum of time-tracked hours per month]

The x-axis label is time_reports$yearMonth and goes from mid-2014 to the end of 2017. The y-axis label is time_reports$Hours and is measured in hours. Each bar is the number of hours reported per month.
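Because `geom_col()` stacks every row that shares an x value, each bar's height should equal the sum of `Hours` for that month. A minimal base-R sketch of that per-month total, assuming a small hypothetical `time_reports` with the three columns described above (the values are invented for illustration):

```r
# Hypothetical sample of time_reports; the values are made up
time_reports <- data.frame(
  Person    = c("A", "B", "A", "B", "C"),
  Hours     = c(2.5, 1, 6, 4, 3),
  yearMonth = c("2016-02", "2016-02", "2016-03", "2016-03", "2016-03"),
  stringsAsFactors = FALSE
)

# Sum Hours per month; these totals correspond to the heights of the stacked bars
monthly_totals <- aggregate(Hours ~ yearMonth, data = time_reports, FUN = sum)
print(monthly_totals)
```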

Now let me add a facet. Here's the new code, with facet_wrap added:

```r
ggplot(data = time_reports, aes(x = time_reports$yearMonth, y = time_reports$Hours)) +
  geom_col() +
  facet_wrap(~Person)
```

I get 8 facets, which is expected since the team has 8 members. However, all of the facets have bogus data. For example, here is one of them:

[Image: facet showing monthly hours for one employee]

This employee didn't even join the team until mid-2015 and didn't start time tracking until early 2016. Additionally, this employee has been diligent about time tracking since starting the practice. What you should see are fairly level bars starting about halfway through this plot. (Sorry, this facet is in the middle of the grid, so the x and y scales aren't adjacent to it. The x scale is the same as in the first plot, and the y scale is the same as in the next facet shown below.)

I exported the dataframe to a CSV and used Excel's filtering and PivotTable features to confirm that the underlying data is sensible and that the values ggplot2 is showing are not present in the data.
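The same PivotTable check can be sketched in base R: total `Hours` per `Person` and `yearMonth`, then take each person's largest monthly total. The small `time_reports` below is hypothetical; with the real data, if the plot were correct, each person's maximum would match the tallest bar in that person's facet.

```r
# Hypothetical sample of time_reports; the values are made up
time_reports <- data.frame(
  Person    = c("A", "A", "A", "B", "B"),
  Hours     = c(80, 90.5, 160, 40, 120),
  yearMonth = c("2016-02", "2016-02", "2016-03", "2016-02", "2016-03"),
  stringsAsFactors = FALSE
)

# Like a PivotTable with Person and yearMonth as row fields and Sum of Hours as the value
per_person_month <- aggregate(Hours ~ Person + yearMonth, data = time_reports, FUN = sum)

# Largest monthly total for each person
max_per_person <- aggregate(Hours ~ Person, data = per_person_month, FUN = max)
print(max_per_person)
```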

Here is another facet:

[Image: facet showing monthly hours for one employee]

This one is beyond insane. For this person to have reported 800 hours in a month, that would mean more than 24 hours worked per day! Again, Excel proves that the underlying data does not show any work month remotely similar to this. The most this guy ever reported in a month was 176.75 hours.

Why is ggplot2's facet_wrap function warping the data so badly?

Aren Cambre
  • I think it's difficult to help you without an example of your data. Can you reproduce your difficulties with an MWE? – Peter H. Jan 05 '18 at 20:47
  • Did you try changing to `ggplot(data = Web_team_time_reports, aes(x=yearMonth, y=Hours))` for your `Web_team_time_reports`? After that change I see that no one had > ~210 hours in a given `yearMonth` – Mike H. Jan 05 '18 at 21:30
  • I think you posted your full data. Please post an MWE (the **M** stands for **minimal**). Surely the problem can be illustrated with fewer than 10,000 rows of data? Most problems can be illustrated with 10 rows of data. As [shown in this R-FAQ for making nice examples](https://stackoverflow.com/q/5963269/903061), `dput()` can be used to easily make a copy/pasteable data structure. `dput(droplevels(head(your_data, 10)))` usually works very well. – Gregor Thomas Jan 05 '18 at 21:30
  • You're right that the `aes` help doesn't specify bare variable names (it probably should), but all the examples use bare variable names, and they are indeed required for faceting. – Gregor Thomas Jan 05 '18 at 21:34
  • And when your code is 8 lines long, why post it in a suspicious file that must be downloaded instead of putting it in your question? – Gregor Thomas Jan 05 '18 at 21:35
  • This may be a canonical Q&A for "the `$` issue": [Behavior ggplot2 aes() in combination with facet_grid() when passing variable with dollar sign notation to aes()](https://stackoverflow.com/questions/32543340/behavior-ggplot2-aes-in-combination-with-facet-grid-when-passing-variable-wi) – Henrik Jan 05 '18 at 22:04
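Following the comments from Mike H. and Gregor Thomas, here is a minimal sketch of the faceted plot with bare column names inside `aes()` instead of `time_reports$...`. The small `time_reports` below is a hypothetical stand-in with invented values. The comments indicate that bare names are required for faceting: ggplot2 then looks the columns up in `data`, so each facet should only show that person's rows. The last line uses the `dput()` approach suggested above for sharing a minimal example.

```r
library(ggplot2)

# Hypothetical stand-in for time_reports; the values are made up
time_reports <- data.frame(
  Person    = c("A", "A", "B", "B", "C", "C"),
  Hours     = c(160, 150, 20, 170, 5, 8),
  yearMonth = c("2016-01", "2016-02", "2016-01", "2016-02", "2016-01", "2016-02"),
  stringsAsFactors = FALSE
)

# Bare column names: ggplot2 evaluates them within `data`,
# keeping Hours and yearMonth aligned with Person when faceting
p <- ggplot(data = time_reports, aes(x = yearMonth, y = Hours)) +
  geom_col() +
  facet_wrap(~Person)

print(p)

# Print a copy/pasteable version of the first rows for sharing
dput(droplevels(head(time_reports, 10)))
```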
