I don't think this has been asked before, but SO search might just be getting confused by combinations of 'ratio' and 'faceting'. I'm trying to calculate a productivity ratio: the number of widgets produced per number of workers on a given day or period. My data is structured in a single data frame, with each widget produced each day by each worker in its own record, and workers who worked that day but didn't produce a widget also in their own records, along with various metadata.

Something like this:

widget_ind  employee_active_ind  employee_id  day       product_type  employee_bu
1           1                    123          6/1/2021  pc            americas
0           1                    234          6/1/2021  mac           emea
0           1                    345          6/1/2021  mac           apac
1           1                    444          6/1/2021  mac           americas
1           1                    333          6/1/2021  pc            emea
0           1                    356          6/1/2021  pc            americas

I'm trying to find the ratio of widget_inds to employee_active_inds over time while retaining the metadata, so that I can filter or facet within the ggplot2 code, something like:

plot <- ggplot(data = df[df$employee_bu == 'americas', ], aes(y = (widget_ind / employee_active_ind), x = day)) +
  geom_bar(stat = 'identity', position = 'stack') +
  facet_wrap(product_type ~ ., scales = 'fixed')  # change these to look at different cuts of metadata

print(plot)

Retaining the metadata is more appealing than building a separate summarized data frame for each combination, but the results aren't correct even without faceting: the ggplot bar chart shows a height of ~18 widgets per person, whereas a summarized data frame with no faceting shows a ratio of less than 1 widget per person.

I'm currently getting this warning when I run the ggplot code:

Warning message:
Removed 9865 rows containing missing values (geom_bar). 

This doesn't make sense to me, since neither widget_ind nor employee_active_ind has any NA values in my data frame, so calculating the ratio of the two should always work?

Edit 1: Clarifying employee_active_ind: I should not have any employee_active_ind = 0, but my current joins produce them (and it passes the reality sniff test: the process we are trying to model allows you to do work on day 1 that results in a widget on day 2, when you may not do any work and so wouldn't be counted as active on that day). I think I need to rethink my data structure. Even so, I'm assuming here that ggplot2 is acting like it would for any bar chart: it takes the number in each widget_ind record for a given day (along with any facets and filters), then sums that set and displays the result. The wrinkle I'm adding is dividing by the number of active employees on that day, and while you can have someone out on a given day, you'd never have everyone out. But that isn't what ggplot is doing, is it?
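
For reference, here is a minimal sketch of the two calculations being compared, using only the six sample rows above. The data frame is rebuilt by hand and the day is parsed as a Date (both are assumptions for illustration, not the real data). With stat = 'identity' and position = 'stack', each row contributes its own widget_ind / employee_active_ind to the bar, so the bar height is the sum of the per-row ratios rather than the ratio of the sums:

```r
# Sample rows from the question, rebuilt by hand (assumption: day parsed as Date)
df <- data.frame(
  widget_ind          = c(1, 0, 0, 1, 1, 0),
  employee_active_ind = c(1, 1, 1, 1, 1, 1),
  employee_id         = c(123, 234, 345, 444, 333, 356),
  day                 = as.Date(rep("2021-06-01", 6)),
  product_type        = c("pc", "mac", "mac", "mac", "pc", "pc"),
  employee_bu         = c("americas", "emea", "apac", "americas", "emea", "americas")
)

# What a stacked identity bar adds up for this day: one ratio per row, then summed
stacked_height <- sum(df$widget_ind / df$employee_active_ind)

# The productivity ratio actually wanted: total widgets / total active employees
ratio_of_sums <- sum(df$widget_ind) / sum(df$employee_active_ind)

# Same idea per facet: sum both indicators per day and product_type, then divide
by_type <- aggregate(
  cbind(widget_ind, employee_active_ind) ~ day + product_type,
  data = df,
  FUN = sum
)
by_type$ratio <- by_type$widget_ind / by_type$employee_active_ind
```

On these six rows the stacked height is 3 while the ratio of sums is 0.5. A summarized frame like by_type could then be passed to ggplot with aes(x = day, y = ratio) and faceted on product_type, at the cost of re-aggregating for each cut of the metadata.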

jhunter
    You should just use ggplot for plotting, not calculating values since you don't have that much control. Please share your data in a [reproducible format](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) like a `dput()` so we can copy/paste into R for testing. What values would you expect to plot for the sample data? Do you have values of `employee_active_ind` that are 0? – MrFlick Jun 18 '21 at 21:46

1 Answer

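I agree with MrFlick, especially on the question of whether employee_active_ind is ever 0. If it is, dividing by it does not give an ordinary number: in R, 0/0 evaluates to NaN (which is.na() treats as missing) and 1/0 evaluates to Inf. That could explain the missing-value warning even though neither input column contains NA.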
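
A quick way to check this is to count the rows where the divisor is 0 and see what the per-row ratio turns into. The sketch below uses a small made-up data frame (toy is hypothetical, not the asker's data) just to show how R behaves:

```r
# Hypothetical toy data: two rows with an active employee, two rows without
toy <- data.frame(
  widget_ind          = c(1, 0, 1, 0),
  employee_active_ind = c(1, 1, 0, 0)
)

# Per-row ratio, as computed inside aes() in the question
toy$ratio <- toy$widget_ind / toy$employee_active_ind  # 1, 0, Inf, NaN

# How many rows divide by zero?
n_zero_active <- sum(toy$employee_active_ind == 0)

# NaN counts as missing for is.na(); Inf does not, but it is not finite either
n_nan        <- sum(is.na(toy$ratio))
n_non_finite <- sum(!is.finite(toy$ratio))
```

Running the same counts on the real df would show whether the number of rows with employee_active_ind == 0 is in line with the number of rows reported in the warning.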