0

Currently need some help with the below dataframe (which is also the same format in Excel so this could be done in Excel or R)

Dataframe:

Company_id    Year    Month    Employee_Range   Employees Cheese Chips Eggs
1             2014    NA       NA               NA         1       0    0   
1             2014    NA       NA               NA         1       0    0  
1             2014    NA       NA               NA         1       0    0 
2             2014    NA       NA               NA         0       1    0
3             2014    NA       NA               NA         0       0    1 
3             2014    NA       NA               NA         0       0    1

The dataframe continues on for about 630,000 rows, here is some further information

1) for the column company_id, each company is numbered so 1 = company 1, 2 = company 2 and so on. Each company is repeated based on if they received Chips, Eggs or Cheese more than once, which is why company 2 only has one row.

2) the numbers under the columns, cheese, chips and eggs just means 1 = "yes they ordered it" and 0 = "no they did not order it", so it works like a tally table but each company is a row

3) the rest of the information is NA as it is not needed

4) if a company chose one of eggs, cheese or chips then it is just that column only! There are no cases or occurences where a company chose more than one item, so all numbers are contained to a single column for that company.

I would like a way to find the sum of the count of a given company row, so i would like to produce a dataframe or excel table such as :

   Company_id    Year    Month    Employee_Range   Employees Cheese Chips Eggs
        1          2014    NA       NA               NA         3       0    0
        2          2014    NA       NA               NA         0       1    0
        3          2014    NA       NA               NA         0       0   2

Any ideas are helpful,

Thank you,

Data Science
  • 79
  • 2
  • 12
  • 1
    Possible duplicate of [Aggregate / summarize multiple variables per group (e.g. sum, mean)](https://stackoverflow.com/questions/9723208/aggregate-summarize-multiple-variables-per-group-e-g-sum-mean) – denis May 24 '18 at 15:56

2 Answers2

0

A solution using dplyr. dat2 is the final output.

library(dplyr)

dat2 <- dat %>%
  group_by(Company_id, Year, Month, Employee_Range, Employees) %>%
  summarise_at(vars(Cheese, Chips, Eggs), funs(sum(.))) %>%
  ungroup()
dat2
# # A tibble: 3 x 8
#   Company_id  Year Month Employee_Range Employees Cheese Chips  Eggs
#        <int> <int> <lgl> <lgl>          <lgl>      <int> <int> <int>
# 1          1  2014 NA    NA             NA             3     0     0
# 2          2  2014 NA    NA             NA             0     1     0
# 3          3  2014 NA    NA             NA             0     0     2

DATA

dat <- read.table(text = "Company_id    Year    Month    Employee_Range   Employees Cheese Chips Eggs
1             2014    NA       NA               NA         1       0    0   
1             2014    NA       NA               NA         1       0    0  
1             2014    NA       NA               NA         1       0    0 
2             2014    NA       NA               NA         0       1    0
3             2014    NA       NA               NA         0       0    1 
3             2014    NA       NA               NA         0       0    1",
                  header = TRUE)
www
  • 38,575
  • 12
  • 48
  • 84
0

Try this:

library(dplyr)

df %>% 
  group_by(Company_id, Year, Month, Employee_Range) %>% 
  summarize(Cheese = sum(Cheese), 
            Chips = sum(Chips),
            Eggs = sum(Eggs)) %>%
as.data.frame()

The result as you wished:

  Company_id Year Month Employee_Range Cheese Chips Eggs
1          1 2014    NA             NA      3     0    0
2          2 2014    NA             NA      0     1    0
3          3 2014    NA             NA      0     0    2
J. Win.
  • 6,662
  • 7
  • 34
  • 52