Group by function query

Question

Hi guys, I am new to R.

I have attached a screenshot of the df I am working with (https://i.stack.imgur.com/CUz4l.png); here is a short description.

I have a data frame with 7 columns in total. One of them is a month column; the other 6 columns hold (integer) values and also contain empty (NA) entries.

I need to summarise by the count of each of the 6 columns, grouped by month.

I tried the following code:

group_by(Month) %>% summarise(count=n(),na.omit())

and get the following error:

Error: Problem with summarise() input ..2.
x argument "object" is missing, with no default
i Input ..2 is na.omit().
i The error occurred in group 1: Month = "1".
Run rlang::last_error() to see where the error occurred.

Can someone please assist?

Head of data (screenshot): https://i.stack.imgur.com/stfoG.png

> dput(head(Dropoff))
structure(list(Start.Date = c("01-11-2019 06:07", "01-11-2019 06:07", 
"01-11-2019 06:08", "01-11-2019 06:08", "02-11-2019 06:08", "02-11-2019 06:07"
), End.Date = c("01-11-2019 06:12", "01-11-2019 09:28", "01-11-2019 10:02", 
"01-11-2019 13:05", "02-11-2019 06:13", "02-11-2019 06:16"), 
    Month = structure(c(3L, 3L, 3L, 3L, 3L, 3L), .Label = c("1", 
    "2", "11"), class = "factor"), nps = c(9L, 10L, 9L, 8L, 9L, 
    9L), effort = c(9L, 10L, 9L, 9L, 9L, 8L), knowledge = c(NA, 
    NA, 5L, NA, NA, 5L), confidence = c(5L, 5L, NA, NA, 5L, NA
    ), listening = c(NA, NA, NA, 5L, NA, NA), fcr = c(1L, 1L, 
    1L, 1L, 1L, 1L), fixing.issues = c(NA, NA, NA, NA, NA, NA
    )), row.names = c(NA, 6L), class = "data.frame")

I'd like the output to look something like this:

Month	count of nps	count of effort
1	xxx	xxx
2	xxx	xxx
11	6	6

...and so on (a count) for all the variables.

The following

df %>% group_by(Month) %>% summarise(count=n())

provides this output (https://i.stack.imgur.com/u3nxv.png), which is not what I am hoping for.

Can you post an example of your data, e.g. with `dput(head(DF))`? — c0bra, Jan 11 '21 at 10:44
Please use `dput` and paste it as text, so people can use the data as example. — c0bra, Jan 11 '21 at 10:54
Please see https://stackoverflow.com/questions/49994249/example-of-using-dput — c0bra, Jan 11 '21 at 11:09
Yes, it helps. Also, please provide the desired outcome, given the input. What is your intention in using `na.omit()`? — c0bra, Jan 11 '21 at 11:14
There are NAs in every column except the nps column. However, I cannot drop the corresponding nps data just because the other variables are NA, and I am assuming na.omit will exclude the NAs. If I run the command without excluding the NAs, I do not get the count of any column except the first (nps) column — Iqbal S, Jan 11 '21 at 11:31

score 0 · Answer 1 · answered Jan 11 '21 at 11:01

It looks like na.omit() causes the problem here: it is called without an argument, which is what the error message reports. Given that you want to count the NAs but not have them affect any subsequent sum, you might use

df[is.na(df)] = 0

and then

df %>% group_by(Month) %>% summarise(count=n())

marvinschmitt

Thanks marvin, however that didn't help. What it did was group by month and provide a count of the first variable – Iqbal S Jan 11 '21 at 11:11

What exactly is your expected output on the example data you provided above? – marvinschmitt Jan 11 '21 at 11:12
OK, I will need to figure out how to use the comments section to illustrate my point. Any suggestions? – Iqbal S Jan 11 '21 at 11:15
Added desired outcome to the OP – Iqbal S Jan 11 '21 at 11:29
Thank you. One more question: how do you define "count"? Do you want to count how many entries are non-NA? Or a sum? – marvinschmitt Jan 11 '21 at 11:34
Not the sum, but the total number of entries (rows) under that header, excluding NAs, i.e. the non-NAs as you said. Does that make sense? – Iqbal S Jan 11 '21 at 11:39
I meant counting the total non-NAs, sorry for the confusion mate – Iqbal S Jan 11 '21 at 11:54

marvinschmitt · Answer 2 · 2021-01-11T13:02:48.813

Thanks for the clarifications. The semi-manual solution

df %>% group_by(Month) %>% summarize(
  c_nps= sum(!is.na(nps)),
  c_effort= sum(!is.na(effort)),
  c_knowledge= sum(!is.na(knowledge)),
  c_confidence= sum(!is.na(confidence)),
  c_listening= sum(!is.na(listening)),
  c_fcr= sum(!is.na(fcr))
)

should do the trick. Since it's only 6 columns to be summarized, I would use the manual specification over an automated implementation (i.e. count non-NA in all other columns).

It results in

# A tibble: 1 x 7
  Month c_nps c_effort c_knowledge c_confidence c_listening c_fcr
  <fct> <int>    <int>       <int>        <int>       <int> <int>
1 11        6        6           2            3           1     6
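
The automated variant mentioned above could be sketched as follows. This is not part of the original answer; it assumes dplyr 1.0.0 or later (where across() is available) and rebuilds the six posted rows under the name df so the block runs on its own. The .names pattern is chosen only to reproduce the c_ prefix used in the manual version.

```r
library(dplyr)

# Six rows copied from the dput output in the question (survey columns only)
df <- data.frame(
  Month      = factor(rep("11", 6)),
  nps        = c(9L, 10L, 9L, 8L, 9L, 9L),
  effort     = c(9L, 10L, 9L, 9L, 9L, 8L),
  knowledge  = c(NA, NA, 5L, NA, NA, 5L),
  confidence = c(5L, 5L, NA, NA, 5L, NA),
  listening  = c(NA, NA, NA, 5L, NA, NA),
  fcr        = c(1L, 1L, 1L, 1L, 1L, 1L)
)

# Count the non-NA entries of every column from nps to fcr, per month
counts <- df %>%
  group_by(Month) %>%
  summarise(across(nps:fcr, ~ sum(!is.na(.x)), .names = "c_{.col}"))

print(counts)
```

With many more columns this avoids typing one line per column; for six columns the manual version is arguably easier to read.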

Cheers and good luck!

Thanks marvin, I will use this one too – Iqbal S Jan 11 '21 at 12:59

c0bra · Answer 3 · 2021-01-11T15:20:38.520

From your example, I understand that you want to count the non-NA values in every column.

Dropoff %>% group_by(Month) %>%
summarise_at(vars(nps:fixing.issues), list(count=~sum(!is.na(.x))))

summarise_at applies a summary to every column selected in the vars() expression. Here I chose all columns from nps to fixing.issues.
As the summarizing function (which describes how the data is summarized), I count all non-NA values. The functions are passed as a named list, and the name (count) becomes the suffix of each output column. Here ~sum(!is.na(.x)) is shorthand for function(x) sum(!is.na(x)).
The "count" expression works as follows: is.na checks each value of the column vector (.x) for NA, and ! negates the result. Since this yields a vector of only TRUE/FALSE values, sum simply counts the TRUE values.
The expression works for all kinds of column types (text, numbers, ...).

Giving the result:

# A tibble: 1 x 8
  Month nps_count effort_count knowledge_count confidence_count listening_count fcr_count fixing.issues_count
  <fct>     <int>        <int>           <int>            <int>           <int>     <int>               <int>
1 11            6            6               2                3               1         6                   0
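
To make the counting expression concrete, here is a small base R sketch (no dplyr needed) that walks through it on the knowledge column from the posted data. The feedback vector is a made-up example of a text column, not something from the question.

```r
# knowledge column from the dput output in the question
x <- c(NA, NA, 5L, NA, NA, 5L)

is.na(x)    # TRUE where the value is missing
!is.na(x)   # negated: TRUE where a value is present

# sum() treats TRUE as 1 and FALSE as 0, so this counts the present values
n_present <- sum(!is.na(x))

# Hypothetical text-feedback column: the same expression works unchanged
feedback <- c("great service", NA, "", "slow reply")
n_feedback <- sum(!is.na(feedback))

print(c(n_present, n_feedback))
```

Note that an empty string "" is not NA, so it is counted here. If blank text should be treated as missing, something like sum(!is.na(feedback) & feedback != "") would be needed instead.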

If that is not what you are aiming for, please clarify your question.

edited Jan 11 '21 at 15:20

answered Jan 11 '21 at 12:45

That's perfect mate! Thank you so much – Iqbal S Jan 11 '21 at 13:00
If I am not asking too much, could you please break down how this code works, just so I am clear? – Iqbal S Jan 11 '21 at 13:02
Will we need to modify the code in any way if we have a character variable column and need a count for it just the same? (text feedback from clients) – Iqbal S Jan 11 '21 at 14:36
@IqbalS I have added a bit more explanation. – c0bra Jan 11 '21 at 15:21
That really helps! Thanks for the detailed explanation – Iqbal S Jan 12 '21 at 10:51
@IqbalS Please mark this answer as accepted, if it gave you your solution to the problem. Thx – c0bra Feb 15 '21 at 10:33
