Hi guys, I am new to R.

I have attached a screenshot of the data frame I am working with (https://i.stack.imgur.com/CUz4l.png); here is a short description.

I have a data frame with a total of 7 columns. One of them is a month column; the other 6 columns hold integer values and also contain empty (NA) entries.

I need to count the values in each of the 6 columns, grouped by month.

I tried the following code: group_by(Month) %>% summarise(count=n(),na.omit())

I get the following error:
Error: Problem with summarise() input ..2.
x argument "object" is missing, with no default
i Input ..2 is na.omit().
i The error occurred in group 1: Month = "1".
Run rlang::last_error() to see where the error occurred.

Can someone please assist?

[head of data][1] (https://i.stack.imgur.com/stfoG.png)

> dput(head(Dropoff))
structure(list(Start.Date = c("01-11-2019 06:07", "01-11-2019 06:07", 
"01-11-2019 06:08", "01-11-2019 06:08", "02-11-2019 06:08", "02-11-2019 06:07"
), End.Date = c("01-11-2019 06:12", "01-11-2019 09:28", "01-11-2019 10:02", 
"01-11-2019 13:05", "02-11-2019 06:13", "02-11-2019 06:16"), 
    Month = structure(c(3L, 3L, 3L, 3L, 3L, 3L), .Label = c("1", 
    "2", "11"), class = "factor"), nps = c(9L, 10L, 9L, 8L, 9L, 
    9L), effort = c(9L, 10L, 9L, 9L, 9L, 8L), knowledge = c(NA, 
    NA, 5L, NA, NA, 5L), confidence = c(5L, 5L, NA, NA, 5L, NA
    ), listening = c(NA, NA, NA, 5L, NA, NA), fcr = c(1L, 1L, 
    1L, 1L, 1L, 1L), fixing.issues = c(NA, NA, NA, NA, NA, NA
    )), row.names = c(NA, 6L), class = "data.frame")

I'd like the output to look something like this:

Month  count of nps  count of effort
1      xxx           xxx
2      xxx           xxx
11     6             6

... and so on (a count) for all the variables.

The following

df %>% group_by(Month) %>% summarise(count=n())

provides this output (https://i.stack.imgur.com/u3nxv.png), which is not what I am hoping for.

Iqbal S
  • Can you post an example of your data, e.g. with `dput(head(DF))`? – c0bra Jan 11 '21 at 10:44
  • head of data added – Iqbal S Jan 11 '21 at 10:52
  • Please use `dput` and paste it as text, so people can use the data as example. – c0bra Jan 11 '21 at 10:54
  • I am not sure how exactly that is done, mate, apologies – Iqbal S Jan 11 '21 at 10:57
  • Please see https://stackoverflow.com/questions/49994249/example-of-using-dput – c0bra Jan 11 '21 at 11:09
  • Yes it helps. Also please provide the desired outcome, given the input. What is your intention of using the `na.omit()` ? – c0bra Jan 11 '21 at 11:14
  • Please check the format of the desired output – Iqbal S Jan 11 '21 at 11:25
  • There are NAs in all columns except the nps column. However, I cannot drop those rows, because that would also exclude the corresponding nps values, and I am assuming na.omit will exclude the NAs. If I run the command without excluding the NAs, I do not get the count of any column except the first (nps) column – Iqbal S Jan 11 '21 at 11:31

3 Answers

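It looks like na.omit() is causing the problem in this case: it is called without an argument, hence the error that argument "object" is missing. Given that you want rows with NA to be counted, but not to contribute to any following sum, you might use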

df[is.na(df)] = 0

and then

df %>% group_by(Month) %>% summarise(count=n())
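
To make the difference between the counts concrete, here is a small base R sketch using one column from the question's sample data. The variable names (n_rows, n_non_na, and so on) are made up for illustration. It assumes only that na.omit() is given the vector as its argument, which is what the original call was missing, and it suggests that after filling NAs with 0 a row count such as n() still reports every row in the group.

```r
# One column from the question's sample data (values taken from the dput output)
confidence <- c(5L, 5L, NA, NA, 5L, NA)

# Number of rows, which is what n() reports for a group
n_rows <- length(confidence)

# Number of non-missing entries
n_non_na <- sum(!is.na(confidence))

# Same count via na.omit(), this time with the vector passed as its argument
n_omitted <- length(na.omit(confidence))

# After replacing NA with 0, every entry counts as present
filled <- confidence
filled[is.na(filled)] <- 0L
n_filled <- sum(!is.na(filled))

print(c(rows = n_rows, non_na = n_non_na, omitted = n_omitted, filled = n_filled))
```

If the goal is the number of non-missing values per column, the non-NA count is the one to look at; the row count is the same for every column.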
marvinschmitt

Thanks for the clarifications. The semi-manual solution

df %>% group_by(Month) %>% summarize(
  c_nps = sum(!is.na(nps)),
  c_effort = sum(!is.na(effort)),
  c_knowledge = sum(!is.na(knowledge)),
  c_confidence = sum(!is.na(confidence)),
  c_listening = sum(!is.na(listening)),
  c_fcr = sum(!is.na(fcr))
)

should do the trick. Since there are only 6 columns to summarize, I would spell them out manually rather than use an automated approach (i.e. one that counts the non-NA values in all other columns).

It results in

# A tibble: 1 x 7
  Month c_nps c_effort c_knowledge c_confidence c_listening c_fcr
  <fct> <int>    <int>       <int>        <int>       <int> <int>
1 11        6        6           2            3           1     6

Cheers and good luck!

marvinschmitt

From your example I understand that you want to count the non-NA values in every column.

Dropoff %>% group_by(Month) %>%
  summarise_at(vars(nps:fixing.issues), list(count=~sum(!is.na(.x))))
  • summarise_at: This applies a summary to every column given in the vars() expression. Here I chose all columns from nps to fixing.issues.
  • For the summarizing function (which describes how the data is summarized), I count all non-NA values. The functions are passed as a named list, and the name (count) is appended to each column name in the output. Here the ~ is shorthand for an anonymous function whose argument is .x. A longer way to write it would be: function(x) sum(!is.na(x))
  • The "count" expression works as follows: is.na checks each value of the column vector (.x) for NA, and ! negates the result. Since this yields a vector of only TRUE/FALSE values, sum simply counts the TRUE values.
  • The expression works for all kinds of column types (text, numbers, ...)

Giving the result:

# A tibble: 1 x 8
  Month nps_count effort_count knowledge_count confidence_count listening_count fcr_count fixing.issues_count
  <fct>     <int>        <int>           <int>            <int>           <int>     <int>               <int>
1 11            6            6               2                3               1         6                   0
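
For reference, the same counts can probably also be obtained with across(), assuming dplyr 1.0.0 or later is installed (the version where across() was introduced). The sketch below rebuilds the six sample rows from the dput output, keeping only the columns that matter here; the name counts is made up for illustration.

```r
library(dplyr)  # assumption: dplyr >= 1.0.0, which provides across()

# First six rows of the question's data, reduced to the relevant columns
Dropoff <- data.frame(
  Month = factor(rep("11", 6), levels = c("1", "2", "11")),
  nps = c(9L, 10L, 9L, 8L, 9L, 9L),
  effort = c(9L, 10L, 9L, 9L, 9L, 8L),
  knowledge = c(NA, NA, 5L, NA, NA, 5L),
  confidence = c(5L, 5L, NA, NA, 5L, NA),
  listening = c(NA, NA, NA, 5L, NA, NA),
  fcr = rep(1L, 6),
  fixing.issues = rep(NA_integer_, 6)
)

# Count the non-NA values of every column from nps to fixing.issues per month;
# .names appends "_count" to each original column name
counts <- Dropoff %>%
  group_by(Month) %>%
  summarise(across(nps:fixing.issues, ~ sum(!is.na(.x)), .names = "{.col}_count"))

print(counts)
```

On the sample rows this should reproduce the table above, with one row for month 11.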

If that is not what you are aiming for, please clarify your question.

c0bra