How can I summarise rows and mean for a specific column

Question

How can I sum the rows of my dataframe being grouped by a specific column, I already tried with group_by()rowSums() but it gives me an error in wanting to group this column in the specific one that I want. The purpose of this is that I want to show the percentage by Localidad of all other columns. Using dput(mydataframe) it gives me this:


 No.    Mes             Delegacion              Localidad Esquema Número
1    1 oct-10     09CIUDAD DE MÉXICO    09016MIGUEL HIDALGO       U    629
2    2 oct-10     09CIUDAD DE MÉXICO 09005GUSTAVO A. MADERO       U   1402
3    3 oct-10     09CIUDAD DE MÉXICO 09005GUSTAVO A. MADERO       U    699
4    4 oct-10     09CIUDAD DE MÉXICO          09003COYOACÁN       U     48
5    5 oct-10     09CIUDAD DE MÉXICO    09010ÁLVARO OBREGÓN       U    539
6    6 oct-10     09CIUDAD DE MÉXICO           09011TLÁHUAC       U     55
7    7 oct-10     09CIUDAD DE MÉXICO           09011TLÁHUAC       U    700
8    8 oct-10     09CIUDAD DE MÉXICO        09007IZTAPALAPA       Y     23
9    9 oct-10       01AGUASCALIENTES    01001AGUASCALIENTES       U     10
10  10 oct-10       01AGUASCALIENTES    01001AGUASCALIENTES       U   1270
11  11 oct-10 02BAJA CALIFORNIA               02001ENSENADA       Y     41

                                                Nombre               X2.1 X2.2 X2.3 X2.4 X2.5 X2.6 X2.7 X2.8 X2.9
1                                              MUNDO DE LOS PEQUES    1    1    1    1    1    1    1    1    1
2                                   GUARDERIA EL ARBOL DE LA NIÑEZ    1    1    1    1    1    1    1    1    1
3                                           LOS PEQUEÑOS GENIOS II    1    1    1    1    1    0    1    1    1
4                                                   MI MUNDO FELIZ    1    1    1    1    1    1    1    1    1
5                       CENTRO ECOLÓGICO DE DESARROLLO INFANTIL II    1    1    1    1    1    1    1    1    1
6                                 ESTANCIA INFANTIL TERCER MILENIO    1    1    1    1    1    0    1    1    1
7                                                   CEDEI COKELUSH    1    1    1    1    1    0    1    1    1
8                    INSTITUTO PEDAGÓGICO DE EDUCACIÓN INCIAL S.A.    1    1    1    1    1    1    1    1    1
9               CTO. DE DESARROLLO INFANTIL GARABATILLOS, SA DE CV    1    1    1    1    1    1    1    1    1
10 CENTRO DE DESARROLLO INFANTIL EL MUNDO DE LOS NIÑOS Y LAS NIÑAS    1    1    1    1    1    1    1    1    1
11                                              CARITAS SONRIENTES    1    1    1    1    1    0    1    1    1

I want to take the average of all the rows grouping by Localidad

Welcome to Stack Overflow. Please [make this question reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) by including example data in a plain text format - for example the output from `dput(yourdata)`. We cannot copy/paste data from images. — neilfws, Nov 23 '21 at 00:44

Park · Accepted Answer · 2021-11-23T01:43:30.553

1

If you want to get average of every each variables, by using dplyr,

library(dplyr)
df %>%
  group_by(Localidad) %>%
  summarise(across(where(is.numeric), ~mean(.x)))

will helps.

Add

result <- dummy %>%
  group_by(Localidad) %>%
  select(Localidad, starts_with("X")) %>%
  summarise(across(where(is.numeric), ~ scales::percent(mean(.x))))

  Localidad   X2.1  X2.2  X2.3  X2.4  X2.5  X2.6  X2.7  X2.8  X2.9  X2.1.1 X2.11 X2.12 X3.1  X3.2  X5.1  X5.2  X5.3  X5.4  X5.5  X5.6  X5.7  X5.8  X6.1 
   <chr>       <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>  <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
 1 "01001AGUA~ 100%  100%  100%  100%  100%  100%  100%  100%  100%  100%   100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100% 
 2 "02001ENSE~ 100%  100%  100%  100%  100%  0%    100%  100%  100%  100%   100%  100%  100%  100%  0%    0%    100%  100%  100%  100%  100%  100%  100% 
 3 "02002MEXI~ 100%  100%  100%  100%  100%  100%  100%  100%  100%  100%   100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100% 
 4 "02004TIJU~ 100%  100%  100%  100%  100%  100%  100%  100%  100%  100%   100%  100%  100%  100%  100%  75%   100%  100%  100%  75%   100%  100%  100% 
 5 "03003LA P~ 100%  100%  100%  100%  100%  100%  100%  100%  100%  100%   100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100% 
 6 "04003CIUD~ 100%  100%  100%  100%  100%  100%  100%  100%  100%  100%   100%  100%  100%  100%  0%    100%  100%  100%  100%  100%  100%  100%  100% 
 7 "09003COYO~ 100%  100%  100%  100%  100%  100%  100%  100%  100%  100%   100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100% 
 8 "09005GUST~ 100%  100%  100%  100%  100%  50%   100%  100%  100%  100%   100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100% 
 9 "09007IZTA~ 100%  100%  100%  100%  100%  100%  100%  100%  100%  100%   100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100% 
10 "09010ALVA~ 100%  100%  100%  100%  100%  100%  100%  100%  100%  100%   100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100% 
11 "09011TLAH~ 100%  100%  100%  100%  100%  0%    100%  100%  100%  100%   100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100% 
12 "09016MIGU~ 100%  100%  100%  100%  100%  100%  100%  100%  100%  100%   100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100% 
13 "26055SAN ~ 100%  100%  100%  100%  100%  100%  100%  100%  100%  100%   100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100%  100% 
# ... with 49 more variables: X6.2 <chr>, X6.3 <chr>, X6.4 <chr>, X6.5 <chr>, X7.1 <chr>, X7.2 <chr>, X7.3 <chr>, X7.4 <chr>, X8.1 <chr>, X8.2 <chr>,
#   X9.1 <chr>, X9.2 <chr>, X9.3 <chr>, X9.4 <chr>, X10.1 <chr>, X10.2 <chr>, X10.3 <chr>, X10.4 <chr>, X10.5 <chr>, X10.6 <chr>, X10.7 <chr>,
#   X10.8 <chr>, X10.9 <chr>, X11.1 <chr>, X11.2 <chr>, X11.3 <chr>, X11.4 <chr>, X11.5 <chr>, X11.6 <chr>, X11.7 <chr>, X11.8 <chr>, X11.9 <chr>,
#   X11.1.1 <chr>, X11.11 <chr>, X11.12 <chr>, X11.13 <chr>, X11.14 <chr>, X11.15 <chr>, X11.16 <chr>, X12.1 <chr>, X12.2 <chr>, X12.3 <chr>,
#   X12.4 <chr>, X12.5 <chr>, X12.6 <chr>, X12.8 <chr>, X14.1 <chr>, X14.2 <chr>, X. <chr>

edited Nov 23 '21 at 01:43

answered Nov 23 '21 at 00:44

Park

14,771
6
10
29

Hello @Park Thanks for answer, the console gives me this error: ```Error in UseMethod("group_by") : no applicable method for 'group_by' applied to an object of class "logical"``` – Juan Jesus Nov 23 '21 at 00:48
@JuanJesus Do you get same error with `dplyr::group_by'? – Park Nov 23 '21 at 00:49
Yea, it gives me the same error. In all my dataframe I have some char columns, not of all are integer, I dont know if thats the problem that Im trying to summarise everything with char columns. – Juan Jesus Nov 23 '21 at 00:52
@JuanJesus I edit the code above. It will only summarize columns with numeric values. Would you try this and tell me if it works? – Park Nov 23 '21 at 00:54
Sorry it gaves me the same error as above. – Juan Jesus Nov 23 '21 at 00:58
@JuanJesus Since you provide your data as an image, the only help that I can give is just a prediction. Try using `dput(head(yourdata,20))` and paste the result to your question. – Park Nov 23 '21 at 01:03
Done, The result its very odd and thats why I didnt paste it like a Text since the beginning, I hope this helps for helping me, thank you. – Juan Jesus Nov 23 '21 at 01:11
@JuanJesus Thanks a lot for your data. In your desired result, does variable except `Localidad` and variables with `X....` things are not needed? And It's very weird it works for me using your data. – Park Nov 23 '21 at 01:16
The variables I need are ```Localidad``` and all ```X...``` I need to be group by ```Localidad``` but the average of each row in ```Localidad``` – Juan Jesus Nov 23 '21 at 01:20
@JuanJesus With your data, it give me that result I add above. Does it fit your purpose? – Park Nov 23 '21 at 01:24
Yes!!! That's what I was looking for, Thank you so much!!! – Juan Jesus Nov 23 '21 at 01:27
@JuanJesus Does this code works for your case? I'm worried if it doesn't works in your session.... – Park Nov 23 '21 at 01:30
I was trying to replicate the same code as your but it gives me this error: ```Error in group_by(., Localidad) : object 'dummy' not found``` Im not very familiar with R code – Juan Jesus Nov 23 '21 at 01:33
@JuanJesus Oh....That's my data name. Try using your data name. Just simply change `dummy` to your data object. – Park Nov 23 '21 at 01:33
Ok you right my bad, it gives me this another error ```Error: Problem with `summarise()` input `..1`. i `..1 = across(where(is.numeric), ~scales::percent(mean(.x)))`. x there is no package called ‘scales’ i The error occurred in group 1: Localidad = " 13069TIZAYUCA".``` – Juan Jesus Nov 23 '21 at 01:37
@JuanJesus You do not have `library(scales)`. Try `install.packages("scales")` first. – Park Nov 23 '21 at 01:38
Done! Thank you so Much @Park one last question, How can I view it as a Table? Thanks for all your help. – Juan Jesus Nov 23 '21 at 01:41
@JuanJesus I add `result <- ` at the beginning of the code. Then the result will be saved in object `result` :D. – Park Nov 23 '21 at 01:43

score 0 · Answer 2 · answered Nov 23 '21 at 01:41

We can use rowwise and calculate the mean with the columns that match the pattern starting with and X followed by a number.

library(tidyverse)


# Data --------------------------------------------------------------------


df <- 
read_table("
                                                           Nombre               X2.1 X2.2 X2.3 X2.4 X2.5 X2.6 X2.7 X2.8 X2.9
                                              MUNDO-DE-LOS-PEQUES    1    1    1    1    1    1    1    1    1
                                   GUARDERIA-EL-ARBOL-DE-LA-NIÑEZ    1    1    1    1    1    1    1    1    1
                                           LOS-PEQUEÑOS-GENIOS-II    1    1    1    1    1    0    1    1    1
                                                   MI-MUNDO-FELIZ    1    1    1    1    1    1    1    1    1
                       CENTRO-ECOLÓGICO-DE-DESARROLLO-INFANTIL-II    1    1    1    1    1    1    1    1    1
                                 ESTANCIA-INFANTIL-TERCER-MILENIO    1    1    1    1    1    0    1    1    1
                                                   CEDEI-COKELUSH    1    1    1    1    1    0    1    1    1
                    INSTITUTO-PEDAGÓGICO-DE-EDUCACIÓN-INCIAL-S.A.    1    1    1    1    1    1    1    1    1
               CTO.-DE-DESARROLLO-INFANTIL-GARABATILLOS,-SA-DE-CV    1    1    1    1    1    1    1    1    1
 CENTRO-DE-DESARROLLO-INFANTIL-EL-MUNDO-DE-LOS-NIÑOS-Y-AS-NIÑAS    1    1    1    1    1    1    1    1    1
                                              CARITAS-SONRIENTES    1    1    1    1    1    0    1    1    1")

df <- 
df %>% 
  mutate(Nombre = str_replace_all(Nombre, '-', ' '))


# Code --------------------------------------------------------------------


df %>% 
  rowwise() %>% 
  mutate(Pecentage = mean(c_across(matches('^X\\d')))) 
#> # A tibble: 11 × 11
#> # Rowwise: 
#>    Nombre         X2.1  X2.2  X2.3  X2.4  X2.5  X2.6  X2.7  X2.8  X2.9 Pecentage
#>    <chr>         <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>     <dbl>
#>  1 MUNDO DE LOS…     1     1     1     1     1     1     1     1     1     1    
#>  2 GUARDERIA EL…     1     1     1     1     1     1     1     1     1     1    
#>  3 LOS PEQUEÑOS…     1     1     1     1     1     0     1     1     1     0.889
#>  4 MI MUNDO FEL…     1     1     1     1     1     1     1     1     1     1    
#>  5 CENTRO ECOLÓ…     1     1     1     1     1     1     1     1     1     1    
#>  6 ESTANCIA INF…     1     1     1     1     1     0     1     1     1     0.889
#>  7 CEDEI COKELU…     1     1     1     1     1     0     1     1     1     0.889
#>  8 INSTITUTO PE…     1     1     1     1     1     1     1     1     1     1    
#>  9 CTO. DE DESA…     1     1     1     1     1     1     1     1     1     1    
#> 10 CENTRO DE DE…     1     1     1     1     1     1     1     1     1     1    
#> 11 CARITAS SONR…     1     1     1     1     1     0     1     1     1     0.889

^{Created on 2021-11-22 by the reprex package (v2.0.1)}

Thank you so much for answering! That helps a lot – Juan Jesus Nov 23 '21 at 01:48 — Juan Jesus, Nov 23 '21 at 01:48

How can I summarise rows and mean for a specific column

2 Answers2

Add