I need to write a function with an optional parameter, depending on whether the frequencies are given or have to be calculated. Example tables:

If they are given:

tbl <- data.frame(
    campoALimp = c('uno', "uno1", "Maria", "Mariana", "María", "Mara"),
    freqAbs = c(2, 5, 2, 6, 7, 6))

If not:

tbl1 <- data.frame(campoALimp = tbl[rep(1:nrow(tbl), tbl[ , 2]), 1])

My function (part of it) is:

limpio <- function (tabla, campo, campo_conteo){
    tabla <- tabla[nchar(as.character(tabla[, campo])) > 2, ]

    if(missing(campo_conteo))
        { print("calcula freq")
        #detach("package:plyr", unload=TRUE) 
        require(dplyr)
        tabla1<-data.frame(tabla %>% 
            group_by_(campo) %>% summarise(frecuencia = n() )) 
    } else {tabla1 <- tabla
    tabla1$frecuencia <- tabla1[, campo_conteo]}
return(tabla1)
}

First, I have problems with detach (it is commented out here, but if I use it, it shows this error):

Error in detach("package:plyr", unload = TRUE) : invalid 'name' argument

If I run the code for the table with frequencies, I have no problem (it only copies the original table):

limpio(tbl, 'campoALimp', 'freqAbs')

But if I run it for the second table, I get the following error:

limpio(tbl1, 'campoALimp')
Error in UseMethod("group_by_") : 
  no applicable method for 'group_by_' applied to an object of class "factor"

I tried calling detach for plyr outside the function and then running the function, and I got the same error.

I tried doing the same outside the function:

tabla <- tbl1
campo <- 'campoALimp'
tabla1 <- NULL
tabla1 <- data.frame(tabla %>% 
            group_by_(campo) %>% summarise(frecuencia = n() )) 

And I get the correct result:

  campoALimp frecuencia
       Mara          6
      Maria          2
      María          7
    Mariana          6
        uno          2
       uno1          5

Why is this not working inside the function? Thanks.

1 Answer

I fixed two issues in your code. First, group_by_ is deprecated (see ?group_by_) and should no longer be used. It is still possible to use group_by inside a function (see the answer by teppo in dplyr: How to use group_by inside a function?). Second, tbl1 was turned into a vector by the row filter tabla[..., ]: because tbl1 has only one column, the subset is simplified to a vector, which no longer has a column named campoALimp. I therefore changed the filtering to tidyverse style, which keeps tabla a data frame and makes the data.frame() conversion further down unnecessary.
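The one-column issue can be reproduced in isolation. This is a minimal sketch using a small made-up data frame df (not the data from the question). It also shows that, in base R, drop = FALSE is an alternative way to keep the result a data frame:

```r
# Hypothetical one-column data frame, for illustration only
df <- data.frame(campoALimp = c("uno", "Maria", "Mara"))

# Row subsetting a one-column data frame simplifies the result to a vector
dropped <- df[nchar(as.character(df$campoALimp)) > 2, ]
is.data.frame(dropped)  # FALSE

# drop = FALSE keeps the data frame structure and the column name
kept <- df[nchar(as.character(df$campoALimp)) > 2, , drop = FALSE]
is.data.frame(kept)     # TRUE
```
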

tbl <- data.frame(
  campoALimp = c('uno', "uno1", "Maria", "Mariana", "María", "Mara"),
  freqAbs = c(2, 5, 2, 6, 7, 6))

tbl1 <- data.frame(campoALimp = tbl[rep(1:nrow(tbl), tbl[ , 2]), 1])

limpio <- function(tabla, campo, campo_conteo) {
  require(dplyr)
  ## filter() keeps tabla a data.frame, even when it has a single column
  tabla <- tabla %>%
    filter(nchar(as.character(.data[[campo]])) > 2)

  if (missing(campo_conteo)) {
    print("calcula freq")
    #detach("package:plyr", unload=TRUE)

    ## no data.frame() needed because tabla already is a data.frame
    ## use group_by as in the answer by teppo
    tabla1 <- tabla %>%
      group_by(.data[[campo]]) %>%
      summarise(frecuencia = n())
  } else {
    tabla1 <- tabla
    tabla1$frecuencia <- tabla1[, campo_conteo]
  }
  return(tabla1)
}

limpio(tbl1, "campoALimp")
limpio(tbl, 'campoALimp', 'freqAbs')
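
Regarding the detach error: as far as I can tell, "invalid 'name' argument" is what detach reports when the package is not attached in the current session, so it most likely just means plyr was not loaded at that point. If you do want to keep the call, a guarded version along these lines should avoid the error (a sketch, assuming the goal is only to make sure plyr is not masking dplyr functions):

```r
# Detach plyr only if it is currently on the search path; otherwise do nothing
if ("package:plyr" %in% search()) {
  detach("package:plyr", unload = TRUE)
}
```

Alternatively, calling the functions with an explicit namespace, e.g. dplyr::summarise(), avoids depending on which packages are attached.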