
I have a very wide dataframe: >80 columns. I would like to aggregate over some of the columns on the left, applying paste0 over the other columns:

prov_solicitud        expediente Puntos AR16_09 BA16_09 BA11_08 BA17_09 BA22_08
          Vigo   BS607A 2014/1-5     65    <NA>    <NA>    <NA>    <NA>    <NA>
      A Coruña  BS607A 2014/10-1     42    <NA>       1    <NA>    <NA>    <NA>
          Lugo  BS607A 2014/10-2     10    <NA>    <NA>       -    <NA>       O
          Lugo  BS607A 2014/10-2     10    <NA>       2    <NA>    <NA>    <NA>
          Vigo  BS607A 2014/10-5     34    <NA>       E    <NA>    <NA>    <NA>
          Lugo BS607A 2014/100-2     29    <NA>    <NA>    <NA>    <NA>    <NA>

dim(tbl)
> [1] 491  81



With fewer columns, I would do it with dplyr (in this example there are only 5 data columns to paste):

tbl %>% group_by(prov_solicitud, expediente, Puntos) %>%
  summarise(AR16_09=paste0(AR16_09), BA16_09=paste0(BA16_09),
            BA11_08=paste0(BA11_08), BA17_09=paste0(BA17_09),
            BA22_08=paste0(BA22_08))

How could I do it without having to type all the column names? Maybe using by or aggregate with a formula like prov_solicitud + expediente + Puntos ~ . ? Would it be useful to use as.formula? Is there a simpler way?

It would probably be necessary to convert all NA to "" in the data columns, and I would like to keep the same column names.

crestor

1 Answer


By paste0, did you mean to collapse the values into a single string? It's hard to know since there was no sample output in the question. If that is what you want:

# use a different value for collapse if you want a separator
collapse <- function(x) paste(na.omit(x), collapse = "")
tbl %>% 
    group_by(prov_solicitud, expediente, Puntos) %>% 
    summarise_each("collapse")

Alternatively, collapse could be written like this:

collapse <- function(x) na.omit(x) %>% paste(collapse = "")

or maybe what you want is something like:

collapse <- function(x) na.omit(x) %>% toString()
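
For readers on a newer dplyr, where summarise_each has been superseded, the same idea can be sketched with across(). This is only a sketch under assumptions: it assumes dplyr 1.0.0 or later, and the small tbl below is made-up sample data shaped like the table in the question, not the real 81-column frame.

```r
library(dplyr)

# Made-up sample shaped like the question's table (an assumption, not the real data)
tbl <- data.frame(
  prov_solicitud = c("Lugo", "Lugo", "Vigo"),
  expediente     = c("BS607A 2014/10-2", "BS607A 2014/10-2", "BS607A 2014/10-5"),
  Puntos         = c(10, 10, 34),
  BA16_09        = c(NA, "2", "E"),
  BA11_08        = c("-", NA, NA),
  stringsAsFactors = FALSE
)

# Drop NAs, then join what is left into a single string
collapse <- function(x) paste(na.omit(x), collapse = "")

result <- tbl %>%
  group_by(prov_solicitud, expediente, Puntos) %>%
  # everything() selects all non-grouping columns, so none are typed out
  summarise(across(everything(), collapse), .groups = "drop")

print(result)
```

Because na.omit() runs before paste(), a group whose values are all NA ends up as "", which covers the NA-to-"" conversion mentioned in the question, and the column names are kept as they were.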
G. Grothendieck
  • Thanks a lot. Using `summarise_each` is the best solution here. My question could be linked to http://stackoverflow.com/questions/21295936/can-dplyr-summarise-over-several-variables-without-listing-each-one. I see that `summarise_each` is quite a recent function in the dplyr package. – crestor Jul 07 '14 at 10:27