I'm facing a peculiar issue in R. I need to avoid the auto-generated column names produced by the count() and aggregate() functions. If the code is run on a machine with a different base language, R assigns a different column name, and my code throws an error.

What I wrote:

aggregated_data <- aggregate(data$diff, by = list(P_key = data$P_key), FUN=sum)
names(aggregated_data)[names(aggregated_data) == "x"] <- "d_sum"

count_data <- count(data, P_key)
names(count_data)[names(count_data) == "n"] <- "Obs"

The column names "x" and "n" are auto-generated and change if the language of the computer is different. I need a way to assign "d_sum" and "Obs" directly, so that I avoid this issue.

The data (before renaming - for the aggregate function):

  P_key             x
1 115.770.11.5  21065
2 115.882.KJ.1 223451
3 115.883.KJ.1  47847
4 616.222.11.1 337464

The data (after explicitly renaming the 'x' column - for the aggregate function):

     P_key          d_sum
1 115.770.11.5      21065
2 115.882.KJ.1     223451
3 115.883.KJ.1      47847
4 616.222.11.1     337464

I need to avoid the auto-generated column name "x" and assign "d_sum" directly, so that the explicit renaming in the second line of the code is no longer needed.

Any help would be immensely appreciated.

Syed Ahmed
  • Could you provide a reproducible example? Provide some of your data (using `dput`, for instance) and your expected output – Rhesous Aug 05 '20 at 15:40

2 Answers

2

Regarding aggregate:

# your approach:
aggregate(iris$Sepal.Length, by=list(Species=iris$Species), FUN=sum)
#>      Species     x
#> 1     setosa 250.3
#> 2 versicolor 296.8
#> 3  virginica 329.4
  
# instead, do this:
aggregate(Sepal.Length ~ Species, data = iris, sum)
#>      Species Sepal.Length
#> 1     setosa        250.3
#> 2 versicolor        296.8
#> 3  virginica        329.4

Created on 2020-08-05 by the reprex package (v0.3.0)

dplyr::count has a name argument that you could use.
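
As a minimal sketch of that suggestion: the `data` frame below is a made-up stand-in for the asker's data (only the `P_key` and `diff` column names come from the question), and it assumes dplyr is installed. `count()` takes the output column name through `name`, and `summarise()` names the aggregated column directly, so no automatically chosen name needs to be matched afterwards.

```r
library(dplyr)

# Hypothetical stand-in for the asker's data; only the column names
# P_key and diff are taken from the question.
data <- data.frame(
  P_key = c("115.770.11.5", "115.770.11.5", "115.882.KJ.1"),
  diff  = c(100, 200, 50)
)

# count(): set the name of the count column directly via `name`
count_data <- count(data, P_key, name = "Obs")

# summarise(): the output column is named in the call itself
aggregated_data <- data %>%
  group_by(P_key) %>%
  summarise(d_sum = sum(diff))

print(count_data)
print(aggregated_data)
```

Because both names are given explicitly in the call, the later `names(...)[names(...) == "n"]` lookup is not needed.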

user12728748
  • Also possible without using the formula approach if OP passes a 1-column data frame as the first argument, e.g., `aggregate(iris["Sepal.Length"], by=list(Species=iris$Species), FUN=sum)`. Though all of these go back to the original column name. If OP's desired `d_sum` name is new, then there is not a way to do that within `aggregate`... perhaps use `dplyr` or `data.table` instead. – Gregor Thomas Aug 05 '20 at 15:54
  • @Gregor I agree - before the edit by the OP it sounded like `x` was an auto-generated name, while it was in fact the name of the column. Renaming that column before or after aggregating is therefore necessary. – user12728748 Aug 05 '20 at 15:58
1

You might try the following: setNames(aggregated_data, c(names(aggregated_data)[1], "d_sum"))

The same approach works for Obs.
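
A rough base-R sketch of this idea, wrapping the call in `setNames()` so the result is named positionally in one step and never depends on what the intermediate column is called. The `data` frame is a made-up stand-in for the asker's data. For the count, this sketch swaps in `table()` with `as.data.frame(..., responseName = )` as a base-R alternative to `count()`; that substitution is an assumption of this example, not part of the answer above.

```r
# Hypothetical stand-in for the asker's data; only the column names
# P_key and diff are taken from the question.
data <- data.frame(
  P_key = c("115.770.11.5", "115.770.11.5", "115.882.KJ.1"),
  diff  = c(100, 200, 50)
)

# Name both columns by position, so the intermediate names do not matter
aggregated_data <- setNames(
  aggregate(data$diff, by = list(data$P_key), FUN = sum),
  c("P_key", "d_sum")
)

# Base-R count per key; responseName sets the name of the count column.
# Note that P_key comes back as a factor here.
count_data <- as.data.frame(table(P_key = data$P_key), responseName = "Obs")

print(aggregated_data)
print(count_data)
```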

SweetSpot