I am new to R and am struggling to figure out how to retrieve the complete list of fields / columns from a data frame within an aggregate function.

For example, I have a data frame df with 200+ fields. I would like to group the data frame on a particular field, df.a, and then order by another field, df.b. However, in the output data frame, I want each row to contain all 200+ fields instead of only the df.a and df.b fields.

Please help me understand how to achieve this.

David Arenburg
  • Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. – David Arenburg Nov 02 '15 at 12:16
  • [R Grouping functions: sapply vs. lapply vs. apply. vs. tapply vs. by vs. aggregate](http://stackoverflow.com/questions/3505701), [Aggregate multiple variables simultaneously](http://stackoverflow.com/questions/9723208), [How to sum a variable by group?](http://stackoverflow.com/q/1660124) – zx8754 Nov 02 '15 at 12:23

1 Answer

You could write your own function, but it would probably be best to use an already written, debugged and performance-tuned one. The package dplyr is excellent for this sort of thing.

I find myself often writing lines like:

df %>% group_by(a) %>% arrange(b) %>% summarise(total = sum(b))

Here df is my data frame, group_by is the function that groups your rows by a specific column (or set of columns), and arrange is the function that reorders your rows by a column (or set of columns). summarise performs aggregations and produces summaries of the data; note that it collapses each group to a single row, so leave it out if you want to keep every row and every column. %>% is the 'pipe' operator, which feeds the result of the expression on its left as the first argument to the function on its right, so you avoid writing nested calls that are hard to read or creating one-off intermediate variables.
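As a rough sketch of how this could look for the case in the question: the small data frame below is a made-up stand-in for the real 200+ column one, and the column names a, b, c as well as the result names (sorted, counts, with_counts) are assumptions for illustration only. It assumes dplyr is installed. Sorting with arrange keeps every column, summarise with n() gives a per-group count, and mutate adds the count without dropping any rows.

```r
library(dplyr)

# Hypothetical stand-in for the real data frame (which has 200+ columns)
df <- data.frame(
  a = c("x", "y", "x", "y", "x"),
  b = c(3, 1, 2, 5, 4),
  c = c(10, 20, 30, 40, 50),
  stringsAsFactors = FALSE
)

# Order by a, then by b within each value of a; every column is kept
sorted <- df %>% arrange(a, b)

# One row per group: number of rows and sum of b
counts <- df %>%
  group_by(a) %>%
  summarise(n_rows = n(), total = sum(b))

# Keep all rows and columns, adding the group size as an extra column
with_counts <- df %>%
  group_by(a) %>%
  mutate(n_rows = n()) %>%
  ungroup() %>%
  arrange(a, b)
```

If only the ordering matters, the first pipeline is enough; group_by is needed only when you compute something per group.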

Hope this helps or gives you some better ideas.

kliron
  • Thanks Kliron for the answer. This is what I was looking for. However, just wondering how I can perform count on a particular column using dplyr. – Vijay Bhoomireddy Nov 02 '15 at 13:28
  • Look at the edited post. You can use summarise to produce aggregates/summaries of your data. If you want a specific answer to a specific problem please give a simple example data frame and a desired output in your question above. – kliron Nov 02 '15 at 13:42