I have a simple question. The aggregate() function in R splits a data frame into groups and applies a summary function to each group.

The basic usage is aggregate(x, by, FUN), for example aggregate(my.data.frame, list(grouping column), function to be applied).

It is useful for computing simple summaries, such as the mean or median of a column within each group. What I have, though, is a function that doesn't operate on data frames, and I need to apply it to specific columns and aggregate the result by group. Let me show the dataset:

[Image: GPS Dataset]

So I need to compute the centroid of the longitude and latitude points for EACH BSSID; that is how I need to aggregate. The functions I found online in various packages compute the centroid of a matrix of values rather than a data frame, whereas aggregate() doesn't seem to work on anything other than data frames.

Many thanks in advance :)

  • Please don't use images to show your data. Post the data (or portions of it) instead. http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – RHertel Feb 07 '16 at 10:41
  • By the way, I'm quite confident that your problem could be treated with `aggregate()`, but without any tangible information on your function and your data, an answer to your question requires a fair amount of guesswork. – RHertel Feb 07 '16 at 10:49
  • @RHertel, I shall be posting better questions henceforth. Kinda new to the environment :) – Abhinay Reddy Feb 07 '16 at 16:52
  • I got it via aggregate, by converting the spatial coordinates into cartesian. Thank you :) – Abhinay Reddy Feb 08 '16 at 07:34

2 Answers

I like dplyr for this - the syntax looks nice to me.

library(dplyr)

# myfunction is assumed to return c(longitude, latitude) for one group
my.data.frame %>%
    group_by(bssid) %>%
    summarise(centroidlon = myfunction(lon, lat)[1],
              centroidlat = myfunction(lon, lat)[2])

If myfunction is fast, then this will work, but if it is slow, you probably want to rework it so that you only call the function once per bssid.
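One way to call the function only once per bssid is sketched below in base R, using split() and lapply() rather than dplyr. The data values are made up, and myfunction here is only a hypothetical stand-in that returns c(lon, lat); swap in the real centroid function.

```r
# Made-up example data; column names follow the answer above
my.data.frame <- data.frame(
  bssid = c("a", "a", "b", "b"),
  lon   = c(-1, -2, -3, -4),
  lat   = c(11, 22, 33, 44)
)

# Hypothetical stand-in for the real centroid function: returns c(lon, lat)
myfunction <- function(lon, lat) c(mean(lon), mean(lat))

# Split the rows by bssid, then call myfunction exactly once per group
groups <- split(my.data.frame, my.data.frame$bssid)
centroids <- do.call(rbind, lapply(groups, function(g) {
  ctr <- myfunction(g$lon, g$lat)  # single call for this bssid
  data.frame(bssid = g$bssid[1], centroidlon = ctr[1], centroidlat = ctr[2])
}))
rownames(centroids) <- NULL
centroids
```

The result has one row per bssid, with both coordinates taken from the same function call.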

Edit to show alternative method without %>% operator

grouped.data.frame <- group_by(my.data.frame, bssid)
summarised.data.frame <- summarise(grouped.data.frame,
                                   centroidlon = myfunction(lon, lat)[1],
                                   centroidlat = myfunction(lon, lat)[2])

The %>% operator takes the left hand side, and passes it as the first argument to the right hand side. It's useful for chaining your statements together without getting confused by hundreds of nested brackets. It makes things easier to read, in my opinion.
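As a small illustration of that equivalence, here is a sketch with made-up numbers (the variable names are just for this example). It assumes dplyr is installed, which makes %>% available.

```r
library(dplyr)  # makes the %>% operator available

values <- c(1, 4, 9)

# Piped form: read left to right as "take values, then sqrt, then sum"
piped <- values %>% sqrt() %>% sum()

# Equivalent nested form: read inside out
nested <- sum(sqrt(values))

identical(piped, nested)
```

Both forms compute the same thing; the piped version just lists the steps in the order they happen.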

CPhil
  • I shall look into this, thank you :) Can you please explain the %>% part? Google says it is part of a separate package, but things aren't working after I installed it. – Abhinay Reddy Feb 08 '16 at 06:37
  • It's called the "pipe" operator. It comes from the `magrittr` package but is bundled with the `dplyr` package, so you don't need to load magrittr separately. You can read it as "and then" - it helps you to specify your operations as one chunk rather than having multiple steps. I'll make an edit in the answer above to show an alternative way of doing things without this. – CPhil Feb 08 '16 at 18:17

Aggregate works fine on matrices (and not just data frames). Here's a reproducible example of your problem, using a matrix instead of a data frame:

my_matrix <- matrix(c(100,100,200,200,11,22,33,44,-1,-2,-3,-4),
                nrow=4,ncol=3,
                dimnames=list(c(1,2,3,4),c('BSSID','lat','long')))

> my_matrix

   BSSID lat long
1   100  11   -1
2   100  22   -2
3   200  33   -3
4   200  44   -4

> aggregate(cbind(lat,long)~BSSID,my_matrix,mean)

   BSSID  lat long
1   100  16.5 -1.5
2   200  38.5 -3.5

So that would be the mean (or the centroid) of the latitudes and longitudes for each BSSID. Using cbind() (column-bind) on the left-hand side of the formula lets you aggregate several variables at once, similar to an Excel Pivot Table.

If still in doubt, you can always convert a matrix to a data frame with as.data.frame() and convert back to a matrix with as.matrix() if needed.
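For GPS data, averaging raw degrees is only an approximation. One common alternative, and the one the asker mentions in the comments, is to convert the coordinates to Cartesian, take the mean, and convert back. The sketch below assumes lat and long are in degrees and treats the Earth as a sphere. The data values and variable names (gps, deg2rad, xyz_mean, centroids) are made up for illustration.

```r
# Made-up example data: latitude/longitude in degrees for each BSSID
gps <- data.frame(
  BSSID = c(100, 100, 200, 200),
  lat   = c(11, 22, 33, 44),
  long  = c(-1, -2, -3, -4)
)

deg2rad <- pi / 180

# Convert each point to a 3D unit vector on the sphere
gps$x <- cos(gps$lat * deg2rad) * cos(gps$long * deg2rad)
gps$y <- cos(gps$lat * deg2rad) * sin(gps$long * deg2rad)
gps$z <- sin(gps$lat * deg2rad)

# Average the Cartesian coordinates per BSSID
xyz_mean <- aggregate(cbind(x, y, z) ~ BSSID, data = gps, FUN = mean)

# Convert the mean vector back to latitude/longitude in degrees
centroids <- data.frame(
  BSSID = xyz_mean$BSSID,
  lat   = atan2(xyz_mean$z, sqrt(xyz_mean$x^2 + xyz_mean$y^2)) / deg2rad,
  long  = atan2(xyz_mean$y, xyz_mean$x) / deg2rad
)
centroids
```

For points that lie close together the result is close to the plain mean of the degrees; the difference grows as the points spread out or approach the poles or the 180° meridian.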

Adarsh Chavakula
  • It works perfectly with a matrix, as you have demonstrated, but the centroid computation is still proving difficult: mean() doesn't seem to do justice to the spatial coordinates. Thanks a ton, though. I shall post my questions in a better format next time. – Abhinay Reddy Feb 07 '16 at 18:47
  • I have converted the spatial coordinates into cartesian, and then took the mean. I am guessing that should work. – Abhinay Reddy Feb 08 '16 at 07:33