2

For a little more background: I have a set of coordinates in lat/lon and want to add the corresponding UTM coordinates to the data frame or SpatialPointsDataFrame. To this end, I have written a function that converts the df to a SpatialPointsDataFrame, reprojects it to UTM, and writes the projected coordinates back to the input df.

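WGS2UTM <- function(df, WGS_coords){
    # Build a spatial object from the lon/lat coordinates (WGS84)
    temp <- sp::SpatialPointsDataFrame(coords = WGS_coords, data = df,
                                       proj4string = sp::CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"))
    # Reproject to the UTM zone given in EPSG_UTM; assumes df holds a single zone
    temp <- sp::spTransform(temp, sp::CRS(as.character(unique(temp@data$EPSG_UTM))))
    # Copy the projected coordinates back; assumes WGS_coords has columns "x" and "y"
    df$UTM_E <- sp::coordinates(temp)[,"x"]
    df$UTM_N <- sp::coordinates(temp)[,"y"]
    return(df)
}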

The EPSG code used for the reprojection is stored in the df as a factor.

Now to my question: Since we frequently deal with locations spread across multiple different UTM Zones, I'd like to be able to apply the function above to the factor levels of the EPSG_UTM column. I am aware that the apply family is best used for this kind of operation but I can't figure it out. Any pointers?

marc_s
  • Does https://stackoverflow.com/questions/3505701/grouping-functions-tapply-by-aggregate-and-the-apply-family?rq=1 help? Or `?by`? –  May 25 '18 at 10:06
  • I've searched these pages quite extensively and found that the functions used in the examples are usually rather simple: mean, sum, and so forth. I just don't get how I can use multiple arguments. – JamesPatrick May 25 '18 at 10:14
  • `by(data, data$factor, WGS2UTM, WGS_coords = your_extra_argument)`. If the extra argument needs to vary, and can't be passed in as part of the data frame itself, then consider `mapply`. –  May 25 '18 at 10:15
  • I tried by() before posting here; the error I get is: Error in validObject(.Object) : invalid class “SpatialPointsDataFrame” object: number of rows in data.frame and SpatialPoints don't match – JamesPatrick May 25 '18 at 11:37
  • That looks like an error in your function, not an error in `by`. (But I may be wrong - check with `options(error=recover)`.) –  May 25 '18 at 11:51
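
One possible reading of the row-count error quoted above, sketched with base R only: by() splits its first argument by group, but anything passed through `...` reaches the function whole, so a full-length coordinate object would no longer match the subset. The toy data frame, `count_rows`, and `extra` below are hypothetical stand-ins, not the asker's data.

```r
# Toy data: three rows in two groups (hypothetical)
df <- data.frame(zone = factor(c("a", "a", "b")), x = c(1, 2, 3))
extra <- df["x"]  # full-length companion object, playing the role of WGS_coords

# Report how many rows each call sees in the subset and in the extra argument
count_rows <- function(sub, extra) c(sub = nrow(sub), extra = nrow(extra))

# by() subsets df per level of zone; extra is forwarded unchanged
rows_seen <- by(df, df$zone, count_rows, extra = extra)
rows_seen[[1]]  # counts for group "a"
rows_seen[[2]]  # counts for group "b"
```

Each call receives only its own rows of `df` but all three rows of `extra`. Deriving the coordinates from the subset inside the function avoids that mismatch.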

2 Answers

0

Well, it seems I found an alternative, though it involves a for loop, a couple of additional lines, and splitting the data into a list of data frames.

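UTM <- NULL

df_list <- split(data, data$EPSG_UTM)
for (i in seq_along(df_list)){
  # Transform one UTM zone at a time
  t <- WGS2UTM(df_list[[i]], data.frame(df_list[[i]])[,c("x","y")])
  UTM <- rbind(UTM, t)
}
# Note: UTM already holds all original columns plus UTM_E/UTM_N, with rows
# grouped by EPSG_UTM, so cbind() lines up only if data is ordered the same way
data.cbind <- cbind(data,UTM)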
  • Reconsider this approach as you are expanding an object in a loop. And `by()` can be used anytime `split()` is used. – Parfait Jun 03 '18 at 20:18
0

Reconsider growing a data frame inside a loop, since each rbind() call copies the whole object in memory. Since the split() solution worked, consider building a list of data frames with by() (roughly equivalent to split + lapply), then rbind all data frames in one call.

df_list <- by(data, data$EPSG_UTM, function(sub) WGS2UTM(sub, sub[,c("x","y")]))

coords_df <- do.call(rbind, df_list)

data.cbind <- cbind(data, coords_df)
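
As a minimal sketch of the same split-apply-combine idea that also keeps the original row order, the pieces can be reassembled with base R's unsplit() instead of rbind plus cbind. This is an alternative under stated assumptions, not the method above: `add_utm_stub` is a hypothetical stand-in for WGS2UTM (it only scales the coordinates so the example runs without sp), and the toy `data` is made up.

```r
# Hypothetical stand-in for WGS2UTM: adds two columns derived from the coordinates
add_utm_stub <- function(sub, coords) {
  sub$UTM_E <- coords$x * 10
  sub$UTM_N <- coords$y * 10
  sub
}

# Toy data with interleaved zones (made-up values)
data <- data.frame(id = 1:4,
                   EPSG_UTM = factor(c("z33", "z32", "z33", "z32")),
                   x = c(1, 2, 3, 4),
                   y = c(5, 6, 7, 8))

# Split by zone, transform each piece, then put the rows back in their original order
pieces <- lapply(split(data, data$EPSG_UTM),
                 function(sub) add_utm_stub(sub, sub[, c("x", "y")]))
result <- unsplit(pieces, data$EPSG_UTM)
```

Because each piece already carries the original columns, `result` is the complete data frame and no further cbind() with `data` is needed.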
Parfait