In broad terms, I'm trying to use apply() so that processing one row depends on the results of previously processed rows.
This post is related, but didn't help me build the results.
I want to build a dataframe of unique "locations" from a dataframe of incidents. The incidents are registered with geocoordinates (lon, lat).
I've sorted the incidents by lon and lat, then go through them sequentially with apply(). As a result, I want to get something like expectedResult.
I check if the geocoordinates of an incident are equal to the geocoordinates of one I've processed previously. If they aren't, I create a new location. If they are, I assume the incident took place at the same location.
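For reference, the rule above amounts to counting incidents per unique (lon, lat) pair. Here is a minimal, non-iterative sketch of that target using base R's aggregate() and merge(); it is not the apply()-based approach I'm asking about. Keeping the smallest id per location is my assumption about which id should represent a location.

```r
incidents <- data.frame(id  = c(1, 2, 3, 4),
                        lon = c(-81, -80, -81, -79),
                        lat = c(42, 40, 42, 41))

# Count incidents per unique (lon, lat) pair.
counts <- aggregate(id ~ lon + lat, data = incidents, FUN = length)
names(counts)[names(counts) == "id"] <- "count"

# Assumption: a location is represented by the smallest incident id found there.
first_ids <- aggregate(id ~ lon + lat, data = incidents, FUN = min)

# Combine both summaries and restore the column order used in expectedResult.
locations <- merge(first_ids, counts, by = c("lon", "lat"))
locations <- locations[order(locations$lon, locations$lat),
                       c("id", "lon", "lat", "count")]
print(locations)
```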
My issue is that I don't know how to build the dataframe/list of locations when applying the function to incidents. Before applying the function checkEquals to incidents, I create an initial dataframe locations containing the first location.
In my sample data, row 3 is intentionally a duplicate of row 1, so that at least these two incidents should be assigned to the same location.
checkEquals <- function(row,loc){
  prevLoc <- loc[nrow(loc),]
  if (as.numeric(row["lon"]) == as.numeric(prevLoc["lon"])
      && as.numeric(row["lat"]) == as.numeric(prevLoc["lat"])) {
    # if (row == prevLoc) {
    prevLoc["count"] <- as.numeric(prevLoc["count"]) + 1
    loc[nrow(loc),] <- prevLoc
  } else {
    loc[nrow(loc)+1,] <- c(row["id"], row["lon"], row["lat"],count=1)
  }
  locations <<- loc
}
main <- function(){
  incidents <- data.frame(id = c(1,2,3,4), lon = c(-81, -80, -81, -79), lat = c(42, 40, 42, 41) )
  incidents <- incidents[order(incidents$lon, incidents$lat),]
  locations <- data.frame(id=1,lon=incidents[1,]$lon, lat=incidents[1,]$lat, count=0)
  locations <- apply(incidents,1,checkEquals,locations)
  print(locations)
  expectedResult <- data.frame(id = c(1,2,4), lon = c(-81, -80, -79), lat = c(42, 40, 41), count = c(2,1,1))
  print(expectedResult)
}
> main()
$`1`
id lon lat count
1 1 -81 42 1
$`3`
id lon lat count
1 1 -81 42 1
$`2`
id lon lat count
1 1 -81 42 0
2 2 -80 40 1
$`4`
id lon lat count
1 1 -81 42 0
2 4 -79 41 1
> expectedResult
id lon lat count
1 1 -81 42 2
2 2 -80 40 1
3 4 -79 41 1
In each iteration of apply(), the program compares against the initial locations. I want locations to change with every iteration, adding rows or modifying existing ones. Apparently the final assignment locations <<- loc doesn't do the trick, nor does an explicit assign().
In addition, there is still a formatting issue: locations ends up as a list of dataframes rather than a single dataframe.
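To make the behaviour I'm after concrete, here is a rough sketch that threads the accumulated locations through each step with base R's Reduce() instead of apply(), so each call receives the result of the previous one and the final value is a single dataframe. The helper name addIncident and the empty starting dataframe are hypothetical, introduced only for illustration, and this is one possible way to express the idea rather than a definitive solution.

```r
incidents <- data.frame(id  = c(1, 2, 3, 4),
                        lon = c(-81, -80, -81, -79),
                        lat = c(42, 40, 42, 41))
incidents <- incidents[order(incidents$lon, incidents$lat), ]

# Hypothetical helper: takes the locations built so far and a row index into
# the (sorted) global incidents, and RETURNS the updated locations
# rather than assigning them with <<-.
addIncident <- function(loc, i) {
  row  <- incidents[i, ]
  last <- nrow(loc)
  if (last > 0 && row$lon == loc$lon[last] && row$lat == loc$lat[last]) {
    # Same coordinates as the most recent location: bump its count.
    loc$count[last] <- loc$count[last] + 1
  } else {
    # New coordinates: append a new location with count 1.
    loc <- rbind(loc, data.frame(id = row$id, lon = row$lon,
                                 lat = row$lat, count = 1))
  }
  loc
}

# Start from an empty dataframe so the first incident is counted like any other.
empty <- data.frame(id = numeric(0), lon = numeric(0),
                    lat = numeric(0), count = numeric(0))

# Reduce() passes the result of each call on as the first argument of the next.
locations <- Reduce(addIncident, seq_len(nrow(incidents)), empty)
print(locations)
```

With the sample data this gives three rows (ids 1, 2 and 4) with counts 2, 1 and 1, which matches expectedResult. It relies on the incidents being sorted first, since only the most recent location is compared.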