R Subsetting - Plotting Unequal Lists

Question

I have three lists: lat, long, and wifiRssi. Each list has the same number of rows. lat and long will always have the same number of elements per row. wifiRssi will usually have fewer elements than lat/long, but sometimes more. I am trying to plot these values, but because the lists have unequal numbers of elements, I receive an error.

Sample Data:

location_lat
[32.831, 32.831, 32.832, 32.832, 32.833, 32.833, 32.834, 32.834, 
 32.835, 32.835, 32.836, 32.836, 32.837, 32.837, 32.838]



location_long
[-96.691, -96.691, -96.692, -96.692, -96.693, -96.693, -96.694,  -96.694, 
 -96.695, -96.695, -96.696, -96.696, -96.697, -96.697, -96.698]



wifi_Rssi
[-81, -81, -81, -81, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 0]

Code Snippet:

I strip off the brackets then . . .

wifiRssi <- opr$wifi_Rssi
wifiRssi <- gsub(" ", "", wifiRssi, fixed = TRUE)
wifiRssi <- strsplit(wifiRssi, ",")
wifiRssi <- unlist(wifiRssi)
wifiRssi <- as.integer(wifiRssi)

lat<- as.character(opr$location_lat)
lat<- gsub(" ", "", lat, fixed = TRUE)
lat<- strsplit(lat, ",")
lat<- unlist(lat)
lat<- as.double(lat)

long<- as.character(opr$location_long)
long<- gsub(" ", "", long, fixed = TRUE)
long<- strsplit(long, ",")
long<- unlist(long)
long<- as.double(long)

pal <- colorNumeric(c('red','green'), wifiSNR)

geoplots <- sp::SpatialPointsDataFrame(
  cbind(long, lat),
  data.frame(wifiRssi)
)

Error in validObject(.Object) : invalid class “SpatialPointsDataFrame” object: number of rows in data.frame and SpatialPoints don't match

What I want to do is truncate the lists to the smallest number of elements. For example, if wifiRssi contained n elements and lat/long contained n+5 elements, then truncate lat/long to the first n elements [1:n] to match wifiRssi, then plot.

Any ideas or suggestions would be appreciated.

Typo - should read as: geoplots <- sp::SpatialPointsDataFrame( cbind(long, lat), data.frame(wifiRssi) — JohnA, Dec 05 '15 at 16:29

score 2 · Answer 1 · edited May 23 '17 at 12:15

After extracting long, lat, and wifiRssi from opr, you can find the length of the shortest vector using min and length. Then, you can use head to shorten each one to this length prior to further processing.

minlength <- min(length(long), length(lat), length(wifiRssi))
long <- head(long, minlength)
lat <- head(lat, minlength)
wifiRssi <- head(wifiRssi, minlength)

While head may be more readable, if you are doing this operation many times with large vectors, you may want to use other approaches. Following @Joris Meys' analysis:

                                            test replications elapsed relative
1                         expression(head(x, n))      1000000  22.749    3.315
3                             expression(x[1:n])      1000000   6.863    1.000
2 expression(x[seq.int(to = n, length.out = n)])      1000000  12.612    1.838

So, lat[1:minlength], etc. would be faster than head(lat, minlength). Benchmarking code:

require(rbenchmark)
x <- 1:1e6
n <- 500
do.call(
  benchmark,
  c(list(
    expression(head(x,n)),
    expression(x[seq.int(to=n, length.out=n)]),
    expression(x[1:n])
  ),  replications=1e6)
)
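
One caveat with the x[1:n] form, offered as a side note rather than as part of the benchmark above: when n is 0, 1:n evaluates to c(1, 0), so x[1:n] returns the first element instead of an empty vector, and when n exceeds length(x) the result is padded with NA. A minimal sketch of a safer variant, where the helper name first_n is hypothetical:

```r
# Hypothetical helper (not from the original answer): take the first n
# elements of x, behaving sensibly when n is 0 or larger than length(x).
first_n <- function(x, n) x[seq_len(min(n, length(x)))]

x <- c(10, 20, 30)
first_n(x, 2)   # first two elements
first_n(x, 0)   # empty numeric vector
first_n(x, 5)   # all of x, no NA padding
x[1:0]          # 1:0 is c(1, 0), so this returns the first element
```

This is only likely to matter if one of the vectors can be empty; for the sample data in the question, x[1:n] and first_n(x, n) give the same result.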

After examining this a bit closer: this will drop chunks of tail-end data. Taking the length of the lists after they have been "unlisted" gives the size of the entire flattened list, does it not? What I am looking for is to take the size, essentially, "row-by-row" to avoid huge blocks of data being dropped. — JohnA, Dec 07 '15 at 14:12
@atiretoo's approach to dealing with missing data at various places in the vectors is elegant. Edited the answer to show that you can shorten the vectors prior to other processing. Also included a speed comparison of various ways to take the first part of a vector. — DrPositron, Dec 08 '15 at 02:26

score 1 · Accepted Answer · answered Dec 07 '15 at 22:33

A bit more of a complete version of the answer by DrPositron.

lat <- c(32.831, 32.831, 32.832, 32.832, 32.833, 32.833, 32.834, 32.834, 32.835, 32.835, 32.836, 32.836, 32.837, 32.837, 32.838)

long <- c(-96.691, -96.691, -96.692, -96.692, -96.693, -96.693, -96.694, -96.694, -96.695, -96.695, -96.696, -96.696, -96.697, -96.697, -96.698)

wifiRssi <- c(-81, -81, -81, -81, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 0)

shortest <- min(length(lat),length(long),length(wifiRssi))
geoplots <- sp::SpatialPointsDataFrame(
  cbind(long[1:shortest], lat[1:shortest]),
  data.frame(wifiRssi[1:shortest])
)

You are concerned in the comment that this will drop data from the tail of either the locations or wifiRssi. Yes, it will. But if you are missing data from either wifiRssi (fewer values than locations) or the locations (more values in wifiRssi than locations), then with your data structure this is the only thing you can do. I think it is more likely that some of your locations and/or signal strengths are missing, and that by representing the data as independent vectors the information about which locations go with which signal strengths is scrambled. This seems more likely to me:

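# Assumption: ii holds the rows whose coordinates are missing. The snippet
# as posted does not define it, so it is drawn at random here for illustration.
ii <- sort(sample(length(wifiRssi), length(wifiRssi) - length(lat)))

df <- data.frame(lat = NA, long = NA, wifiRssi)
df[-ii, "lat"] <- lat
df[-ii, "long"] <- long

# Keep only the rows where both coordinates are present
cc <- complete.cases(df)
geoplots <- sp::SpatialPointsDataFrame(
  df[cc, c("long", "lat")],                   # x = long, y = lat
  data.frame(wifiRssi = df[cc, "wifiRssi"])
)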

Here the missing coordinates are randomly scattered throughout the original data, not all at the end. But if you just have 3 independent vectors of different lengths, you have to make some assumptions about what's missing.

All of this information was very helpful and I appreciate the feedback. In the case of my data, lat/long will always be of equal length. The rssi list will usually be longer but sometimes shorter. As a result I ended up using mapply to truncate lat/long to the number of sub-elements then I used the technique above to finish the plot (basically truncate the data again after it was flattened). The result was a more accurate rendering/plot of the data. — JohnA, Dec 08 '15 at 15:45
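
The row-by-row truncation described in the comment above could look roughly like the following. This is a sketch under assumptions, not the asker's actual code: the input strings are made up to mimic the bracketed format of opr$location_lat, opr$location_long and opr$wifi_Rssi, the helper name parse_row is hypothetical, and Map (a thin wrapper around mapply) is used to walk the three lists in parallel.

```r
# Hypothetical input: one bracketed, comma-separated string per row.
lat_raw  <- c("[32.831, 32.832, 32.833]", "[32.834, 32.835]")
long_raw <- c("[-96.691, -96.692, -96.693]", "[-96.694, -96.695]")
rssi_raw <- c("[-81, -81]", "[85, 85, 85, 0]")

# Parse each row into its own numeric vector, keeping the row structure
# (no unlist() yet, so row boundaries are not lost).
parse_row <- function(s) {
  s <- gsub("[^0-9.,-]", "", s)   # drop brackets, spaces, anything non-numeric
  lapply(strsplit(s, ",", fixed = TRUE), as.numeric)
}

lat  <- parse_row(lat_raw)
long <- parse_row(long_raw)
rssi <- parse_row(rssi_raw)

# Truncate each row to the shortest of its three vectors, then flatten.
rows <- Map(function(la, lo, rs) {
  n <- min(length(la), length(lo), length(rs))
  i <- seq_len(n)
  data.frame(lat = la[i], long = lo[i], wifiRssi = rs[i])
}, lat, long, rssi)

df <- do.call(rbind, rows)
```

Because each row is trimmed on its own, a mismatch in one row only drops that row's extra values and does not shift the pairing in later rows. The resulting df should then be usable in the same way as in the answers above, for example sp::SpatialPointsDataFrame(cbind(df$long, df$lat), df["wifiRssi"]).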
