Why is my list extraction to a data frame resulting in fewer values than the list contains?

Question

I have a list of around 200,000 elements.

Each element stores two values and represents map coordinates (latitude and longitude).

I want to extract the values into lat and lon variables and so far have come up with this:

for(i in nrow(users)) {
  lat[i] <- users$location[[i]][1]
  lon[i] <- users$location[[i]][2]
}

coords <- as.data.frame(cbind(lat, lon))

As far as I can see, it appears to have extracted the first element and then 19 elements at the end with nothing between (20 in total when checking with complete.cases).

Ideally, I would like to exclude NA and 0, 0 values also.

Looking at the list directly, I can see that this is wrong as there are several values contained within it.

If I compare the final data frame to the list items, the figures don't match. For example, the value -73.9924 exists in the list but not in my data frame.

Where am I going wrong?

My final data frame:

> coords[complete.cases(coords), ]
            lat       lon
1       37.4590 -122.1781
96960   40.8152  -73.3624
96961   40.0409  -75.6374
96962   42.5153  -70.9075
96963   33.7773  -84.3366
96964   39.9831  -86.2876
96965   40.7588  -73.9680
96966   36.7646  -76.1990
96967   44.7415  -91.3012
96968   42.6179  -70.7154
96969   40.5953  -74.6173
96970   50.8000   -0.3667
96971   34.0523 -118.3852
96972   41.4468  -74.0689
96973   26.9467  -80.2170
96974   40.7139  -74.0079
96975   34.2313 -118.1486
96976   43.6655  -79.4378
96977   39.0972  -84.1225
96978 -122.1781   37.4590

Sample of list contents:

[[734]]
[1] 0 0

[[735]]
[1] 0 0

[[736]]
[1] 0 0

[[737]]
[1] 0 0

[[738]]
[1] -73.9924  40.7553

[[739]]
[1] 0 0

[[740]]
[1] -76.7818  39.4370

[[741]]
[1] -97.822  37.751

[[742]]
NULL

[[743]]
[1] 0 0

[[744]]
[1] 0 0

Please reduce your problem to a minimal reproducible problem. And you might even find your solution on your own — Emmanuel-Lin, Jun 29 '18 at 12:16

iod · Accepted Answer · 2018-06-29T13:35:41.440

1

There is no need for a for loop. Use sapply with `[` as the function:

lat <- sapply(users$location, "[", 1)
lon <- sapply(users$location, "[", 2)

Not sure what's the cause of the skipping of lines, but if this still doesn't work, we can work through the root cause from there.
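One likely contributor to the skipped rows is the loop header in the question: `for(i in nrow(users))` iterates over the single value `nrow(users)`, so the body runs once, for the last index only. The sketch below illustrates this on a small hypothetical stand-in for `users$location` (the `location` list and the `visited_*` names are made up for illustration) and shows a corrected loop that also skips NULL entries.

```r
# Hypothetical stand-in for users$location (values taken from the question)
location <- list(c(37.4590, -122.1781), c(0, 0), NULL, c(40.8152, -73.3624))
n <- length(location)

# As written in the question: `for (i in n)` visits only the single value n
visited_wrong <- integer(0)
for (i in n) visited_wrong <- c(visited_wrong, i)

# seq_len(n) visits every index 1, 2, ..., n
visited_right <- integer(0)
for (i in seq_len(n)) visited_right <- c(visited_right, i)

# Corrected loop: pre-allocate with NA and skip NULL entries,
# because assigning NULL into a numeric vector raises an error
lat <- rep(NA_real_, n)
lon <- rep(NA_real_, n)
for (i in seq_len(n)) {
  pair <- location[[i]]
  if (!is.null(pair)) {
    lat[i] <- pair[1]
    lon[i] <- pair[2]
  }
}
coords <- data.frame(lat, lon)
```

With the real data, `seq_len(nrow(users))` would take the place of `seq_len(n)`.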

If you want to drop the NULLs, apply this to the two vectors you created:

lat <- unlist(lat[!sapply(lat, is.null)])

Do the same for lon. Alternatively, you can apply the same logic to users$location before creating lat and lon, which may be faster with long lists.
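As a rough sketch of that second route, the NULL entries can be dropped from the list first, so that the extraction returns plain numeric vectors. The `location` list here is a hypothetical stand-in for `users$location`, and `vapply` is swapped in for `sapply` only because it guarantees a numeric result.

```r
# Hypothetical stand-in for users$location
location <- list(c(37.4590, -122.1781), c(0, 0), NULL, c(40.8152, -73.3624))

# Keep only the non-NULL entries
keep  <- !vapply(location, is.null, logical(1))
clean <- location[keep]

# Each call returns a numeric vector with one value per remaining entry
lat <- vapply(clean, `[`, numeric(1), 1)
lon <- vapply(clean, `[`, numeric(1), 2)
```

Because both vectors come from the same filtered list, they stay aligned row for row.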

If you want everything in one (somewhat) elegant command, I would suggest an intermediate step: turn the list into a matrix using sapply, then convert it into a data.frame:

library(dplyr)  # provides %>%, rename() and filter()

coords <- as.data.frame(t(sapply(users$location[!sapply(users$location, is.null)], "[", c(1, 2)))) %>%
  rename(lat = V1, lon = V2) %>%
  filter(!lat == 0, !lon == 0)  # drops rows where either value is 0
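Note that two comma-separated conditions in `filter` are combined with AND, so a row is dropped as soon as either coordinate is 0. If only the 0, 0 pairs should go, the condition has to be negated as a whole. A small base R sketch of the difference, using a made-up `coords` data frame:

```r
# Made-up data: one valid pair, one 0/0 pair, two pairs with a single 0
coords <- data.frame(lat = c(37.4590, 0, 0, 40.7553),
                     lon = c(-122.1781, 0, -73.9924, 0))

# Same logic as filter(!lat == 0, !lon == 0): drops rows where either value is 0
either_zero_dropped <- coords[coords$lat != 0 & coords$lon != 0, ]

# Drops only rows where both values are 0
both_zero_dropped <- coords[!(coords$lat == 0 & coords$lon == 0), ]
```

Which variant is appropriate depends on whether a single 0 can be a legitimate coordinate in the data.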

edited Jun 29 '18 at 13:35

answered Jun 29 '18 at 12:41


I'm not sure what happened, but I tried your approach and it seems to have worked perfectly. It also includes `NULL` and `0` values (which is what I expected) - is there a way to remove these upon generation? – Mus Jun 29 '18 at 12:45
Glad it worked for you. Hmm... Gawd I hate working with lists... See edited answer above. – iod Jun 29 '18 at 12:54
This is amazing, thank you! Is there a way to remove `0`-values, also? – Mus Jun 29 '18 at 13:00
These would extract the zeroes only, right? Also, I changed this slightly as it was missing the `row`/`column` separator: `coords[coords$lat == 0 & coords$lon == 0, ]`. Is there a way to remove the zeroes on the fly? The actual zero values are `0.0000`. – Mus Jun 29 '18 at 13:08
right, sorry, forgot the `!`s. before the conditions. Should be: `coords<-coords[!coords$lat==0 & !coords$lon==0,]` or `dplyr::filter(coords, !lat==0, !lon==0)`. – iod Jun 29 '18 at 13:11
Ah yes, perfect thank you! Chosen as the solution and duly upvoted for your efforts! I suggest you add the extra code to the answer body also for completeness. – Mus Jun 29 '18 at 13:12
Thanks! also, just for fun, see my new edit with a single command (using piping) version that removes nulls and double-0's. – iod Jun 29 '18 at 13:29

score 0 · Answer 2 · answered Jun 29 '18 at 12:15

Suppose you have a list like the one in my toy example below. You can then use dplyr like this:

library(dplyr)
lista <- list(as.data.frame(matrix(c(0,0), nrow = 1)), 
          as.data.frame(matrix(c(37.4590,-122.1781), nrow = 1)), 
          as.data.frame(matrix(c(NA,NA), nrow = 1)), 
          as.data.frame(matrix(c(42.5153,-70.9075), nrow = 1))) # toy example
names(lista) <- 1:4 # each element in the list has a name

lista %>% 
  bind_rows() %>% 
  filter(!is.na(V1), !is.na(V2)) %>%  # here you remove NAs
  filter(V1 != 0, V2 != 0) # here you remove zeros
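The toy list above holds one-row data frames, whereas the list in the question holds plain numeric vectors and NULLs. For that shape, a base R sketch along the same lines could look like this (the `location` list is a hypothetical stand-in for `users$location`, and the lat/lon column order is an assumption):

```r
# Hypothetical stand-in for users$location: numeric pairs plus a NULL entry
location <- list(c(37.4590, -122.1781), c(0, 0), NULL, c(42.5153, -70.9075))

# rbind ignores NULL arguments, so the NULL entry is dropped here
m <- do.call(rbind, location)

# Name the columns (assumed order: lat, then lon)
coords <- setNames(as.data.frame(m), c("lat", "lon"))

# Remove rows containing NA and rows where both values are 0
coords <- coords[complete.cases(coords) & !(coords$lat == 0 & coords$lon == 0), ]
```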
