Subsetting elements in a list and placing them in a data frame

Question

I have a list ("listanswer") that looks something like this:

> str(listanswer)
List of 100
 $ : chr [1:3] "" "" "\t\t"
 $ : chr [1:5] "" "Dr. Smith" "123 Fake Street" "New York, ZIPCODE 1" ...
 $ : chr [1:5] "" "Dr. Jones" "124 Fake Street" "New York, ZIPCODE 2" ...


> listanswer
[[1]]
[1] ""   ""   "\t\t"

[[2]]
[1] ""                             "Dr. Smith" "123 Fake Street"         "New York"          
[5] "ZIPCODE 1"    

[[3]]
[1] ""                           "Dr. Jones"   "124 Fake Street,"  "New York"        
[5] "ZIPCODE2"

For each element in this list, I noticed the following pattern within the sub-elements:

# first sub-element is always empty
    > listanswer[[2]][[1]]
    [1] ""
# second sub-element is the name
    > listanswer[[2]][[2]]
    [1] "Dr. Smith"
# third sub-element is always the address 
    > listanswer[[2]][[3]]
    [1] "123 Fake Street"
# fourth sub-element is always the city
    > listanswer[[2]][[4]]
    [1] "New York"
# fifth sub-element is always the ZIP
    > listanswer[[2]][[5]]
    [1] "ZIPCODE 1"

I want to create a data frame that contains the information from this list in row format. For example:

  id      name         address     city       ZIP
1  2 Dr. Smith 123 Fake Street New York ZIPCODE 1
2  3 Dr. Jones 124 Fake Street New York ZIPCODE 2

I thought of the following way to do this:

name = sapply(listanswer,function(x) x[2])
address = sapply(listanswer,function(x) x[3])
city = sapply(listanswer,function(x) x[4])
zip = sapply(listanswer,function(x) x[5])

final_data = data.frame(name, address, city, zip)
id = 1:nrow(final_data)

My Question: I just wanted to confirm - Is this the correct way to reference sub-elements in lists?

There's nothing wrong with what you've done at all IMHO. You could collapse it up a bit, but the logic would remain the same - e.g.: `data.frame(sapply(c("name"=2,"address"=3,"city"=4,"zip"=5), \(n) sapply(listanswer, \(x) x[n]) ))` — thelatemail, Jun 28 '22 at 22:27
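
The collapsed form suggested in the comment above can be tried on a small toy list. This is only a sketch: the three-element `listanswer` below is made-up data modelled on the question, and the `lengths(...) >= 5` filter is an assumption about how to skip short elements such as the first one, which would otherwise produce a row of empty strings and `NA`s.

```r
# Toy stand-in for the real list (hypothetical data modelled on the question)
listanswer <- list(
  c("", "", "\t\t"),
  c("", "Dr. Smith", "123 Fake Street", "New York", "ZIPCODE 1"),
  c("", "Dr. Jones", "124 Fake Street", "New York", "ZIPCODE 2")
)

# Remember the original list positions, keeping only elements with all five fields
id <- which(lengths(listanswer) >= 5)
records <- listanswer[id]

# Map each column name to the position of its sub-element
fields <- c(name = 2, address = 3, city = 4, zip = 5)

# One character vector per field; lapply keeps the names of 'fields'
columns <- lapply(fields, function(n) {
  vapply(records, function(x) x[n], character(1))
})

final_data <- data.frame(id = id, as.data.frame(columns))
final_data
```

Taking `id` from the list positions, rather than from `1:nrow(final_data)`, keeps the ids aligned with the original elements (2 and 3 here), as in the desired output.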

luke · Accepted Answer · 2022-06-29T16:19:00.410

2

If it works, it's the correct way, although there might be a more efficient or more readable way to do the same thing.

Another way to do this is to create an empty data frame with your columns and then add rows to it, i.e.:

#create an empty data frame
df <- data.frame(matrix(ncol = 4, nrow = 0))
colnames(df) <- c("name", "address", "city", "zip")

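#add rows; <<- is needed because a plain <- inside the function would only change a local copy of df
invisible(lapply(listanswer, \(x){df[nrow(df) + 1,] <<- x[2:5]}))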

This is simply another way to solve the same problem. Readability is a personal preference, and there's nothing wrong with your solution either.
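
A minimal sketch of the row-appending idea on a toy list, written with a plain `for` loop so that `df` is modified in the calling environment rather than inside a function's local scope. The sample data and the `length(x) >= 5` check are assumptions added for illustration, not part of the original answer.

```r
# Toy stand-in for the real list (hypothetical data modelled on the question)
listanswer <- list(
  c("", "", "\t\t"),
  c("", "Dr. Smith", "123 Fake Street", "New York", "ZIPCODE 1"),
  c("", "Dr. Jones", "124 Fake Street", "New York", "ZIPCODE 2")
)

# Empty data frame with the target columns
df <- data.frame(matrix(ncol = 4, nrow = 0))
colnames(df) <- c("name", "address", "city", "zip")

# Append one row per complete element; short elements are skipped
for (x in listanswer) {
  if (length(x) >= 5) {
    df[nrow(df) + 1, ] <- x[2:5]
  }
}
df
```

Growing a data frame one row at a time copies it on every iteration, so for long lists it may be slower than building all the columns or rows first and assembling the data frame once.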

edited Jun 29 '22 at 16:19

answered Jun 28 '22 at 22:30


@ luke: thank you so much for your answer! In the code you have provided, where would I specify "listanswer"? Thank you so much! – stats_noob Jun 29 '22 at 16:14
@stats_noob, I've edited; lapply should take in the list as a parameter, not the dataframe – luke Jun 29 '22 at 16:19
@ luke: thank you so much! Can you please take a look at this question if you have time? https://stackoverflow.com/questions/72804551/r-creating-a-column-for-each-element-in-a-list thank you! – stats_noob Jun 29 '22 at 16:41

score 1 · Answer 2 · answered Jun 29 '22 at 00:03

If this is based on your elephant question, for businesses in Vancouver, then this mostly works.

library(rvest)
library(dplyr) # for bind_rows()

url<-"Website/british-columbia/"
page <-read_html(url)

#find the div tag of class=one_third
b = page %>% html_nodes("div.one_third") 

listanswer <- b %>% html_text() %>% strsplit("\\n")
#listanswer2 <- b %>% html_text2() %>% strsplit("\\n")
listanswer[[1]]<-NULL #remove first blank record

rows<-lapply(listanswer, function(element){
   vect<-element[-1] #remove first blank field
   cityindex<-as.integer(grep("Vancouver", vect))  #find city field
   #add some error checking and corrections
   if(length(cityindex)==0) {
      cityindex <- length(vect)-1 }
   else if(length(cityindex)>1) {
      cityindex <- cityindex[2] }

   #get the fields of interest
   address <- vect[cityindex-1]
   city<-vect[cityindex]
   phone <- vect[cityindex+1]
   
  if( cityindex < 3) {
      cityindex <- 3
   }  #error check
   #first groups combine into 1 name
   name <- toString(vect[1:(cityindex-2)])
   data.frame(name, address, city, phone)
})

answer<-bind_rows(rows)
#clean up 
answer$phone <- sub("Website", "", answer$phone)
answer

This still needs some cleanup to handle the inconsistencies, but it should be 80-90% complete.

@ Dave2e : Thank you so much for your answer - yes, it is! What do you think of the approach I was using with "sapply" - does it look reasonable? thank you so much! — stats_noob, Jun 29 '22 at 00:17
Yes, your sapply strategy would work. It is a style preference, I like a single loop and gather everything once vs 4 sapply statements. Either can work. Whichever works and can be maintained is the winner. — Dave2e, Jun 29 '22 at 00:26
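
The "single loop, gather everything once" style mentioned in the comment above could look roughly like this in base R. It is a sketch on made-up data modelled on the question; the `keep` filter and the use of `do.call(rbind, ...)` are illustrative choices, not something either answer prescribes.

```r
# Toy stand-in for the real list (hypothetical data modelled on the question)
listanswer <- list(
  c("", "", "\t\t"),
  c("", "Dr. Smith", "123 Fake Street", "New York", "ZIPCODE 1"),
  c("", "Dr. Jones", "124 Fake Street", "New York", "ZIPCODE 2")
)

# Flag the elements that contain all five fields
keep <- lengths(listanswer) >= 5

# One pass over the list: pull fields 2 to 5 from every complete element
rows <- lapply(listanswer[keep], function(x) x[2:5])

# Stack the rows into a character matrix, one row per record
mat <- do.call(rbind, rows)
colnames(mat) <- c("name", "address", "city", "zip")

# Assemble the data frame once, keeping the original list positions as ids
final_data <- data.frame(id = which(keep), mat, stringsAsFactors = FALSE)
final_data
```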
