This seems to be a recurring question, but I have been reading StackOverflow for hours and can't find a solution, so here it is:

I have a list of 28 elements called D2; its general structure is shown below. It is flight data, so the number of observations and the number of variables differ from element to element.

> str(D2, max.level = 1)
List of 28
 $ flightStatuses: list()
 $ flightStatuses: list()
 $ flightStatuses:'data.frame': 5 obs. of  12 variables:
 $ flightStatuses:'data.frame': 4 obs. of  12 variables:
 $ flightStatuses:'data.frame': 1 obs. of  11 variables:
 $ flightStatuses:'data.frame': 3 obs. of  12 variables:
 $ flightStatuses:'data.frame': 10 obs. of  15 variables:
 $ flightStatuses:'data.frame': 1 obs. of  12 variables:
 $ flightStatuses: list()
 $ flightStatuses:'data.frame': 2 obs. of  11 variables:
etc.

I am trying to get the contents into a data frame and then save it to csv.
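
Because the `str()` output mixes empty `list()` elements with data frames, a first step could be to check which elements are actually data frames. This is only a minimal sketch: `D2_toy` and its contents are made up for illustration and are not the real FlightStats data.

```r
# Hypothetical stand-in for D2: two empty elements and two data frames
D2_toy <- list(
  flightStatuses = list(),
  flightStatuses = data.frame(flightId = 1:2, status = c("L", "U"),
                              stringsAsFactors = FALSE),
  flightStatuses = list(),
  flightStatuses = data.frame(flightId = 3L, status = "U",
                              stringsAsFactors = FALSE)
)

# TRUE where the element is a data frame, FALSE for the empty list() entries
is_df <- vapply(D2_toy, is.data.frame, logical(1))

# Keep only the data frame elements
D2_kept <- D2_toy[is_df]
length(D2_kept)
```

The empty elements carry no rows, so dropping them should not lose any flights.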

Here is the structure of the third element of the list, as an example:

> str(D2[[3]])
'data.frame':   5 obs. of  12 variables:
 $ flightId              : int  891368844 889954328 889955975 891364679 891364678
 $ carrierFsCode         : chr  "4K" "4N" "5T" "6L" ...
 $ flightNumber          : chr  "901" "207" "444" "414" ...
 $ departureAirportFsCode: chr  "ZFM" "YDA" "YVQ" "ZFM" ...
 $ arrivalAirportFsCode  : chr  "YEV" "YEV" "YEV" "YEV" ...
 $ departureDate         :'data.frame': 5 obs. of  2 variables:
  ..$ dateLocal: chr  "2017-05-11T09:00:00.000" "2017-05-11T09:55:00.000" "2017-05-11T12:30:00.000" "2017-05-11T16:00:00.000" ...
  ..$ dateUtc  : chr  "2017-05-11T15:00:00.000Z" "2017-05-11T16:55:00.000Z" "2017-05-11T18:30:00.000Z" "2017-05-11T22:00:00.000Z" ...
 $ arrivalDate           :'data.frame': 5 obs. of  2 variables:
  ..$ dateLocal: chr  "2017-05-11T15:45:00.000" "2017-05-11T12:10:00.000" "2017-05-11T13:28:00.000" "2017-05-11T16:37:00.000" ...
  ..$ dateUtc  : chr  "2017-05-11T21:45:00.000Z" "2017-05-11T18:10:00.000Z" "2017-05-11T19:28:00.000Z" "2017-05-11T22:37:00.000Z" ...
 $ status                : chr  "L" "U" "L" "U" ...
 $ operationalTimes      :'data.frame': 5 obs. of  14 variables:
  ..$ scheduledGateDeparture    :'data.frame':  5 obs. of  2 variables:
  .. ..$ dateLocal: chr  "2017-05-11T09:00:00.000" "2017-05-11T09:55:00.000" "2017-05-11T12:30:00.000" "2017-05-11T16:00:00.000" ...
  .. ..$ dateUtc  : chr  "2017-05-11T15:00:00.000Z" "2017-05-11T16:55:00.000Z" "2017-05-11T18:30:00.000Z" "2017-05-11T22:00:00.000Z" ...
  ..$ estimatedRunwayDeparture  :'data.frame':  5 obs. of  2 variables:
  .. ..$ dateLocal: chr  "2017-05-11T15:24:00.000" "2017-05-11T09:53:00.000" "2017-05-11T12:27:00.000" NA ...
  .. ..$ dateUtc  : chr  "2017-05-11T21:24:00.000Z" "2017-05-11T16:53:00.000Z" "2017-05-11T18:27:00.000Z" NA ...
  ..$ actualRunwayDeparture     :'data.frame':  5 obs. of  2 variables:
  .. ..$ dateLocal: chr  "2017-05-11T15:24:00.000" "2017-05-11T09:53:00.000" "2017-05-11T12:27:00.000" NA ...
  .. ..$ dateUtc  : chr  "2017-05-11T21:24:00.000Z" "2017-05-11T16:53:00.000Z" "2017-05-11T18:27:00.000Z" NA ...
  ..$ estimatedRunwayArrival    :'data.frame':  5 obs. of  2 variables:
  .. ..$ dateLocal: chr  "2017-05-11T15:45:00.000" "2017-05-11T12:11:00.000" "2017-05-11T13:12:00.000" NA ...
  .. ..$ dateUtc  : chr  "2017-05-11T21:45:00.000Z" "2017-05-11T18:11:00.000Z" "2017-05-11T19:12:00.000Z" NA ...
  ..$ actualRunwayArrival       :'data.frame':  5 obs. of  2 variables:
  .. ..$ dateLocal: chr  "2017-05-11T15:45:00.000" NA "2017-05-11T13:12:00.000" NA ...
  .. ..$ dateUtc  : chr  "2017-05-11T21:45:00.000Z" NA "2017-05-11T19:12:00.000Z" NA ...
  ..$ publishedDeparture        :'data.frame':  5 obs. of  2 variables:
  .. ..$ dateLocal: chr  NA "2017-05-11T09:55:00.000" "2017-05-11T12:30:00.000" NA ...
  .. ..$ dateUtc  : chr  NA "2017-05-11T16:55:00.000Z" "2017-05-11T18:30:00.000Z" NA ...
  ..$ publishedArrival          :'data.frame':  5 obs. of  2 variables:
  .. ..$ dateLocal: chr  NA "2017-05-11T12:10:00.000" "2017-05-11T13:28:00.000" NA ...
  .. ..$ dateUtc  : chr  NA "2017-05-11T18:10:00.000Z" "2017-05-11T19:28:00.000Z" NA ...
  ..$ flightPlanPlannedDeparture:'data.frame':  5 obs. of  2 variables:
  .. ..$ dateLocal: chr  NA "2017-05-11T10:05:00.000" "2017-05-11T12:40:00.000" "2017-05-11T16:15:00.000" ...
  .. ..$ dateUtc  : chr  NA "2017-05-11T17:05:00.000Z" "2017-05-11T18:40:00.000Z" "2017-05-11T22:15:00.000Z" ...
  ..$ scheduledGateArrival      :'data.frame':  5 obs. of  2 variables:
  .. ..$ dateLocal: chr  NA "2017-05-11T12:10:00.000" "2017-05-11T13:28:00.000" NA ...
  .. ..$ dateUtc  : chr  NA "2017-05-11T18:10:00.000Z" "2017-05-11T19:28:00.000Z" NA ...
  ..$ flightPlanPlannedArrival  :'data.frame':  5 obs. of  2 variables:
  .. ..$ dateLocal: chr  NA "2017-05-11T12:25:00.000" "2017-05-11T13:24:00.000" "2017-05-11T16:37:00.000" ...
  .. ..$ dateUtc  : chr  NA "2017-05-11T18:25:00.000Z" "2017-05-11T19:24:00.000Z" "2017-05-11T22:37:00.000Z" ...
  ..$ estimatedGateDeparture    :'data.frame':  5 obs. of  2 variables:
  .. ..$ dateLocal: chr  NA NA "2017-05-11T12:20:00.000" NA ...
  .. ..$ dateUtc  : chr  NA NA "2017-05-11T18:20:00.000Z" NA ...
  ..$ actualGateDeparture       :'data.frame':  5 obs. of  2 variables:
  .. ..$ dateLocal: chr  NA NA "2017-05-11T12:20:00.000" NA ...
  .. ..$ dateUtc  : chr  NA NA "2017-05-11T18:20:00.000Z" NA ...
  ..$ estimatedGateArrival      :'data.frame':  5 obs. of  2 variables:
  .. ..$ dateLocal: chr  NA NA "2017-05-11T13:15:00.000" NA ...
  .. ..$ dateUtc  : chr  NA NA "2017-05-11T19:15:00.000Z" NA ...
  ..$ actualGateArrival         :'data.frame':  5 obs. of  2 variables:
  .. ..$ dateLocal: chr  NA NA "2017-05-11T13:15:00.000" NA ...
  .. ..$ dateUtc  : chr  NA NA "2017-05-11T19:15:00.000Z" NA ...
 $ flightDurations       :'data.frame': 5 obs. of  8 variables:
  ..$ airMinutes             : int  21 NA 45 NA NA
  ..$ scheduledBlockMinutes  : int  NA 75 58 NA NA
  ..$ scheduledAirMinutes    : int  NA 80 44 22 26
  ..$ scheduledTaxiOutMinutes: int  NA 10 10 15 15
  ..$ blockMinutes           : int  NA NA 55 NA NA
  ..$ taxiOutMinutes         : int  NA NA 7 NA NA
  ..$ scheduledTaxiInMinutes : int  NA NA 4 NA NA
  ..$ taxiInMinutes          : int  NA NA 3 NA NA
 $ flightEquipment       :'data.frame': 5 obs. of  2 variables:
  ..$ actualEquipmentIataCode   : chr  "BE1" "HS7" "733" "DHT" ...
  ..$ scheduledEquipmentIataCode: chr  NA "HS7" "733" NA ...
 $ schedule              :'data.frame': 5 obs. of  4 variables:
  ..$ flightType    : chr  NA "J" "J" NA ...
  ..$ serviceClasses: chr  NA "RY" "RFJY" NA ...
  ..$ restrictions  : chr  NA "" "" NA ...
  ..$ uplines       :List of 5
  .. ..$ : NULL
  .. ..$ :'data.frame': 1 obs. of  2 variables:
  .. .. ..$ fsCode  : chr "YXY"
  .. .. ..$ flightId: int 889956597
  .. ..$ :'data.frame': 2 obs. of  2 variables:
  .. .. ..$ fsCode  : chr  "YEG" "YZF"
  .. .. ..$ flightId: int  889954472 889957614
  .. ..$ : NULL
  .. ..$ : NULL
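
The `str()` output above shows data frames stored as columns of other data frames. A small base-R sketch of what that means, and of one possible way to expand such columns, follows. The toy data and the helper name `flatten_df` are hypothetical; the helper is an assumption for illustration, does not handle list columns such as `uplines`, and base R may drop the prefix when a nested data frame has only one column.

```r
# Hypothetical miniature of one list element: a data frame whose
# "departureDate" column is itself a two-column data frame
dates <- data.frame(
  dateLocal = c("2017-05-11T09:00:00.000", "2017-05-11T09:55:00.000"),
  dateUtc   = c("2017-05-11T15:00:00.000Z", "2017-05-11T16:55:00.000Z"),
  stringsAsFactors = FALSE
)
nested <- data.frame(flightId = c(891368844L, 889954328L),
                     stringsAsFactors = FALSE)
nested$departureDate <- dates   # assigning a data frame creates a nested column

ncol(nested)   # counts flightId and the nested departureDate only

# Rebuild the data frame from its columns; data.frame() expands each nested
# data frame into ordinary columns. Repeat until nothing nested remains.
flatten_df <- function(df) {
  while (any(vapply(df, is.data.frame, logical(1)))) {
    df <- do.call(data.frame, c(as.list(df), list(stringsAsFactors = FALSE)))
  }
  df
}

flat <- flatten_df(nested)
names(flat)
```

After flattening, every column is an ordinary vector, which is the shape that functions such as `rbind` and `write.csv` generally expect.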

As you can see, there are multiple data frames within each data frame that is an element of the list. I have read, amongst others, these posts to try and get all of this into a data frame.

  1. R list to data frame: but this one has lists of equal length.

  2. https://www.r-bloggers.com/concatenating-a-list-of-data-frames/ But even when I try it on an isolated element of the list, like the sample above, I get errors such as these:

    > df<-ldply(D2[[3]], rbind)
    Error in list_to_dataframe(res, attr(.data, "split_labels"), .id, id_as_factor) : 
    Results must be all atomic, or all data frames
    
    > df<-do.call(rbind, D2[[3]])
    Error in rbind(deparse.level, ...) : 
      numbers of columns of arguments do not match
    
  3. Extracting from Nested list to data frame: this one seems promising, but the explanation is too complex for me. I'm a beginner in R, so I need something in plainer language.

  4. Converting nested list (unequal length) to data frame: this one deals with named vectors, not data frames. When I try the solution from @MrFlick, I get this:

    > df <- rbind.fill(lapply(D2, function(x)as.data.frame(t(x))))
    Error in if (inherits(X[[j]], "data.frame") && ncol(xj) > 1L) X[[j]] <- as.matrix(X[[j]]) : 
      missing value where TRUE/FALSE needed
    Called from: as.matrix.data.frame(x)
    
  5. Converting nested list (unequal length) to data frame: when I try @akrun's answer, I get:

    > indx<-lengths(D2)
    > res<-as.data.frame(do.call(rbind, lapply(D2, `length<-`,max(indx))))
    Error in rbind(deparse.level, ...) : 
      invalid list argument: all variables should have the same length
    > colnames(res)<-names(D2[[which.max(indx)]])
    
  6. List elements to dataframes in R: when I try the answer by @David Arenburg:

    > lapply(D2, as.data.frame.list)
    Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  : 
      arguments imply differing number of rows: 0, 1
    

Other trials:

rbind.fill(D2[[3]])

It has the same output as just

D2[[3]]

And when I try to take this last output and write it to csv, thinking it might be easier to handle, I get this:

> write.csv(D6, file = "FlightStats.csv")  # D6 = D2[[3]]
Error in if (inherits(X[[j]], "data.frame") && ncol(xj) > 1L) X[[j]] <- as.matrix(X[[j]]) : 
  missing value where TRUE/FALSE needed
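
The failing expression in that message belongs to the matrix conversion that `write.csv` performs on its input. One workaround that might help is to expand nested columns before writing. The sketch below uses made-up toy data and a temporary file instead of FlightStats.csv; it covers a single level of nesting only, and list columns would need separate handling.

```r
# Hypothetical toy data frame with one nested data frame column
d <- data.frame(flightId = 1:2, stringsAsFactors = FALSE)
d$arrivalDate <- data.frame(
  dateLocal = c("2017-05-11T15:45:00.000", NA),
  dateUtc   = c("2017-05-11T21:45:00.000Z", NA),
  stringsAsFactors = FALSE
)

# Expand the nested column into ordinary columns (one level is enough here)
flat <- do.call(data.frame, c(as.list(d), list(stringsAsFactors = FALSE)))

# Write to a temporary file, then read it back to check the round trip
out <- tempfile(fileext = ".csv")
write.csv(flat, file = out, row.names = FALSE)
back <- read.csv(out, stringsAsFactors = FALSE)
names(back)
```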

Also tried melt, both on the whole list and on one element of the list, with the same result:

> melt(D2)   # or melt(D2[[3]])
Using carrierFsCode, flightNumber, departureAirportFsCode, arrivalAirportFsCode, status as id variables
Error in eval(substitute(expr), envir, enclos) : 
  Can't melt data.frames with non-atomic 'measure' columns
In addition: Warning message:
attributes are not identical across measure variables; they will be dropped

as.data.frame, rbindlist and stack also return error messages.

I've tried assigning one element of the list to a variable, doing the same with another element, and then combining the two (carefully chosen to have the same number of variables) using rbind, and I still get errors.

> rbind(D4, D5)
Error in `row.names<-.data.frame`(`*tmp*`, value = value) : 
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘1’, ‘2’ 

It's telling me the row names are duplicates, but:

    > rownames(D4)
    [1] "890867748" "889955650"
    > rownames(D5)
    [1] "891368844" "889954328" "889955975" "891364679" "891364678"

Basically, I need help! How do I get this mess in a data frame?

Here is some data that is representative (elements 8 to 10 of the list):

> dput(D2[c(8,9,10)])
structure(list(flightStatuses = structure(list(flightId = 890465980L, 
    carrierFsCode = "6L", flightNumber = "406", departureAirportFsCode = "YUB", 
    arrivalAirportFsCode = "YEV", departureDate = structure(list(
        dateLocal = "2017-05-12T18:00:00.000", dateUtc = "2017-05-13T00:00:00.000Z"), .Names = c("dateLocal", 
    "dateUtc"), class = "data.frame", row.names = 1L), arrivalDate = structure(list(
        dateLocal = "2017-05-12T18:30:00.000", dateUtc = "2017-05-13T00:30:00.000Z"), .Names = c("dateLocal", 
    "dateUtc"), class = "data.frame", row.names = 1L), status = "U", 
    schedule = structure(list(flightType = "J", serviceClasses = "Y", 
        restrictions = ""), .Names = c("flightType", "serviceClasses", 
    "restrictions"), class = "data.frame", row.names = 1L), operationalTimes = structure(list(
        publishedDeparture = structure(list(dateLocal = "2017-05-12T18:00:00.000", 
            dateUtc = "2017-05-13T00:00:00.000Z"), .Names = c("dateLocal", 
        "dateUtc"), class = "data.frame", row.names = 1L), publishedArrival = structure(list(
            dateLocal = "2017-05-12T18:30:00.000", dateUtc = "2017-05-13T00:30:00.000Z"), .Names = c("dateLocal", 
        "dateUtc"), class = "data.frame", row.names = 1L), scheduledGateDeparture = structure(list(
            dateLocal = "2017-05-12T18:00:00.000", dateUtc = "2017-05-13T00:00:00.000Z"), .Names = c("dateLocal", 
        "dateUtc"), class = "data.frame", row.names = 1L), flightPlanPlannedDeparture = structure(list(
            dateLocal = "2017-05-12T18:16:00.000", dateUtc = "2017-05-13T00:16:00.000Z"), .Names = c("dateLocal", 
        "dateUtc"), class = "data.frame", row.names = 1L), scheduledGateArrival = structure(list(
            dateLocal = "2017-05-12T18:30:00.000", dateUtc = "2017-05-13T00:30:00.000Z"), .Names = c("dateLocal", 
        "dateUtc"), class = "data.frame", row.names = 1L), flightPlanPlannedArrival = structure(list(
            dateLocal = "2017-05-12T18:42:00.000", dateUtc = "2017-05-13T00:42:00.000Z"), .Names = c("dateLocal", 
        "dateUtc"), class = "data.frame", row.names = 1L)), .Names = c("publishedDeparture", 
    "publishedArrival", "scheduledGateDeparture", "flightPlanPlannedDeparture", 
    "scheduledGateArrival", "flightPlanPlannedArrival"), class = "data.frame", row.names = 1L), 
    flightDurations = structure(list(scheduledBlockMinutes = 30L, 
        scheduledAirMinutes = 26L, scheduledTaxiOutMinutes = 16L), .Names = c("scheduledBlockMinutes", 
    "scheduledAirMinutes", "scheduledTaxiOutMinutes"), class = "data.frame", row.names = 1L), 
    flightEquipment = structure(list(scheduledEquipmentIataCode = "DHT", 
        actualEquipmentIataCode = "DHT"), .Names = c("scheduledEquipmentIataCode", 
    "actualEquipmentIataCode"), class = "data.frame", row.names = 1L)), .Names = c("flightId", 
"carrierFsCode", "flightNumber", "departureAirportFsCode", "arrivalAirportFsCode", 
"departureDate", "arrivalDate", "status", "schedule", "operationalTimes", 
"flightDurations", "flightEquipment"), class = "data.frame", row.names = 1L), 
    flightStatuses = list(), flightStatuses = structure(list(
        flightId = c(892402226L, 891883063L), carrierFsCode = c("4K", 
        "6L"), flightNumber = c("201", "402"), departureAirportFsCode = c("YUB", 
        "YUB"), arrivalAirportFsCode = c("YEV", "YEV"), departureDate = structure(list(
            dateLocal = c("2017-05-13T09:30:00.000", "2017-05-13T10:30:00.000"
            ), dateUtc = c("2017-05-13T15:30:00.000Z", "2017-05-13T16:30:00.000Z"
            )), .Names = c("dateLocal", "dateUtc"), class = "data.frame", row.names = 1:2), 
        arrivalDate = structure(list(dateLocal = c("2017-05-13T11:42:00.000", 
        "2017-05-13T11:10:00.000"), dateUtc = c("2017-05-13T17:42:00.000Z", 
        "2017-05-13T17:10:00.000Z")), .Names = c("dateLocal", 
        "dateUtc"), class = "data.frame", row.names = 1:2), status = c("U", 
        "U"), operationalTimes = structure(list(scheduledGateDeparture = structure(list(
            dateLocal = c("2017-05-13T09:30:00.000", "2017-05-13T10:30:00.000"
            ), dateUtc = c("2017-05-13T15:30:00.000Z", "2017-05-13T16:30:00.000Z"
            )), .Names = c("dateLocal", "dateUtc"), class = "data.frame", row.names = 1:2), 
            estimatedRunwayDeparture = structure(list(dateLocal = c("2017-05-13T11:19:00.000", 
            NA), dateUtc = c("2017-05-13T17:19:00.000Z", NA)), .Names = c("dateLocal", 
            "dateUtc"), class = "data.frame", row.names = 1:2), 
            actualRunwayDeparture = structure(list(dateLocal = c("2017-05-13T11:19:00.000", 
            NA), dateUtc = c("2017-05-13T17:19:00.000Z", NA)), .Names = c("dateLocal", 
            "dateUtc"), class = "data.frame", row.names = 1:2), 
            estimatedRunwayArrival = structure(list(dateLocal = c("2017-05-13T11:42:00.000", 
            NA), dateUtc = c("2017-05-13T17:42:00.000Z", NA)), .Names = c("dateLocal", 
            "dateUtc"), class = "data.frame", row.names = 1:2), 
            flightPlanPlannedDeparture = structure(list(dateLocal = c(NA, 
            "2017-05-13T10:45:00.000"), dateUtc = c(NA, "2017-05-13T16:45:00.000Z"
            )), .Names = c("dateLocal", "dateUtc"), class = "data.frame", row.names = 1:2), 
            flightPlanPlannedArrival = structure(list(dateLocal = c(NA, 
            "2017-05-13T11:10:00.000"), dateUtc = c(NA, "2017-05-13T17:10:00.000Z"
            )), .Names = c("dateLocal", "dateUtc"), class = "data.frame", row.names = 1:2)), .Names = c("scheduledGateDeparture", 
        "estimatedRunwayDeparture", "actualRunwayDeparture", 
        "estimatedRunwayArrival", "flightPlanPlannedDeparture", 
        "flightPlanPlannedArrival"), class = "data.frame", row.names = 1:2), 
        flightEquipment = structure(list(actualEquipmentIataCode = c("BE2", 
        "DHT")), .Names = "actualEquipmentIataCode", class = "data.frame", row.names = 1:2), 
        flightDurations = structure(list(scheduledAirMinutes = c(NA, 
        25L), scheduledTaxiOutMinutes = c(NA, 15L)), .Names = c("scheduledAirMinutes", 
        "scheduledTaxiOutMinutes"), class = "data.frame", row.names = 1:2)), .Names = c("flightId", 
    "carrierFsCode", "flightNumber", "departureAirportFsCode", 
    "arrivalAirportFsCode", "departureDate", "arrivalDate", "status", 
    "operationalTimes", "flightEquipment", "flightDurations"), class = "data.frame", row.names = 1:2)), .Names = c("flightStatuses", 
"flightStatuses", "flightStatuses"))
lcabral
  • How do you want the data in the list to be represented in the dataframe if the nested dataframes have different columns? – yeedle May 18 '17 at 21:53
  • It looks like a lot of your problems come from the fact that a few of the items aren't data frames. Have you tried `D <- Filter(is.data.frame, D2)` (to get only the data frame elements) and then `dplyr::bind_rows(D)`? – David Robinson May 18 '17 at 22:00
  • @yeedle When I isolate one element of the list, R considers it a data frame, and when I use `view(D2[[3]])`, I can see everything. For example, the subelement departureDate is a data frame with two variables, and R shows two columns: "departureDate.dateLocal" and "departureDate.dateUtc". This is what I expect the overall result to look like. It does appear that the structure is still nested data frames. Could it be that `view` is just making it easy to see? How would I go about actually creating those columns as opposed to only visualizing them? I thought `rbind` did that, but it doesn't. – lcabral May 18 '17 at 22:03
  • Do you have a program that is creating such objects? It looks like difficulties I have seen with JSON-readers. – IRTFM May 18 '17 at 22:34
  • @David Robinson I tried using `list.filter` because `filter` alone gave me an error saying that the object couldn't be converted automatically to type 'double'. The `list.filter` call came back with a list of 0 elements. I'd have to look further into it, because I think you're on to something, but for now, negative result. – lcabral May 19 '17 at 00:00
  • @42- The data comes from an API from FlightStats. It is JSON. – lcabral May 19 '17 at 00:01
  • Search on : "[r] read dataframe nested lists JSON" : http://stackoverflow.com/questions/35444968/read-json-file-into-a-data-frame-without-nested-lists – IRTFM May 19 '17 at 00:15
  • @42 I tried it and I get this error and traceback: Error: is.data.frame(x) is not TRUE 4. stop(sprintf(ngettext(length(r), "%s is not TRUE", "%s are not all TRUE"), ch), call. = FALSE, domain = NA) 3. stopifnot(is.data.frame(x)) 2. flatten(indf) 1. Flattener(D2, TRUE) It looks like my problems revolve around is.data.frame again. I'm not sure what I should do... Do I leave my question up? – lcabral May 19 '17 at 14:32
  • You tried "it"? I don't see what that preposition refers to. – IRTFM May 19 '17 at 16:16
  • Ok, I got it, @David Robinson was right. This is my final code: `cond <- lapply(D2, is.data.frame); D3 <- D2[unlist(cond)]; D4 <- lapply(D3, flatten); df <- bind_rows(D4)` – lcabral May 19 '17 at 16:16
  • @42, sorry, I wasn't very clear. I tried using the col_fixer and Flattener functions that were proposed in the post you referred to. I tried them on my list of data frames (D2), and also on only one element of the list, and both gave me the error I copied in my last comment. – lcabral May 19 '17 at 16:19
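
The final code from the comments can be tried end to end on toy data. The sketch below assumes that `flatten` is the one from the jsonlite package and that `bind_rows` is the one from dplyr (the comments do not say so explicitly, apart from the `dplyr::bind_rows` suggestion); `D2_toy` and its contents are made up for illustration.

```r
# Hypothetical stand-in for D2: one empty element and two data frames
# with nested columns and different sets of variables
a <- data.frame(flightId = 1:2, status = c("L", "U"), stringsAsFactors = FALSE)
a$departureDate <- data.frame(
  dateLocal = c("2017-05-11T09:00:00.000", "2017-05-11T09:55:00.000"),
  dateUtc   = c("2017-05-11T15:00:00.000Z", "2017-05-11T16:55:00.000Z"),
  stringsAsFactors = FALSE
)
b <- data.frame(flightId = 3L)
b$flightDurations <- data.frame(scheduledAirMinutes = 26L,
                                scheduledTaxiOutMinutes = 16L)

D2_toy <- list(flightStatuses = a, flightStatuses = list(), flightStatuses = b)

# 1. Keep only the elements that are data frames
cond <- vapply(D2_toy, is.data.frame, logical(1))
D3 <- D2_toy[cond]

# 2. Expand nested data frame columns into "parent.child" columns
D4 <- lapply(D3, jsonlite::flatten)

# 3. Stack the rows; columns missing from an element are filled with NA
df <- dplyr::bind_rows(unname(D4))
names(df)
```

Columns that exist in only some elements end up as `NA` for the other rows. List columns such as `uplines` are not expanded by `jsonlite::flatten` and may need separate handling before the result is written with `write.csv`.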
