This seems to be a recurring question, but I have been reading StackOverflow for hours and can't find a solution, so here it is:
I have a list of 28 elements called D2; its general structure is shown below. It is flight data, so the number of observations and the number of variables differ from one element to the next.
> str(D2, max.level = 1)
List of 28
$ flightStatuses: list()
$ flightStatuses: list()
$ flightStatuses:'data.frame': 5 obs. of 12 variables:
$ flightStatuses:'data.frame': 4 obs. of 12 variables:
$ flightStatuses:'data.frame': 1 obs. of 11 variables:
$ flightStatuses:'data.frame': 3 obs. of 12 variables:
$ flightStatuses:'data.frame': 10 obs. of 15 variables:
$ flightStatuses:'data.frame': 1 obs. of 12 variables:
$ flightStatuses: list()
$ flightStatuses:'data.frame': 2 obs. of 11 variables:
etc.
I am trying to get the contents into a data frame and then save it to csv.
Here is the structure of the third element of the list, as an example:
> str(D2[[3]])
'data.frame': 5 obs. of 12 variables:
$ flightId : int 891368844 889954328 889955975 891364679 891364678
$ carrierFsCode : chr "4K" "4N" "5T" "6L" ...
$ flightNumber : chr "901" "207" "444" "414" ...
$ departureAirportFsCode: chr "ZFM" "YDA" "YVQ" "ZFM" ...
$ arrivalAirportFsCode : chr "YEV" "YEV" "YEV" "YEV" ...
$ departureDate :'data.frame': 5 obs. of 2 variables:
..$ dateLocal: chr "2017-05-11T09:00:00.000" "2017-05-11T09:55:00.000" "2017-05-11T12:30:00.000" "2017-05-11T16:00:00.000" ...
..$ dateUtc : chr "2017-05-11T15:00:00.000Z" "2017-05-11T16:55:00.000Z" "2017-05-11T18:30:00.000Z" "2017-05-11T22:00:00.000Z" ...
$ arrivalDate :'data.frame': 5 obs. of 2 variables:
..$ dateLocal: chr "2017-05-11T15:45:00.000" "2017-05-11T12:10:00.000" "2017-05-11T13:28:00.000" "2017-05-11T16:37:00.000" ...
..$ dateUtc : chr "2017-05-11T21:45:00.000Z" "2017-05-11T18:10:00.000Z" "2017-05-11T19:28:00.000Z" "2017-05-11T22:37:00.000Z" ...
$ status : chr "L" "U" "L" "U" ...
$ operationalTimes :'data.frame': 5 obs. of 14 variables:
..$ scheduledGateDeparture :'data.frame': 5 obs. of 2 variables:
.. ..$ dateLocal: chr "2017-05-11T09:00:00.000" "2017-05-11T09:55:00.000" "2017-05-11T12:30:00.000" "2017-05-11T16:00:00.000" ...
.. ..$ dateUtc : chr "2017-05-11T15:00:00.000Z" "2017-05-11T16:55:00.000Z" "2017-05-11T18:30:00.000Z" "2017-05-11T22:00:00.000Z" ...
..$ estimatedRunwayDeparture :'data.frame': 5 obs. of 2 variables:
.. ..$ dateLocal: chr "2017-05-11T15:24:00.000" "2017-05-11T09:53:00.000" "2017-05-11T12:27:00.000" NA ...
.. ..$ dateUtc : chr "2017-05-11T21:24:00.000Z" "2017-05-11T16:53:00.000Z" "2017-05-11T18:27:00.000Z" NA ...
..$ actualRunwayDeparture :'data.frame': 5 obs. of 2 variables:
.. ..$ dateLocal: chr "2017-05-11T15:24:00.000" "2017-05-11T09:53:00.000" "2017-05-11T12:27:00.000" NA ...
.. ..$ dateUtc : chr "2017-05-11T21:24:00.000Z" "2017-05-11T16:53:00.000Z" "2017-05-11T18:27:00.000Z" NA ...
..$ estimatedRunwayArrival :'data.frame': 5 obs. of 2 variables:
.. ..$ dateLocal: chr "2017-05-11T15:45:00.000" "2017-05-11T12:11:00.000" "2017-05-11T13:12:00.000" NA ...
.. ..$ dateUtc : chr "2017-05-11T21:45:00.000Z" "2017-05-11T18:11:00.000Z" "2017-05-11T19:12:00.000Z" NA ...
..$ actualRunwayArrival :'data.frame': 5 obs. of 2 variables:
.. ..$ dateLocal: chr "2017-05-11T15:45:00.000" NA "2017-05-11T13:12:00.000" NA ...
.. ..$ dateUtc : chr "2017-05-11T21:45:00.000Z" NA "2017-05-11T19:12:00.000Z" NA ...
..$ publishedDeparture :'data.frame': 5 obs. of 2 variables:
.. ..$ dateLocal: chr NA "2017-05-11T09:55:00.000" "2017-05-11T12:30:00.000" NA ...
.. ..$ dateUtc : chr NA "2017-05-11T16:55:00.000Z" "2017-05-11T18:30:00.000Z" NA ...
..$ publishedArrival :'data.frame': 5 obs. of 2 variables:
.. ..$ dateLocal: chr NA "2017-05-11T12:10:00.000" "2017-05-11T13:28:00.000" NA ...
.. ..$ dateUtc : chr NA "2017-05-11T18:10:00.000Z" "2017-05-11T19:28:00.000Z" NA ...
..$ flightPlanPlannedDeparture:'data.frame': 5 obs. of 2 variables:
.. ..$ dateLocal: chr NA "2017-05-11T10:05:00.000" "2017-05-11T12:40:00.000" "2017-05-11T16:15:00.000" ...
.. ..$ dateUtc : chr NA "2017-05-11T17:05:00.000Z" "2017-05-11T18:40:00.000Z" "2017-05-11T22:15:00.000Z" ...
..$ scheduledGateArrival :'data.frame': 5 obs. of 2 variables:
.. ..$ dateLocal: chr NA "2017-05-11T12:10:00.000" "2017-05-11T13:28:00.000" NA ...
.. ..$ dateUtc : chr NA "2017-05-11T18:10:00.000Z" "2017-05-11T19:28:00.000Z" NA ...
..$ flightPlanPlannedArrival :'data.frame': 5 obs. of 2 variables:
.. ..$ dateLocal: chr NA "2017-05-11T12:25:00.000" "2017-05-11T13:24:00.000" "2017-05-11T16:37:00.000" ...
.. ..$ dateUtc : chr NA "2017-05-11T18:25:00.000Z" "2017-05-11T19:24:00.000Z" "2017-05-11T22:37:00.000Z" ...
..$ estimatedGateDeparture :'data.frame': 5 obs. of 2 variables:
.. ..$ dateLocal: chr NA NA "2017-05-11T12:20:00.000" NA ...
.. ..$ dateUtc : chr NA NA "2017-05-11T18:20:00.000Z" NA ...
..$ actualGateDeparture :'data.frame': 5 obs. of 2 variables:
.. ..$ dateLocal: chr NA NA "2017-05-11T12:20:00.000" NA ...
.. ..$ dateUtc : chr NA NA "2017-05-11T18:20:00.000Z" NA ...
..$ estimatedGateArrival :'data.frame': 5 obs. of 2 variables:
.. ..$ dateLocal: chr NA NA "2017-05-11T13:15:00.000" NA ...
.. ..$ dateUtc : chr NA NA "2017-05-11T19:15:00.000Z" NA ...
..$ actualGateArrival :'data.frame': 5 obs. of 2 variables:
.. ..$ dateLocal: chr NA NA "2017-05-11T13:15:00.000" NA ...
.. ..$ dateUtc : chr NA NA "2017-05-11T19:15:00.000Z" NA ...
$ flightDurations :'data.frame': 5 obs. of 8 variables:
..$ airMinutes : int 21 NA 45 NA NA
..$ scheduledBlockMinutes : int NA 75 58 NA NA
..$ scheduledAirMinutes : int NA 80 44 22 26
..$ scheduledTaxiOutMinutes: int NA 10 10 15 15
..$ blockMinutes : int NA NA 55 NA NA
..$ taxiOutMinutes : int NA NA 7 NA NA
..$ scheduledTaxiInMinutes : int NA NA 4 NA NA
..$ taxiInMinutes : int NA NA 3 NA NA
$ flightEquipment :'data.frame': 5 obs. of 2 variables:
..$ actualEquipmentIataCode : chr "BE1" "HS7" "733" "DHT" ...
..$ scheduledEquipmentIataCode: chr NA "HS7" "733" NA ...
$ schedule :'data.frame': 5 obs. of 4 variables:
..$ flightType : chr NA "J" "J" NA ...
..$ serviceClasses: chr NA "RY" "RFJY" NA ...
..$ restrictions : chr NA "" "" NA ...
..$ uplines :List of 5
.. ..$ : NULL
.. ..$ :'data.frame': 1 obs. of 2 variables:
.. .. ..$ fsCode : chr "YXY"
.. .. ..$ flightId: int 889956597
.. ..$ :'data.frame': 2 obs. of 2 variables:
.. .. ..$ fsCode : chr "YEG" "YZF"
.. .. ..$ flightId: int 889954472 889957614
.. ..$ : NULL
.. ..$ : NULL
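The nesting shown in the str() output above (data frames stored as columns of a data frame, plus a list column such as uplines) can be reproduced and flattened on toy data. This is a minimal sketch in base R, not a confirmed solution for D2: `toy` and `flatten_df` are hypothetical names, and collapsing list columns into ";"-separated strings is just one possible choice. If D2 came from jsonlite::fromJSON (an assumption, the post does not say), jsonlite::flatten() handles the nested data-frame columns in a similar way.

```r
# Toy data frame mimicking the structure above (made-up values, not the real D2).
toy <- data.frame(flightId = c(1L, 2L), stringsAsFactors = FALSE)
toy$departureDate <- data.frame(
  dateLocal = c("2017-05-11T09:00:00.000", "2017-05-11T09:55:00.000"),
  dateUtc   = c("2017-05-11T15:00:00.000Z", "2017-05-11T16:55:00.000Z"),
  stringsAsFactors = FALSE
)
# A data frame nested two levels deep, like operationalTimes$publishedDeparture.
ops <- data.frame(row.names = 1:2)
ops$publishedDeparture <- data.frame(
  dateLocal = c(NA, "2017-05-11T09:55:00.000"),
  stringsAsFactors = FALSE
)
toy$operationalTimes <- ops
# A list column, like schedule$uplines.
toy$uplines <- list(
  NULL,
  data.frame(fsCode = "YXY", flightId = 889956597L, stringsAsFactors = FALSE)
)

# Hypothetical helper: recursively replace every data-frame column by its own
# columns (names prefixed with the parent name) and collapse list columns
# into one string per row.
flatten_df <- function(df) {
  cols <- list()
  for (nm in names(df)) {
    col <- df[[nm]]
    if (is.data.frame(col)) {
      sub <- flatten_df(col)
      names(sub) <- paste(nm, names(sub), sep = ".")
      cols <- c(cols, as.list(sub))
    } else if (is.list(col)) {
      cols[[nm]] <- vapply(
        col,
        function(x) paste(unlist(x), collapse = ";"),
        character(1)
      )
    } else {
      cols[[nm]] <- col
    }
  }
  as.data.frame(cols, stringsAsFactors = FALSE)
}

flat <- flatten_df(toy)
str(flat)
```

After this step every column is an atomic vector, which is the shape that functions such as write.csv and rbind expect.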
As you can see, there are multiple data frames within each data frame that is an element of the list. I have read, amongst others, these posts to try and get all of this into a data frame.
R list to data frame But this one has lists of equal length
https://www.r-bloggers.com/concatenating-a-list-of-data-frames/ But even when I try it on an isolated element of the list, like the sample above, I get errors such as these:
> df<-ldply(D2[[3]], rbind)
Error in list_to_dataframe(res, attr(.data, "split_labels"), .id, id_as_factor) :
Results must be all atomic, or all data frames
> df<-do.call(rbind, D2[[3]])
Error in rbind(deparse.level, ...) :
numbers of columns of arguments do not match
Extracting from Nested list to data frame This one seems promising, but just too complex in the way it is explained. I'm a beginner in R, so I need something with a bit more human language.
Converting nested list (unequal length) to data frame This one is with named vectors and not data frames. When I try the solution from @MrFlick, I get this:
> df <- rbind.fill(lapply(D2, function(x)as.data.frame(t(x))))
Error in if (inherits(X[[j]], "data.frame") && ncol(xj) > 1L) X[[j]] <- as.matrix(X[[j]]) :
missing value where TRUE/FALSE needed
Called from: as.matrix.data.frame(x)
Converting nested list (unequal length) to data frame When I try @akrun's answer, I get:
> indx<-lengths(D2)
> res<-as.data.frame(do.call(rbind, lapply(D2, `length<-`,max(indx))))
Error in rbind(deparse.level, ...) :
invalid list argument: all variables should have the same length
> colnames(res)<-names(D2[[which.max(indx)]])
List elements to dataframes in R When I try the answer by @David Arenburg:
> lapply(D2, as.data.frame.list)
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 0, 1
Other trials:
rbind.fill(D2[[3]]) has the same output as just D2[[3]].
And when I try to take this last output and write it to csv, thinking it might be easier to handle, I get this:
> write.csv(D6, file = "FlightStats.csv") # D6 = D2[[3]]
Error in if (inherits(X[[j]], "data.frame") && ncol(xj) > 1L) X[[j]] <- as.matrix(X[[j]]) :
missing value where TRUE/FALSE needed
I also tried melt, both on the whole list and on one element of the list, with the same result:
> melt(D2) # or melt(D2[[3]])
Using carrierFsCode, flightNumber, departureAirportFsCode, arrivalAirportFsCode, status as id variables
Error in eval(substitute(expr), envir, enclos) :
Can't melt data.frames with non-atomic 'measure' columns
In addition: Warning message:
attributes are not identical across measure variables; they will be dropped
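The "non-atomic 'measure' columns" message above suggests checking which columns are not plain vectors. A small diagnostic sketch on toy data (`toy` and `nested` are hypothetical names, and the values are made up): is.list() is TRUE both for data-frame columns and for ordinary list columns, so it flags every column that would need flattening first.

```r
# Toy data frame with one nested data-frame column and one list column.
toy <- data.frame(
  flightId = c(1L, 2L),
  status = c("L", "U"),
  stringsAsFactors = FALSE
)
toy$departureDate <- data.frame(
  dateLocal = c("2017-05-11T09:00:00.000", "2017-05-11T09:55:00.000"),
  stringsAsFactors = FALSE
)
toy$uplines <- list(NULL, data.frame(fsCode = "YXY", stringsAsFactors = FALSE))

# TRUE for every column that is a data frame or a list rather than a vector.
nested <- vapply(toy, is.list, logical(1))
print(names(nested)[nested])
```

Running the same vapply() call on D2[[3]] should list the columns (departureDate, operationalTimes and so on) that are not plain vectors.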
as.data.frame, rbindlist and stack also return error messages.
I've also tried assigning one element of the list to a variable, another element to a second variable, and then combining the two (carefully chosen to have the same number of variables) using rbind, and I still get errors.
> rbind(D4, D5)
Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘1’, ‘2’
It's telling me the row names are duplicated, but:
> rownames(D4)
[1] "890867748" "889955650"
> rownames(D5)
[1] "891368844" "889954328" "889955975" "891364679" "891364678"
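One possible lead, not confirmed for D2: the values '1', '2' in the warning do not appear among the outer row names, but a data frame stored as a column keeps its own row names, independent of the outer ones. The sketch below only demonstrates that fact on toy data (`outer_df` is a hypothetical name, and the nested values are made up). Whether this is what rbind trips over here is an assumption.

```r
# Outer data frame whose row names are flight ids, as for D4 above.
outer_df <- data.frame(flightId = c(890867748L, 889955650L))
outer_df$departureDate <- data.frame(
  dateLocal = c("2017-05-12T18:00:00.000", "2017-05-13T09:30:00.000"),
  stringsAsFactors = FALSE
)
rownames(outer_df) <- as.character(outer_df$flightId)

# Row names of the outer data frame: the flight ids.
print(rownames(outer_df))
# Row names of the nested data frame: its own default numbering.
print(rownames(outer_df$departureDate))
```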
Basically, I need help! How do I get this mess into a data frame?
Here is some data that is representative (elements 8 to 10 of the list):
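To make the goal concrete, here is a minimal base-R sketch of the combining step on toy data. It assumes every element has already been flattened to plain columns, and it is not a confirmed solution for D2. `parts` and `bind_fill` are hypothetical names: the helper drops the empty list() elements, pads each data frame with NA for the columns it lacks, and then row-binds the results before writing a CSV file.

```r
# Toy stand-in for D2: flat data frames with different columns,
# plus an empty list() like the empty elements of D2.
parts <- list(
  data.frame(flightId = 1L, status = "U", scheduledBlockMinutes = 30L,
             stringsAsFactors = FALSE),
  list(),
  data.frame(flightId = c(2L, 3L), status = c("U", "L"),
             stringsAsFactors = FALSE)
)

# Hypothetical helper: row-bind data frames whose columns differ.
bind_fill <- function(dfs) {
  # Keep only non-empty data frames (this skips the list() elements).
  dfs <- Filter(function(d) is.data.frame(d) && nrow(d) > 0, dfs)
  if (length(dfs) == 0) return(data.frame())
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(d) {
    # Add every missing column as NA, then put columns in a common order.
    for (m in setdiff(all_cols, names(d))) d[[m]] <- NA
    d[all_cols]
  })
  out <- do.call(rbind, unname(padded))
  rownames(out) <- NULL
  out
}

combined <- bind_fill(parts)
print(combined)

# Written to a temporary file here; a real path would replace out_file.
out_file <- tempfile(fileext = ".csv")
write.csv(combined, file = out_file, row.names = FALSE)
```

plyr::rbind.fill, tried above, does the same kind of NA padding, but like this sketch it expects flat data frames as input.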
> dput(D2[c(8,9,10)])
structure(list(flightStatuses = structure(list(flightId = 890465980L,
carrierFsCode = "6L", flightNumber = "406", departureAirportFsCode = "YUB",
arrivalAirportFsCode = "YEV", departureDate = structure(list(
dateLocal = "2017-05-12T18:00:00.000", dateUtc = "2017-05-13T00:00:00.000Z"), .Names = c("dateLocal",
"dateUtc"), class = "data.frame", row.names = 1L), arrivalDate = structure(list(
dateLocal = "2017-05-12T18:30:00.000", dateUtc = "2017-05-13T00:30:00.000Z"), .Names = c("dateLocal",
"dateUtc"), class = "data.frame", row.names = 1L), status = "U",
schedule = structure(list(flightType = "J", serviceClasses = "Y",
restrictions = ""), .Names = c("flightType", "serviceClasses",
"restrictions"), class = "data.frame", row.names = 1L), operationalTimes = structure(list(
publishedDeparture = structure(list(dateLocal = "2017-05-12T18:00:00.000",
dateUtc = "2017-05-13T00:00:00.000Z"), .Names = c("dateLocal",
"dateUtc"), class = "data.frame", row.names = 1L), publishedArrival = structure(list(
dateLocal = "2017-05-12T18:30:00.000", dateUtc = "2017-05-13T00:30:00.000Z"), .Names = c("dateLocal",
"dateUtc"), class = "data.frame", row.names = 1L), scheduledGateDeparture = structure(list(
dateLocal = "2017-05-12T18:00:00.000", dateUtc = "2017-05-13T00:00:00.000Z"), .Names = c("dateLocal",
"dateUtc"), class = "data.frame", row.names = 1L), flightPlanPlannedDeparture = structure(list(
dateLocal = "2017-05-12T18:16:00.000", dateUtc = "2017-05-13T00:16:00.000Z"), .Names = c("dateLocal",
"dateUtc"), class = "data.frame", row.names = 1L), scheduledGateArrival = structure(list(
dateLocal = "2017-05-12T18:30:00.000", dateUtc = "2017-05-13T00:30:00.000Z"), .Names = c("dateLocal",
"dateUtc"), class = "data.frame", row.names = 1L), flightPlanPlannedArrival = structure(list(
dateLocal = "2017-05-12T18:42:00.000", dateUtc = "2017-05-13T00:42:00.000Z"), .Names = c("dateLocal",
"dateUtc"), class = "data.frame", row.names = 1L)), .Names = c("publishedDeparture",
"publishedArrival", "scheduledGateDeparture", "flightPlanPlannedDeparture",
"scheduledGateArrival", "flightPlanPlannedArrival"), class = "data.frame", row.names = 1L),
flightDurations = structure(list(scheduledBlockMinutes = 30L,
scheduledAirMinutes = 26L, scheduledTaxiOutMinutes = 16L), .Names = c("scheduledBlockMinutes",
"scheduledAirMinutes", "scheduledTaxiOutMinutes"), class = "data.frame", row.names = 1L),
flightEquipment = structure(list(scheduledEquipmentIataCode = "DHT",
actualEquipmentIataCode = "DHT"), .Names = c("scheduledEquipmentIataCode",
"actualEquipmentIataCode"), class = "data.frame", row.names = 1L)), .Names = c("flightId",
"carrierFsCode", "flightNumber", "departureAirportFsCode", "arrivalAirportFsCode",
"departureDate", "arrivalDate", "status", "schedule", "operationalTimes",
"flightDurations", "flightEquipment"), class = "data.frame", row.names = 1L),
flightStatuses = list(), flightStatuses = structure(list(
flightId = c(892402226L, 891883063L), carrierFsCode = c("4K",
"6L"), flightNumber = c("201", "402"), departureAirportFsCode = c("YUB",
"YUB"), arrivalAirportFsCode = c("YEV", "YEV"), departureDate = structure(list(
dateLocal = c("2017-05-13T09:30:00.000", "2017-05-13T10:30:00.000"
), dateUtc = c("2017-05-13T15:30:00.000Z", "2017-05-13T16:30:00.000Z"
)), .Names = c("dateLocal", "dateUtc"), class = "data.frame", row.names = 1:2),
arrivalDate = structure(list(dateLocal = c("2017-05-13T11:42:00.000",
"2017-05-13T11:10:00.000"), dateUtc = c("2017-05-13T17:42:00.000Z",
"2017-05-13T17:10:00.000Z")), .Names = c("dateLocal",
"dateUtc"), class = "data.frame", row.names = 1:2), status = c("U",
"U"), operationalTimes = structure(list(scheduledGateDeparture = structure(list(
dateLocal = c("2017-05-13T09:30:00.000", "2017-05-13T10:30:00.000"
), dateUtc = c("2017-05-13T15:30:00.000Z", "2017-05-13T16:30:00.000Z"
)), .Names = c("dateLocal", "dateUtc"), class = "data.frame", row.names = 1:2),
estimatedRunwayDeparture = structure(list(dateLocal = c("2017-05-13T11:19:00.000",
NA), dateUtc = c("2017-05-13T17:19:00.000Z", NA)), .Names = c("dateLocal",
"dateUtc"), class = "data.frame", row.names = 1:2),
actualRunwayDeparture = structure(list(dateLocal = c("2017-05-13T11:19:00.000",
NA), dateUtc = c("2017-05-13T17:19:00.000Z", NA)), .Names = c("dateLocal",
"dateUtc"), class = "data.frame", row.names = 1:2),
estimatedRunwayArrival = structure(list(dateLocal = c("2017-05-13T11:42:00.000",
NA), dateUtc = c("2017-05-13T17:42:00.000Z", NA)), .Names = c("dateLocal",
"dateUtc"), class = "data.frame", row.names = 1:2),
flightPlanPlannedDeparture = structure(list(dateLocal = c(NA,
"2017-05-13T10:45:00.000"), dateUtc = c(NA, "2017-05-13T16:45:00.000Z"
)), .Names = c("dateLocal", "dateUtc"), class = "data.frame", row.names = 1:2),
flightPlanPlannedArrival = structure(list(dateLocal = c(NA,
"2017-05-13T11:10:00.000"), dateUtc = c(NA, "2017-05-13T17:10:00.000Z"
)), .Names = c("dateLocal", "dateUtc"), class = "data.frame", row.names = 1:2)), .Names = c("scheduledGateDeparture",
"estimatedRunwayDeparture", "actualRunwayDeparture",
"estimatedRunwayArrival", "flightPlanPlannedDeparture",
"flightPlanPlannedArrival"), class = "data.frame", row.names = 1:2),
flightEquipment = structure(list(actualEquipmentIataCode = c("BE2",
"DHT")), .Names = "actualEquipmentIataCode", class = "data.frame", row.names = 1:2),
flightDurations = structure(list(scheduledAirMinutes = c(NA,
25L), scheduledTaxiOutMinutes = c(NA, 15L)), .Names = c("scheduledAirMinutes",
"scheduledTaxiOutMinutes"), class = "data.frame", row.names = 1:2)), .Names = c("flightId",
"carrierFsCode", "flightNumber", "departureAirportFsCode",
"arrivalAirportFsCode", "departureDate", "arrivalDate", "status",
"operationalTimes", "flightEquipment", "flightDurations"), class = "data.frame", row.names = 1:2)), .Names = c("flightStatuses",
"flightStatuses", "flightStatuses"))