I am analysing a data frame of train trips. The data is formatted as follows:

 tripnumber stop       
<int> <list>     
1 <list [34]>
2 <list [34]>
3 <list [33]>
4 <list [20]>
5 <list [17]>
6 <list [17]>

Each tripnumber has a certain number of stops; trip 1, for example, has 34 stops.

An important note: the stop lists are not plain lists of stations. Each element is itself a list holding a station plus extra information (let's call these stationlists), structured like this:

list(Station = "ams", Arival_time = "0135", Departure_time = "0138", Index = "1")
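For anyone who wants to experiment without the real file, here is a minimal sketch of the same nested structure. Everything except the first stationlist is made up for illustration: the helper `make_stop()`, the name `toy_trips`, and all stations and times other than "ams" are assumptions, and the real trips have 17 to 34 stops rather than 2 or 3.

```r
library(tibble)

# Hypothetical helper: builds one "stationlist" with the four fields shown above.
# The field names (including the spelling "Arival_time") follow the question.
make_stop <- function(station, arrival, departure, index) {
  list(Station = station, Arival_time = arrival,
       Departure_time = departure, Index = as.character(index))
}

# Two toy trips; each element of `stop` is a list of stationlists.
toy_trips <- tibble(
  tripnumber = 1:2,
  stop = list(
    list(make_stop("ams", "0135", "0138", 1),
         make_stop("utr", "0205", "0207", 2),
         make_stop("ehv", "0255", "0258", 3)),
    list(make_stop("ams", "0635", "0638", 1),
         make_stop("rtd", "0720", "0722", 2))
  )
)

toy_trips                 # prints tripnumber plus a <list> column
lengths(toy_trips$stop)   # number of stops per trip
```

The trips deliberately have different numbers of stops, so the toy data also covers the case where a trip has fewer stops than there are stop columns.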

I want to spread each list of stationlists across columns: the first column after tripnumber should hold the first stationlist, the second column the second stationlist, and so on, so that the result looks like this:

 tripnumber stop1 stop2 stop3 stop4 stop5 .... 
<int> <list>     
1 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]> ....
2 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]> ....
3 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]> ....
4 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]> ....
5 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]> ....
6 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]> ....

I tried to do this with the purrr library. However, I am not very familiar with the package, and I cannot get it working without losing either the tripnumber structure or the "stationlist" structure.

Any tips how to solve this?

Edits:

  • The output of `dput(head(traintrips))` can be copy-pasted into R as a test file: .txt file
  • If there are more stop columns than actual stops, the cell should remain empty (" ")
Menn0
  • Hi and welcome to SO! Are the stops named? It would be really helpful if you can provide some [reproducible data](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) since that makes it a lot easier to test and verify solutions. Just a few rows with a few stops per row seems like it would be sufficient. – Calum You Oct 22 '18 at 17:45
  • Also, what do you want to have in (say) output column `stop34` for trip 3, i.e. how to deal with different numbers of stops per trip? – Calum You Oct 22 '18 at 17:46
  • I second @CalumYou's comment, please include at a minimum `dput(head(x))` – r2evans Oct 22 '18 at 18:13
  • Thanks @CalumYou, a testfile is added and can be downloaded – Menn0 Oct 22 '18 at 18:31
  • Check out `tidyr::unnest()` and `nest()`. – Joe Oct 22 '18 at 22:11

1 Answer

I got it working by unnesting the stop column and then reshaping the result to wide format, using the following code:

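library(tidyr)

# One row per stop; tripnumber is repeated for every stop of a trip.
# Referring to the column as `stop` (not traintrips$stop) keeps its name clean.
DFnew <- unnest(traintrips, stop)
# Position of each stop within its trip (1, 2, 3, ...)
DFnew$time <- with(DFnew, ave(tripnumber, tripnumber, FUN = seq_along))
# Back to one row per trip, with one column per stop position
DFnew <- spread(DFnew, time, stop)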

This gives the following result:

> dim(DFnew)
[1]  6 35

> head(DFnew[,1:6])
# A tibble: 6 x 6
  tripnumber `1`        `2`        `3`        `4`        `5`       
       <int> <list>     <list>     <list>     <list>     <list>    
1          1 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]>
2          2 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]>
3          3 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]>
4          4 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]>
5          5 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]>
6          6 <list [4]> <list [4]> <list [4]> <list [4]> <list [4]>
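For later readers: tidyr 1.0.0 and newer (released after this post) provide `unnest_longer()` and `pivot_wider()`, which can express the same two steps and also produce the `stop1`, `stop2`, ... column names asked for in the question. The following is only a sketch on made-up toy data, not the code I ran above; `toy_trips`, `make_stop()`, and every station and time except "ams" are assumptions for illustration.

```r
library(tibble)
library(tidyr)  # assumes tidyr >= 1.0.0

# Hypothetical helper building one stationlist (field names as in the question).
make_stop <- function(station, arrival, departure, index) {
  list(Station = station, Arival_time = arrival,
       Departure_time = departure, Index = as.character(index))
}

# Toy stand-in for traintrips: trip 1 has three stops, trip 2 has two.
toy_trips <- tibble(
  tripnumber = 1:2,
  stop = list(
    list(make_stop("ams", "0135", "0138", 1),
         make_stop("utr", "0205", "0207", 2),
         make_stop("ehv", "0255", "0258", 3)),
    list(make_stop("ams", "0635", "0638", 1),
         make_stop("rtd", "0720", "0722", 2))
  )
)

# Step 1: one row per stop, recording each stop's position within its trip.
long <- unnest_longer(toy_trips, stop, indices_to = "position")

# Step 2: one column per position, named stop1, stop2, ...
wide <- pivot_wider(long,
                    names_from = position,
                    values_from = stop,
                    names_prefix = "stop")

wide
```

Each cell still holds a complete stationlist, so the tripnumber structure and the stationlist structure both survive. Where a trip has fewer stops than there are columns (trip 2 and `stop3` here), the list cell is left as `NULL`, which is the list-column counterpart of the empty cell mentioned in the question's edit.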
Menn0