I am trying to make a dataset of railway timetable data usable for a project. The raw data has the following structure:
#00000001
%100,11410, ,001,002,Nachtnettrein
%100,01412, ,002,007,Nachtnettrein
-00002,000,999
&IC ,001,007
>rtd ,2532
?11 ,11 ,00002
;rtn
;rta
;cps
;nwk
+gd ,2550,2557
?5 ,5b ,00002
;ztmo
;ztm
;ypb
;vb
;gvc
+gv ,2616,2623
?6 ,6 ,00002
;laa
;gvm
;vst
;dvnk
+ledn ,2635,2645
?5b ,5b ,00002
;ssh
;nvp
;hfd
+shl ,2700,2702
?1-2 ,1-2 ,00002
;asdl
;ass
+asd ,2716,2719
?11a ,11a ,00002
;asdm
;asa
;dvd
;asb
;ashd
;ac
;bkl
;mas
;utzl
<ut ,2758
?5 ,5 ,00002
#00000002
%100,11410, ,001,002,Nachtnettrein
I am interested in separating this data based on trip number. Each trip number starts with an #. I use the following code to separate by trip number:
DF <- read.delim("timetbls.dat", sep="#", skip=1)
This produces the following result:
> head(DF)
X X00000001
1 %100,11410, ,001,002,Nachtnettrein 1
2 %100,01412, ,002,007,Nachtnettrein NA
3 -00002,000,999 NA
4 &IC ,001,007 NA
5 >rtd ,2532 NA
6 ?11 ,11 ,00002 NA
Where column X contains some of the data for that particular trip and column X00000001 contains the trip number. The trip number is only shown for the first line of each trip. The rest of the values are NA until another trip number is detected. Now I want to merge the rows that contain the same trip. As a first step I tried creating a for loop that fills every NA with it's corresponding trip number, so I could perhaps group by this trip number later. The problem is that this for loop is painstakingly slow for a million-plus rows dataset.
DF$X00000001[1] <- 1 #Initializing first number, which is normally missing out
DF$Trip[1] <-1 #Bruteforcing first trip number
for(i in dim(DF)[1] {
ifelse(DF$X00000001[i] == NA, DF$Trip[i] <- DF$Trip[i-1], DF$Trip[i] <-DF$X00000001[i])
}
I am looking for a replacement of the for loop or any other method to gather the data belonging to one trip in a single row. Thanks!