0

I have a data.frame that looks like the below, but need to transform it. I don't have an issue getting to the second step (gather), but am struggling to get the third. How can I get R to fill in the missing rows?

CURRENT DATA (FIRST STEP):

      poe   pod q20 q80 missing_rows
 1: GTSTC NLBZM  25  33            7
 2: CNSHA HKHKG  13  18            4

TRANSFORMED DATA (SECOND STEP):

poe pod transit
GTSTC   NLBZM   25
GTSTC   NLBZM   33
CNSHA   HKHKG   13
CNSHA   HKHKG   18

DESIRED DATA:

  poe     pod   transit
GTSTC   NLBZM   25
GTSTC   NLBZM   26
GTSTC   NLBZM   27
GTSTC   NLBZM   28
GTSTC   NLBZM   29
GTSTC   NLBZM   30
GTSTC   NLBZM   31
GTSTC   NLBZM   32
GTSTC   NLBZM   33
CNSHA   HKHKG   13
CNSHA   HKHKG   14
CNSHA   HKHKG   15
CNSHA   HKHKG   16
CNSHA   HKHKG   17
CNSHA   HKHKG   18
markus
  • 25,843
  • 5
  • 39
  • 58
Josh Mark
  • 1
  • 3
  • Take a look at `tidyr::complete`. – markus Aug 29 '18 at 19:43
  • 1
    Please show your code and a **reproducible** example of your data using `dput`. See this question [How to make a greatR Reproducible Example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example?rq=1) if you need help – divibisan Aug 29 '18 at 19:43

2 Answers2

1

We can use expand and full_seq from tidyr:

library(dplyr)
library(tidyr)

df %>%
  gather(var, transit, q20, q80) %>%
  group_by(poe, pod) %>%
  expand(transit = full_seq(transit, 1))

Result:

# A tibble: 15 x 3
# Groups:   poe, pod [2]
   poe   pod   transit
   <fct> <fct>   <dbl>
 1 CNSHA HKHKG      13
 2 CNSHA HKHKG      14
 3 CNSHA HKHKG      15
 4 CNSHA HKHKG      16
 5 CNSHA HKHKG      17
 6 CNSHA HKHKG      18
 7 GTSTC NLBZM      25
 8 GTSTC NLBZM      26
 9 GTSTC NLBZM      27
10 GTSTC NLBZM      28
11 GTSTC NLBZM      29
12 GTSTC NLBZM      30
13 GTSTC NLBZM      31
14 GTSTC NLBZM      32
15 GTSTC NLBZM      33

Data:

df <- structure(list(poe = structure(c(2L, 1L), .Label = c("CNSHA", 
"GTSTC"), class = "factor"), pod = structure(c(2L, 1L), .Label = c("HKHKG", 
"NLBZM"), class = "factor"), q20 = c(25L, 13L), q80 = c(33L, 
18L)), .Names = c("poe", "pod", "q20", "q80"), class = "data.frame", row.names = c(NA, 
-2L))
acylam
  • 18,231
  • 5
  • 36
  • 45
0

Map the start and end points, and join back to your original data. The intermediate 'transformed data' should not be necessary:

sq <- Map(seq, dat$q20, dat$q80)
cbind(
    dat[rep(seq_along(sq),lengths(sq)),c("poe","pod")],
    transit=unlist(sq)
)

#      poe   pod transit
#1   GTSTC NLBZM      25
#1.1 GTSTC NLBZM      26
#1.2 GTSTC NLBZM      27
#1.3 GTSTC NLBZM      28
#1.4 GTSTC NLBZM      29
#1.5 GTSTC NLBZM      30
#1.6 GTSTC NLBZM      31
#1.7 GTSTC NLBZM      32
#1.8 GTSTC NLBZM      33
#2   CNSHA HKHKG      13
#2.1 CNSHA HKHKG      14
#2.2 CNSHA HKHKG      15
#2.3 CNSHA HKHKG      16
#2.4 CNSHA HKHKG      17
#2.5 CNSHA HKHKG      18

Where dat was your original 'first step' dataset:

dat <- read.table(text="poe pod q20 q80
GTSTC NLBZM 25 33
CNSHA HKHKG 13 18", header=TRUE)
thelatemail
  • 91,185
  • 12
  • 128
  • 188