1

I have a dataframe that I need to reshape to facilitate its use in a viz application. Here's a condensed version of the dataframe:

Carrier <- c("Mesa", "United", "JetBlue", "ExpressJet", "SkyWest")
Flight_Num <- c(7124, 7177, 334, 1223, 6380)
Origin <- c("ORD", "EWR", "SFO", "BOS", "BDL")
Dest <- c("PIT", "BOI", "DSM", "CWA", "CMH")
Sched_Depr <- c(1955, 1900, 1845, 1253, 1755)

df <- data.frame(Carrier, Flight_Num, Origin, Dest, Sched_Depr)

     Carrier Flight_Num Origin Dest Sched_Depr
1       Mesa       7124    ORD  PIT       1955
2     United       7177    EWR  BOI       1900
3    JetBlue        334    SFO  DSM       1845
4 ExpressJet       1223    BOS  CWA       1253
5    SkyWest       6380    BDL  CMH       1755

Origin and Dept are interpreted as geographical data (i.e. coordinates) by the viz application. I need to collate them in a single column named Coords. At the same time I need to create a new order marker variable Order_Points. So the new, reshaped dataframe would look like this:

      Carrier Flight_Num Coords Sched_Depr Order_Points
1        Mesa       7124    ORD       1955            1
2        Mesa       7124    PIT       1955            2
3      United       7177    EWR       1900            1
4      United       7177    BOI       1900            2
5     JetBlue        334    SFO       1845            1
6     JetBlue        334    DSM       1845            2
7  ExpressJet       1223    BOS       1253            1
8  ExpressJet       1223    CWA       1253            2
9     SkyWest       6380    BDL       1755            1
10    SkyWest       6380    CMH       1755            2

What would be an efficient way to collate two columns like this while keeping (and duplicating) the other variables?

Conner M.
  • 1,954
  • 3
  • 19
  • 29

2 Answers2

2

Here's an option using tidyverse functions. We use gather to convert the data frame from "wide" to "long" format. This also adds a column (called Type here) that marks whether a Coords is an Origin or Dest.

library(tidyverse)

df.long = df %>% 
  gather(Type, Coords, Origin, Dest) %>% 
  arrange(Carrier, desc(Type))
      Carrier Flight_Num Sched_Depr   Type Coords
1  ExpressJet       1223       1253 Origin    BOS
2  ExpressJet       1223       1253   Dest    CWA
3     JetBlue        334       1845 Origin    SFO
4     JetBlue        334       1845   Dest    DSM
5        Mesa       7124       1955 Origin    ORD
6        Mesa       7124       1955   Dest    PIT
7     SkyWest       6380       1755 Origin    BDL
8     SkyWest       6380       1755   Dest    CMH
9      United       7177       1900 Origin    EWR
10     United       7177       1900   Dest    BOI
eipi10
  • 91,525
  • 24
  • 209
  • 285
  • Slight improvement: `df %>% gather(Order_Points, Coords, Origin, Dest) %>% mutate(Order_Points = ifelse(Order_Points == "Origin", 1, 2)) %>% select(Carrier, Flight_Num, Coords, Sched_Depr, Order_Points) %>% arrange(desc(Sched_Depr))` – Samuel Oct 29 '17 at 02:28
0

You can also use base R:

 dat <- data.frame(Carrier, Flight_Num, Origin, Dest, Sched_Depr)
  df=reshape(dat,idvar = "Carrier",varying = list(3:4),direction = "long")
 `row.names<-`(df[order(df[,1]),],NULL)
       Carrier Flight_Num Sched_Depr time Origin
 1  ExpressJet       1223       1253    1    BOS
 2  ExpressJet       1223       1253    2    CWA
 3     JetBlue        334       1845    1    SFO
 4     JetBlue        334       1845    2    DSM
 5        Mesa       7124       1955    1    ORD
 6        Mesa       7124       1955    2    PIT
 7     SkyWest       6380       1755    1    BDL
 8     SkyWest       6380       1755    2    CMH
 9      United       7177       1900    1    EWR
 10     United       7177       1900    2    BOI

You can change the variable name of time to the one you have like in your example above

Onyambu
  • 67,392
  • 3
  • 24
  • 53