2

I have the following sample dataset (below and/or as CSVs here: http://goo.gl/wK57T) which I want to transform as follows. For each person in a household I want to create two new variables OrigTAZ and DestTAZ. It should take the value in TripendTAZ and put that in DestTAZ. For OrigTAZ it should put value of TripendTAZ from the previous row. For the first trip of every person in a household (Tripid = 1) the OrigTAZ = hometaz. For each person in a household, from the second trip OrigTAZ = TripendTAZ_(n-1) and DestTAZ = TripEndTAZ. The sample input and output data are shown below. I tried the suggestions shown here: Basic lag in R vector/dataframe but have not had luck. I am used to doing something like this in SAS.

Any help is appreciated.

TIA, Krishnan

SAS Code Sample
     if Houseid = lag(Houseid) then do;
       if Personid = lag(Personid) then do;
          DestTAZ = TripendTAZ;
       if Tripid = 1 then OrigTAZ = hometaz 
       else
         OrigTAZ = lag(TripendTAZ);
       end;
     end;

INPUT DATA

Houseid,Personid,Tripid,hometaz,TripendTAZ
1,1,1,45,4
1,1,2,45,7
1,1,3,45,87
1,1,4,45,34
1,1,5,45,45
2,1,1,8,96
2,1,2,8,4
2,1,3,8,2
2,1,4,8,1
2,1,5,8,8
2,2,1,8,58
2,2,2,8,67
2,2,3,8,9
2,2,4,8,10
2,2,5,8,8
3,1,1,7,89
3,1,2,7,35
3,1,3,7,32
3,1,4,7,56
3,1,5,7,7

OUTPUT DATA

Houseid,Personid,Tripid,hometaz,TripendTAZ,OrigTAZ,DestTAZ
1,1,1,45,4,45,4
1,1,2,45,7,4,7
1,1,3,45,87,7,87
1,1,4,45,34,87,34
1,1,5,45,45,34,45
2,1,1,8,96,8,96
2,1,2,8,4,96,4
2,1,3,8,2,4,2
2,1,4,8,1,2,1
2,1,5,8,8,1,8
2,2,1,8,58,8,58
2,2,2,8,67,58,67
2,2,3,8,9,67,9
2,2,4,8,10,9,10
2,2,5,8,8,10,8
3,1,1,7,89,7,89
3,1,2,7,35,89,35
3,1,3,7,32,35,32
3,1,4,7,56,32,56
3,1,5,7,7,56,7
Community
  • 1
  • 1
Krishnan
  • 1,265
  • 2
  • 13
  • 24

1 Answers1

3

Just proceed through the steps you outlined step-by-step and it isn't so bad.

First I'll read in your data by copying it:

df <- read.csv(file('clipboard'))

Then I'll sort to make sure the data frame is ordered by houseid, then personid, then tripid:

# first sort so that it's ordered by Houseid, then Personid, then Tripid:
df <- with(df, df[order(Houseid,Personid,Tripid),])

Then follow the steps you specified:

# take value in TripendTAZ and put it in DestTAZ
df$DestTAZ <- df$TripendTAZ

# Set OrigTAZ = value from previous row
df$OrigTAZ <- c(NA,df$TripendTAZ[-nrow(df)])

# For the first trip of every person in a household (Tripid = 1),
#  OrigTAZ = hometaz. 
df$OrigTAZ[ df$Tripid==1 ] <- df$hometaz[ df$Tripid==1 ]

You'll notice that df is then what you're after.

mathematical.coffee
  • 55,977
  • 11
  • 154
  • 194