3

One includes the TMC # for all the roadways of interest. The second includes travel times occurring on every single roadway in a particular state. I want to use the first table to filter so that only the records corresponding to those roadways of interest are remaining.

df

  id     link       tmc
1  1 23402444 122P06466
2  2 23402487 122P06476
3  3 23402488 122N06476
4  4 23402493 122N06477
5  5 23402555 122P06454
6  6 23402557 122N06453

df2

  id       tmc   epoch  tt
1  1 108N04625 1182014 163
2  2 108N04625 1182014 103
3  3 108N04625 1182014  73
4  4 108N04625 1172014 254
5  5 108N04625 1172014 224

I was trying to use a filter

Data2Filter<-(Data2, TMC==Data2$TMC)

but I was getting either object not found for everything or that my dimensions are mismatched (there are about 8000 records in Data1 and 14000000 in Data 2 because there can be multiple travel times (TT) on a TMC but I'm only interested in those that occurred on TMCs corresponding to the list in Data1). I'm very familiar with MatLab but unfortunately at the moment I only have R Studio available and know nothing about this software. Also these table are loaded in as cvs files if that makes any difference.

Paulo E. Cardoso
  • 5,778
  • 32
  • 42
mgc77
  • 113
  • 1
  • 9

1 Answers1

3
df2$tmc[1] <- df$tmc[1]
df2

  id       tmc   epoch  tt
1  1 122P06466 1182014 163
2  2 108N04625 1182014 103
3  3 108N04625 1182014  73
4  4 108N04625 1172014 254
5  5 108N04625 1172014 224

Many many options

subset(df2, tmc %in% df$tmc)

df2[df2$tmc %in% df$tmc, ]

library(dplyr)
fi <- filter(df2, tmc %in% df$tmc)

fi
  id       tmc   epoch  tt
1  1 122P06466 1182014 163

superfast with large datasets

library(data.table)
dt <- data.table(df)
dt2 <- data.table(df2)
subset(dt2, tmc %in% dt$tmc)

This may be a useful topic on subset performance

Community
  • 1
  • 1
Paulo E. Cardoso
  • 5,778
  • 32
  • 42