0

I searched many posts like this Subsetting a dataframe in R by multiple conditions about selecting the "which" function to select values ​​dataframe dataframe and can not find another solution . The problem is as follows:

I have the following data set with thousands of cases:

> head(Datos)
   tipo         estacion     hora  usos
 1 hábil         A.SANIN      X4    11
 2 hábil        ALAMOS        X4     4
 3 hábil        AMANECER      X4    45
 4 hábil       AMERICAS       X4     2
 5 hábil       ATANASIO       X4    10
 6 hábil       BELALCAZAR     X4     5
 .    .                       .     .
 .    .                       .     .
 .    .                       .     .

The variable to subset of dataframe above is "usos" The variable "tipo" takes the values ​​: "hábil", "Sábado" and "Festivo". The variable "estacion" has 60 levels and the variable "hora" has 22 values: x4, x5, x6, ... x23 . As I need to calculate the quartiles according to all combinations of "tipo" , "estacion" and "hora" i use the "aggregate" function and calculate the critical values ​​so I get this:

  > head(todo)
    Group.1  Group.2      Group.3      y1    y2
 1   hábil      X4         A.SANIN      1.5   21.5
 2   Sábado     X4         A.SANIN      4.0   12.0
 3   Festivo    X4         A.SANIN      0.0   0.0
 4   hábil      X5         A.SANIN      66.0  130.0
 5   Sábado     X5         A.SANIN      40.0  96.0
 6   Festivo    X5         A.SANIN      7.5   43.5
 .                      .
 .                      .
 .                      .

Each row is a different case and the values ​​y1 and y2 are my critical values​​. Need, according to the values ​​y1 and y2 of the dataframe "todo" that I choose to values ​​less than y1 or greater y2 of the variable "usos" from dataframe"Datos". But in a cycle, there are 3480 combinations on dataframe "todo", this is, 3480 rows. And store it in another Matrix.

For example, for the first case is as follows:

print(which(subset(Datos$usos,Datos$tipo=="hábil"&Datos$hora=="X4"&Datos$estacion=="A.SANIN")<todo$y1[1] | subset(Datos$usos,Datos$tipo=="hábil"&Datos$hora=="X4"&Datos$estacion=="A.SANIN")>todo$y2[1]))

I need to do that for all rows of the dataframe "all" and apply it to "use" the dataframe "Data".

THANK YOU!

Community
  • 1
  • 1
JULIAN
  • 47
  • 1
  • 2
  • 7

1 Answers1

0

I had a little trouble understanding what you are saying, but I think this is what you want. First, merge todo with Datos:

# Rename the columns of todo to match Datos
names(todo)<-c('tipo','hora','estacion','y1','y2')
# Merge the two.
Datos.y.todo<-merge(Datos,todo)

Now, you can easily subset based on your criteria:

with(Datos.y.todo, Datos.y.todo[usos<y1 | usos>y2, ])

I believe the above answers your question. Let me illustrate with your data.

# Load in your data.
Datos<-read.table(textConnection('tipo         estacion     hora  usos
1 hábil         A.SANIN      X4    11
2 hábil        ALAMOS        X4     4
3 hábil        AMANECER      X4    45
4 hábil       AMERICAS       X4     2
5 hábil       ATANASIO       X4    10
6 hábil       BELALCAZAR     X4     5'),header=TRUE)

todo<-read.table(textConnection('Group.1  Group.2      Group.3      y1    y2
1   hábil      X4         A.SANIN      1.5   21.5
2   Sábado     X4         A.SANIN      4.0   12.0
3   Festivo    X4         A.SANIN      0.0   0.0
4   hábil      X5         A.SANIN      66.0  130.0
5   Sábado     X5         A.SANIN      40.0  96.0
6   Festivo    X5         A.SANIN      7.5   43.5'),header=TRUE)

As you mentioned, Datos contains many of the same "cases" repeatedly, so let's add two rows to Datos to make the example clearer:

Datos<-rbind(Datos,data.frame(tipo=c('hábil','Sábado'),estacion='A.SANIN',hora='X4',usos=c('23','3')))
#     tipo   estacion hora usos
# 1  hábil    A.SANIN   X4   11 # This one neither below y1 or above y2
# 2  hábil     ALAMOS   X4    4
# 3  hábil   AMANECER   X4   45
# 4  hábil   AMERICAS   X4    2
# 5  hábil   ATANASIO   X4   10
# 6  hábil BELALCAZAR   X4    5
# 7  hábil    A.SANIN   X4   23 # This one is above y2 (21.5), so we want to find it.
# 8 Sábado    A.SANIN   X4    3 # This one is below y1 (4.0), so we want to find it. 

Now run the code I previously gave you:

# Rename the columns of todo to match Datos
names(todo)<-c('tipo','hora','estacion','y1','y2')
# Merge the two.
Datos.y.todo<-merge(Datos,todo)
# Notice how the y1 and y2 values are now repeated for easy comparison.
#     tipo estacion hora usos  y1   y2
# 1  hábil  A.SANIN   X4   11 1.5 21.5 # We don't want this row.
# 2  hábil  A.SANIN   X4   23 1.5 21.5 # We want this row.
# 3 Sábado  A.SANIN   X4    3 4.0 12.0 # We want this row.

And finally, you filter the rows you want:

with(Datos.y.todo, Datos.y.todo[usos<y1 | usos>y2, ])

#     tipo estacion hora usos  y1   y2
# 2  hábil  A.SANIN   X4   23 1.5 21.5
# 3 Sábado  A.SANIN   X4    3 4.0 12.0
nograpes
  • 18,623
  • 1
  • 44
  • 67
  • Not really my friend. I think i need to explain you more. The think is, the dataframe "todo" has 3480 rows. Each row is a case, for example the first case is (hábil,X4,A.SANIN). But this cases, are in the dataframe "Datos", many times, for example for the first y1 and y2 of "todo" is 1.5 and 21.5. So i search the case (hábil,X4,A.SANIN) in "Datos" and there are many rows. see? so i look for each row of "Datos" how is "usos", and if it is less than y1 or greater than y2, report the row, with the function which. – JULIAN Feb 21 '14 at 15:15
  • Have you tried the code on your data? I have updated my answer to make it illustrate how each step works. Can you explain a specific case in which my code does not work on your data? – nograpes Feb 22 '14 at 20:10
  • By the way, is this transit data for Cali, Colombia? Is it publicly available somewhere? – nograpes Feb 22 '14 at 20:27