My real data is large, so I am using the mtcars dataset in R as an example. I have created a data frame, df, that holds some values taken from the "mpg" column. I want to extract the values of the "cyl" column from the rows of mtcars whose "mpg" matches one of the values stored in df.

> dput(mtcars)
structure(list(mpg = c(21, 21, 22.8, 21.4, 18.7, 18.1, 14.3, 
24.4, 22.8, 19.2, 17.8, 16.4, 17.3, 15.2, 10.4, 10.4, 14.7, 32.4, 
30.4, 33.9, 21.5, 15.5, 15.2, 13.3, 19.2, 27.3, 26, 30.4, 15.8, 
19.7, 15, 21.4), cyl = c(6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 
8, 8, 8, 8, 8, 4, 4, 4, 4, 8, 8, 8, 8, 4, 4, 4, 8, 6, 8, 4), 
    disp = c(160, 160, 108, 258, 360, 225, 360, 146.7, 140.8, 
    167.6, 167.6, 275.8, 275.8, 275.8, 472, 460, 440, 78.7, 75.7, 
    71.1, 120.1, 318, 304, 350, 400, 79, 120.3, 95.1, 351, 145, 
    301, 121), hp = c(110, 110, 93, 110, 175, 105, 245, 62, 95, 
    123, 123, 180, 180, 180, 205, 215, 230, 66, 52, 65, 97, 150, 
    150, 245, 175, 66, 91, 113, 264, 175, 335, 109), drat = c(3.9, 
    3.9, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92, 
    3.07, 3.07, 3.07, 2.93, 3, 3.23, 4.08, 4.93, 4.22, 3.7, 2.76, 
    3.15, 3.73, 3.08, 4.08, 4.43, 3.77, 4.22, 3.62, 3.54, 4.11
    ), wt = c(2.62, 2.875, 2.32, 3.215, 3.44, 3.46, 3.57, 3.19, 
    3.15, 3.44, 3.44, 4.07, 3.73, 3.78, 5.25, 5.424, 5.345, 2.2, 
    1.615, 1.835, 2.465, 3.52, 3.435, 3.84, 3.845, 1.935, 2.14, 
    1.513, 3.17, 2.77, 3.57, 2.78), qsec = c(16.46, 17.02, 18.61, 
    19.44, 17.02, 20.22, 15.84, 20, 22.9, 18.3, 18.9, 17.4, 17.6, 
    18, 17.98, 17.82, 17.42, 19.47, 18.52, 19.9, 20.01, 16.87, 
    17.3, 15.41, 17.05, 18.9, 16.7, 16.9, 14.5, 15.5, 14.6, 18.6
    ), vs = c(0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 
    0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1), am = c(1, 
    1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 
    0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1), gear = c(4, 4, 4, 3, 
    3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3, 3, 
    3, 3, 4, 5, 5, 5, 5, 5, 4), carb = c(4, 4, 1, 1, 2, 1, 4, 
    2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2, 2, 4, 2, 1, 
    2, 2, 4, 6, 8, 2)), row.names = c("Mazda RX4", "Mazda RX4 Wag", 
"Datsun 710", "Hornet 4 Drive", "Hornet Sportabout", "Valiant", 
"Duster 360", "Merc 240D", "Merc 230", "Merc 280", "Merc 280C", 
"Merc 450SE", "Merc 450SL", "Merc 450SLC", "Cadillac Fleetwood", 
"Lincoln Continental", "Chrysler Imperial", "Fiat 128", "Honda Civic", 
"Toyota Corolla", "Toyota Corona", "Dodge Challenger", "AMC Javelin", 
"Camaro Z28", "Pontiac Firebird", "Fiat X1-9", "Porsche 914-2", 
"Lotus Europa", "Ford Pantera L", "Ferrari Dino", "Maserati Bora", 
"Volvo 142E"), class = "data.frame")
> dput(df)
structure(list(vals = c(21, 22.8, 15.2, 19.2, 17.8, 13.3, 15.5, 
30.4, 10.4)), class = "data.frame", row.names = c(NA, -9L))
# I tried this:
mtcars22 %>% filter(cyl,mpg==df)
DD11
  • 2
    Equality of numeric values is going to cause a problem at some point, you need to think about tolerance of value differences. That is, when floating point is concerned, `==` might work a good amount of time but will fail without telling you. – r2evans Jul 06 '20 at 04:26
  • You could use `mtcars %>% semi_join(df,by=c("mpg"="vals"))` – MrFlick Jul 06 '20 at 04:27
  • It's precisely that operation that will work some or even most of the time, but when it fails it will be completely silent unless you audit every row of your output for expected values. @MrFlick. – r2evans Jul 06 '20 at 04:29
  • 1
    A very good point @r2evans. Matching on decimals values is very dangerous indeed. – MrFlick Jul 06 '20 at 04:30
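
The tolerance concern raised in these comments can be illustrated with a minimal base R sketch. The helper `near_any()` and its default tolerance of `1e-8` are assumptions made for illustration; they do not come from the thread. The sketch uses R's built-in `mtcars` together with the `df` values from the question.

```r
# Hypothetical helper (not from the thread): TRUE where an element of x lies
# within `tol` of at least one value in `targets`.
near_any <- function(x, targets, tol = 1e-8) {
  vapply(x, function(xi) any(abs(xi - targets) < tol), logical(1))
}

# The lookup values from the question
df <- data.frame(vals = c(21, 22.8, 15.2, 19.2, 17.8, 13.3, 15.5, 30.4, 10.4))

# cyl values for rows whose mpg is within tolerance of a lookup value
cyl_matched <- mtcars$cyl[near_any(mtcars$mpg, df$vals)]

# A computed value such as 0.1 + 0.2 is not stored exactly as 0.3,
# but it is still within tolerance of it.
close_enough <- near_any(0.1 + 0.2, 0.3)
```

The tolerance has to be chosen to suit the scale of the data; `1e-8` is only a placeholder here.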

1 Answer

You can use:

mtcars22$cyl[mtcars22$mpg %in% df$vals]
#[1] 6 6 4 4 6 6 8 8 8 4 8 8 8 8 4

Or

subset(mtcars22, mpg %in% df$vals, select = cyl)
Ronak Shah
  • 1
    Equality of floating point is the wrong approach, it should not be a recommendation. This includes the `%in%` operator. – r2evans Jul 06 '20 at 04:31
  • my df contains date actually that shows me output "data frame with 0 columns and 3574 rows" – DD11 Jul 06 '20 at 04:33
  • @DD11 Which code are you using? Did you try `mtcars22$cyl[mtcars22$mpg %in% df$vals]` ? It should return a vector and not a dataframe. – Ronak Shah Jul 06 '20 at 05:20
  • @r2evans True. OP seem to have dates in their actual data. – Ronak Shah Jul 06 '20 at 05:26
  • 1
    @DD11 Were you able to figure this out? Did you get the output that you were looking for? – Ronak Shah Jul 07 '20 at 01:53
  • @RonakShah Yes, my two dates column had class other than date so I changed them and it worked thanks. – DD11 Jul 07 '20 at 01:58
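
The fix described in the last comment, giving both date columns the same class before matching, might look like the sketch below. The data frames `records` and `lookup`, their column names, and the assumption that the mismatched column was stored as character are all hypothetical; the thread does not say what class the original columns had.

```r
# Hypothetical data: dates stored as character in one data frame ...
records <- data.frame(
  day = c("2020-07-01", "2020-07-02", "2020-07-03"),
  cyl = c(4, 6, 8),
  stringsAsFactors = FALSE
)
# ... and as Date in the lookup data frame
lookup <- data.frame(vals = as.Date(c("2020-07-01", "2020-07-03")))

# Check the classes first; a mismatch here is the thing to fix
class(records$day)
class(lookup$vals)

# Convert the character column so both sides are Date, then match
records$day <- as.Date(records$day, format = "%Y-%m-%d")
matched <- records$cyl[records$day %in% lookup$vals]
```

Checking `class()` on both columns before matching is a quick way to catch this kind of mismatch.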