0

I wish to use subqueries like structure in dplyr efficiently. For example we do subqueries like

select * from table where tbl_id in (select tbl_id from table1 where name='a');

Here, I assume we are not saving any results from inner query. I want to use similar structure in dplyr chains.

I have tried to use the result of one dplyr chain in another by putting it in brackets but it doesn't work that way. I already know that we can save it as temporary df and use it but I don't want to save it.

Below are the two table data/dataframes :

# Libraries

library(sqldf)
#> Loading required package: gsubfn
#> Loading required package: proto
#> Loading required package: RSQLite
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union


# Q 16 Write a query in SQL to find the name of those 
# movies where one or more actors acted in two or more movies

movie <- read.csv("q14_movie.csv")
movie_cast <- read.csv("q16_movie_cast.csv")

print(movie)
#>    mov_id                 mov_title mov_year mov_time  mov_lang
#> 1     901                   Vertigo     1958      128   English
#> 2     902             The Innocents     1961      100   English
#> 3     903        Lawrence of Arabia     1962      216   English
#> 4     904           The Deer Hunter     1978      183   English
#> 5     905                   Amadeus     1984      160   English
#> 6     906              Blade Runner     1982      117   English
#> 7     907            Eyes Wide Shut     1999      159   English
#> 8     908        The Usual Suspects     1995      106   English
#> 9     909                 Chinatown     1974      130   English
#> 10    910             Boogie Nights     1997      155   English
#> 11    911                Annie Hall     1977       93   English
#> 12    912         Princess Mononoke     1997      134  Japanese
#> 13    913  The Shawshank Redemption     1994      142   English
#> 14    914           American Beauty     1999      122   English
#> 15    915                   Titanic     1997      194   English
#> 16    916         Good Will Hunting     1997      126   English
#> 17    917               Deliverance     1972      109   English
#> 18    918             Trainspotting     1996       94   English
#> 19    919              The Prestige     2006      130   English
#> 20    920              Donnie Darko     2001      113   English
#> 21    921       Slumdog Millionaire     2008      120   English
#> 22    922                    Aliens     1986      137   English
#> 23    923            Beyond the Sea     2004      118   English
#> 24    924                    Avatar     2009      162   English
#> 25    926             Seven Samurai     1954      207  Japanese
#> 26    927             Spirited Away     2001      125  Japanese
#> 27    928        Back to the Future     1985      116   English
#> 28    925                Braveheart     1995      178   English
#>      mov_dt_rel mov_rel_country
#> 1    1958-08-24              UK
#> 2    1962-02-19              SW
#> 3    1962-12-11              UK
#> 4    1979-03-08              UK
#> 5    1985-01-07              UK
#> 6    1982-09-09              UK
#> 7                            UK
#> 8    1995-08-25              UK
#> 9    1974-08-09              UK
#> 10   1998-02-16              UK
#> 11   1977-04-20             USA
#> 12   2001-10-19              UK
#> 13   1995-02-17              UK
#> 14                           UK
#> 15   1998-01-23              UK
#> 16   1998-06-03              UK
#> 17   1982-10-05              UK
#> 18   1996-02-23              UK
#> 19   2006-11-10              UK
#> 20                           UK
#> 21   2009-01-09              UK
#> 22   1986-08-29              UK
#> 23   2004-11-26              UK
#> 24   2009-12-17              UK
#> 25   1954-04-26              JP
#> 26   2003-09-12              UK
#> 27   1985-12-04              UK
#> 28   1995-09-08              UK

print(movie_cast)
#>    act_id mov_id                   role
#> 1     101    901  John Scottie Ferguson
#> 2     102    902           Miss Giddens
#> 3     103    903          T.E. Lawrence
#> 4     104    904                Michael
#> 5     105    905        Antonio Salieri
#> 6     106    906           Rick Deckard
#> 7     107    907          Alice Harford
#> 8     108    908                McManus
#> 9     110    910            Eddie Adams
#> 10    111    911            Alvy Singer
#> 11    112    912                    San
#> 12    113    913          Andy Dufresne
#> 13    114    914         Lester Burnham
#> 14    115    915    Rose DeWitt Bukater
#> 15    116    916           Sean Maguire
#> 16    117    917                     Ed
#> 17    118    918                 Renton
#> 18    120    920        Elizabeth Darko
#> 19    121    921            Older Jamal
#> 20    122    922                 Ripley
#> 21    114    923            Bobby Darin
#> 22    109    909            J.J. Gittes
#> 23    119    919          Alfred Borden



sqldf('select * from movie m join movie_cast mc on 
m.mov_id=mc.mov_id where mc.act_id in  
      (select act_id from movie_cast group by act_id having count(mov_id)>1)')
#>   mov_id        mov_title mov_year mov_time mov_lang   mov_dt_rel
#> 1    914  American Beauty     1999      122  English             
#> 2    923   Beyond the Sea     2004      118  English   2004-11-26
#>   mov_rel_country act_id mov_id            role
#> 1              UK    114    914  Lester Burnham
#> 2              UK    114    923     Bobby Darin



tmp <- movie_cast %>% 
        group_by(act_id) %>% 
          summarise(num_movies=n_distinct(mov_id)) %>% 
                                filter(num_movies>1) 

inner_join(movie,movie_cast,by='mov_id') %>% filter(act_id %in% tmp$act_id)
#>   mov_id        mov_title mov_year mov_time mov_lang   mov_dt_rel
#> 1    914  American Beauty     1999      122  English             
#> 2    923   Beyond the Sea     2004      118  English   2004-11-26
#>   mov_rel_country act_id            role
#> 1              UK    114  Lester Burnham
#> 2              UK    114     Bobby Darin


inner_join(movie,movie_cast,by='mov_id') %>% 
  filter(act_id %in% (movie_cast %>% 
group_by(act_id) %>% 
summarise(num_movies=n_distinct(mov_id)) %>% 
filter(num_movies>1) %>% 
select(act_id)))
#> [1] mov_id          mov_title       mov_year        mov_time       
#> [5] mov_lang        mov_dt_rel      mov_rel_country act_id         
#> [9] role           
#> <0 rows> (or 0-length row.names) 

I wish to get same results without saving as tmp as explained in code !

Thanks

Deepak
  • 55
  • 6
  • 3
    can you make a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? – AidanGawronski Apr 06 '19 at 17:40
  • Hi Aidan, Thanks for your reply. Please follow below github link for R file and data : example_dplyr.R , q14_movie.csv, q16_movie_cast.csv [link] (https://github.com/iamdeepaka/SQL) – Deepak Apr 06 '19 at 18:30
  • 2
    @Deepak: you will receive more help if you make the problem reproducible within the question itself. Not to mention the question won't be useful for others if you delete all the files later. You can use the [`reprex`](https://reprex.tidyverse.org/articles/articles/magic-reprex.html) and [`datapasta`](https://cran.r-project.org/web/packages/datapasta/vignettes/how-to-datapasta.html) packages – Tung Apr 06 '19 at 18:53
  • Hi @Tung thanks for your reply. I have used reprex to generate reproducible results and pasted here. Hope it helps people understand my issue and replicate it. I am still keeping github link in case somebody wants to quickly run and make changes in program. – Deepak Apr 06 '19 at 19:20
  • @Deepak: that's better but you need to update `read_csv` to read directly from GitHub URLs. Even better if you post small samples of `movie_cast` & `movie` data frame only – Tung Apr 06 '19 at 19:59

0 Answers0