0

I want to create a subset of my data conditioning on two columns at the same time.

Similar to here: subsetting data using multiple variables in R

For example:

Say I have this data set called Gamedat:

        Games    People Hoursplayed
    goldeneye   Michael           5
    goldeneye  Thatcher           8
    goldeneye    Dexter          12
    goldeneye    Dexter          15
       pacman    Dexter           2
       tetris     Clint           5
       tetris    Dexter           8
    goldeneye  Thatcher          12
       pacman  Thatcher          15
    goldeneye     Clint           2
       pacman   Michael           5
       pacman   Michael           8
       pacman     Clint          12
       tetris      John          15
       tetris     Clint           2
 ageofempires     Clint           5
       pacman    Dexter           8
 ageofempires  Thatcher          12
 ageofempires      John          15
    goldeneye    Dexter           2

Say I want to look at a game like goldeneye. And I want to look at how often any player has played other games for the same amount of hours as they've played goldeneye (this is a lot more useful in my real data set).

So I do this:

 Gameofinterest <- Gamedat[ grep("goldeneye", Gamedat[ ,1]), ]`

Then I do this:

  subset(Gamedat, Gamedat[ ,2] %in% Gameofinterest[ ,2] & 
  Gamedat[ ,3] %in% Gameofinterest[ ,3])

But this gives me:

       Games   People Hoursplayed
   goldeneye  Michael           5
   goldeneye Thatcher           8
   goldeneye   Dexter          12
   goldeneye   Dexter          15
      pacman   Dexter           2
      tetris    Clint           5
      tetris   Dexter           8
   goldeneye Thatcher          12
      pacman Thatcher          15
   goldeneye    Clint           2
      pacman  Michael           5
      pacman  Michael           8
      pacman    Clint          12
      tetris    Clint           2
ageofempires    Clint           5
      pacman   Dexter           8
ageofempires Thatcher          12
   goldeneye   Dexter           2

When what I really want is this:

         Games   People Hoursplayed
     goldeneye  Michael           5
     goldeneye Thatcher           8
     goldeneye   Dexter          12
     goldeneye   Dexter          15
        pacman   Dexter           2
     goldeneye Thatcher          12
     goldeneye    Clint           2
        pacman  Michael           5
        tetris    Clint           2
  ageofempires Thatcher          12
     goldeneye   Dexter           2

In short, I want to find examples that match "People & Hoursplayed",

instead of "People" & "Hoursplayed"... make sense?

I know I can do this:

 Gamedat$PHpaste <- paste(Gamedat$People, Gamedat$Hoursplayed, sep="")

 Gamedat[Gamedat[ ,4] %in% Gameofinterest[ ,4], ]

and get:

        Games   People Hoursplayed    PHpaste
    goldeneye  Michael           5   Michael5
    goldeneye Thatcher           8  Thatcher8
    goldeneye   Dexter          12   Dexter12
    goldeneye   Dexter          15   Dexter15
       pacman   Dexter           2    Dexter2
    goldeneye Thatcher          12 Thatcher12
    goldeneye    Clint           2     Clint2
       pacman  Michael           5   Michael5
       tetris    Clint           2     Clint2
 ageofempires Thatcher          12 Thatcher12
    goldeneye   Dexter           2    Dexter2

Was hoping for something more elegant?

Community
  • 1
  • 1
DryLabRebel
  • 8,923
  • 3
  • 18
  • 24
  • Is your desired result correct? Dexter has played pacman for 2 hours, but played goldeneye for 29 hours... Is it because 2 of those 29 hours are part of a unique record? – shayaa Feb 28 '17 at 00:26
  • The last row shows Dexter has played goldeneye for 2 hours, so it is a correct match. – DryLabRebel Feb 28 '17 at 01:12

1 Answers1

0

I think this can be achieved using dplyr. First, use filter to retrieve rows where the game is goldeneye. Then use inner_join to join with the original data using People and HoursPlayed. Optionally: select the desired columns and arrange by People.

library(dplyr)
Gamedat %>% 
  filter(Games == "goldeneye") %>% 
  inner_join(Gamedat, by = c("People", "Hoursplayed")) %>% 
  select(Games = Games.y, People, Hoursplayed) %>% 
  arrange(People)

Result:

          Games   People Hoursplayed
1     goldeneye    Clint           2
2        tetris    Clint           2
3     goldeneye   Dexter          12
4     goldeneye   Dexter          15
5        pacman   Dexter           2
6     goldeneye   Dexter           2
7     goldeneye  Michael           5
8        pacman  Michael           5
9     goldeneye Thatcher           8
10    goldeneye Thatcher          12
11 ageofempires Thatcher          12
neilfws
  • 32,751
  • 5
  • 50
  • 63