
I'm working on an unbalanced panel dataset. The data come from a game: for every user (user_id) in the record I have one row per level (level) of the game. Because recording started some time after the game was introduced, some users have no data for the first levels, so as a first step I want to drop those users entirely.

I've tried the complete.cases function, but it only removes the individual rows that contain missing values (NAs). It does not remove all of the data for a user whose level 1 values are missing.

panel <- panel[complete.cases(panel), ] 

So I need code that excludes every user who has no record in level 1 (which in my dataset means the user has an "NA" in one of the dependent variables, e.g. number of activities).
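For illustration, here is a minimal base R sketch of the user-level filter described above. The toy data and the column name `activities` are assumptions made for this example (the real panel is not shown); `user_id` and `level` are the names mentioned in the question. The idea is to first collect the users who have a usable level 1 row and then keep all rows belonging to those users.

```r
# Toy panel; the values and the column name `activities` are assumed for illustration
panel <- data.frame(
  user_id    = c(1, 1, 1, 2, 2, 2, 3, 3),
  level      = c(1, 2, 3, 1, 2, 3, 2, 3),
  activities = c(NA, 12, 9, 5, 7, 4, 6, 8)
)

# Users that have a level 1 row with a non-missing dependent variable
has_level1 <- unique(panel$user_id[panel$level == 1 & !is.na(panel$activities)])

# Keep every row of those users and drop all rows of everyone else
panel_clean <- panel[panel$user_id %in% has_level1, ]
```

In this toy data, user 1 is dropped because the level 1 value is NA, and user 3 is dropped because there is no level 1 row at all. Unlike `complete.cases`, the decision is made per user rather than per row.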

Update #1: Data looks like this (thanks to thc):

> game_data <- data.frame(player = c(1,1,1,2,2,2,3,3,3), level = c(1,2,3,1,2,3,1,2,3), score=c(0,150,170,80,100,110,75,100,0))
> game_data
  player level score
1      1     1     0
2      1     2   150
3      1     3   170
4      2     1    80
5      2     2   100
6      2     3   110
7      3     1    75
8      3     2   100
9      3     3     0

I now want to exclude data from player 1, because he has a score of 0 in level 1.

Jensen
  • When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. This will make it easier to help you. – MrFlick Oct 17 '18 at 15:18
  • Thank you, I will of course do that in the future. – Jensen Oct 18 '18 at 08:03

2 Answers


Here is one approach: group the rows by player and keep only the players that have at least one row for level 1.

Example data:

game_data <- data.frame(player = c(1,1,2,2,2,3,3,3), level = c(2,3,1,2,3,1,2,3), score=sample(100, 8))
> game_data
  player level score
1      1     2    19
2      1     3    13
3      2     1    65
4      2     2    32
5      2     3    22
6      3     1    98
7      3     2    58
8      3     3    84


library(dplyr)
# Keep only players that have at least one row with level == 1
game_data %>% group_by(player) %>% filter(any(level == 1)) %>% as.data.frame
  player level score
1      2     1    65
2      2     2    32
3      2     3    22
4      3     1    98
5      3     2    58
6      3     3    84
thc
  • Hi thc, thanks for your help! My data looks more like this: game_data <- data.frame(player = c(1,1,1,2,2,2,3,3,3), level = c(1,2,3,1,2,3,1,2,3), score=c(0,150,170,80,100,110,75,100,0)). Every player has a row for every level, and a score of 0 means nothing was recorded. I want to throw out player 1, because his score of 0 in level 1 means he has no real record there, but not player 3, who only stopped using the game after level 2. – Jensen Oct 18 '18 at 07:57

I think I have now found a solution with your help:

game_data %>% group_by(player) %>% filter(any(level == 1 & score > 0)) %>% as.data.frame

This seems to work. Your code only needed a small adjustment: the condition now also requires a score above 0 in level 1. Thank you very much for your help, thc!
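As a cross-check, the same rule can be sketched in base R without dplyr. This is only an illustrative variant, not a replacement for the pipeline above. It uses the example data from the question and spells out the NA check explicitly, on the assumption that a missing level 1 score should also disqualify a player.

```r
# Example data from the question's update
game_data <- data.frame(
  player = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  level  = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  score  = c(0, 150, 170, 80, 100, 110, 75, 100, 0)
)

# TRUE only for rows that are a valid level 1 record (score present and above 0)
valid_start <- game_data$level == 1 &
  !is.na(game_data$score) &
  game_data$score > 0

# Spread the flag to every row of the same player: TRUE if any of the player's rows is valid
keep <- ave(valid_start, game_data$player, FUN = any)

# Players 2 and 3 remain; player 3 keeps the level 3 row with score 0
result <- game_data[keep, ]
```

Player 1 is removed because the level 1 score is 0, while player 3 is kept even though the level 3 score is 0, which matches the behaviour asked for in the question.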

Jensen