I have a data frame with 2 columns: person and points. In my actual dataset there are more than 1000 persons.

My goal: I need to find persons that have more than 126 points.

df1:

person      points
abc
abc        1
abc
abc        2
abc1    
abc1       1
abc1

I have used this code:

df1 <- read.csv("df1.csv")
  points_to_numeric <- as.numeric(df1$points)

  person_filtered <- df1 %>%
  group_by(person) %>%
  dplyr::filter(sum(points_to_numeric, na.rm = T)>126)%>%
  distinct(person) %>%
  pull()

person_filtered

When I run this code, I get 800 unique persons. But if I ask how many persons have fewer than 126 points, I also get 800 unique persons, so the filter does not seem to work.

San
  • This is not reproducible without data for us to use. Use `dput(head(df1))` to generate a small test dataset – emilliman5 Aug 21 '20 at 13:51
  • Your code doesn't work because `points_to_numeric` is not grouped: it lives outside the `df1` object. It should be `df1$points <- as.numeric(df1$points)` and then `...filter(sum(points...` – emilliman5 Aug 21 '20 at 13:55
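
To illustrate the diagnosis in the comment above, here is a minimal sketch on a made-up data frame (`toy` and its values are assumptions, not the asker's data). A vector created outside the data frame is not split by `group_by()`, so `sum()` returns the same grand total for every group and the filter keeps either all persons or none:

```r
library(dplyr)

# Hypothetical toy data; points arrive as character, as they might from a CSV
toy <- data.frame(
  person = c("abc", "abc", "abc1", "abc1"),
  points = c("100", "50", "1", NA),
  stringsAsFactors = FALSE
)

# External vector: not a column of toy, so group_by() never splits it
points_to_numeric <- as.numeric(toy$points)

wrong <- toy %>%
  group_by(person) %>%
  # sum() sees all four values in every group
  dplyr::filter(sum(points_to_numeric, na.rm = TRUE) > 126) %>%
  distinct(person) %>%
  pull(person)

# Column inside the data frame: sum() is evaluated once per person
right <- toy %>%
  mutate(points = as.numeric(points)) %>%
  group_by(person) %>%
  dplyr::filter(sum(points, na.rm = TRUE) > 126) %>%
  distinct(person) %>%
  pull(person)
```

In this sketch `wrong` contains both persons, because the grand total exceeds 126, while `right` contains only the person whose own total does.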

3 Answers

Tidyverse solution. Returns a vector with the persons with more than 126 points.

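library(tidyverse)

person_filtered <- df1 %>%
  # points may have been read in as character; coerce it so sum() works
  mutate(points = as.numeric(points)) %>%
  group_by(person) %>%
  # keep only the persons whose total exceeds 126
  dplyr::filter(sum(points, na.rm = TRUE) > 126) %>%
  distinct(person) %>%
  pull(person)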
John
  • I got this error: Error: Problem with `filter()` input `..1`. x invalid 'type' (character) of argument i Input `..1` is `sum(points, na.rm = T) > 126`. i The error occurred in group 1: person = " – San Aug 21 '20 at 12:17
  • Try `dplyr::filter`; there is also a `stats::filter` that can conflict with the dplyr function. I'll update the answer to reflect that. – John Aug 21 '20 at 12:23

Use of summarise is more idiomatic for this use case.

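library(tidyverse)

person_filtered <- df1 %>%
  group_by(person) %>%
  # coerce points in case it was read in as character, then total per person
  summarise(totalPoints = sum(as.numeric(points), na.rm = TRUE)) %>%
  # strictly more than 126, as the question asks
  dplyr::filter(totalPoints > 126)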
emilliman5
  • Unfortunately, I got an error again. Error: Problem with `summarise()` input `totalpoints`. x invalid 'type' (character) of argument i Input `totalpoints` is `sum(points, na.rm = T)`. i The error occurred in group 1: person = " – San Aug 21 '20 at 12:36
  • looks like you need to coerce `points` to numeric with `as.numeric(points)` – emilliman5 Aug 21 '20 at 12:37
  • I have done this already. When I run `class(df1$points)`, it reports that points is numeric. – San Aug 21 '20 at 12:44
  • Please update your question with a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – emilliman5 Aug 21 '20 at 12:48
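
As a hedged sketch of the check discussed in these comments: one way to see what type `points` has right after reading the file, and to coerce the column in place before summing. The inline CSV text is a made-up stand-in for `df1.csv`, and the `n/a` entry is an assumption chosen to reproduce a character column:

```r
# Made-up stand-in for df1.csv; the "n/a" entry forces points to be character
csv_text <- "person,points
abc,100
abc,n/a
abc,50
abc1,1
abc1,"

df1 <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class_before <- class(df1$points)

# Overwrite the column itself; non-numeric entries become NA (with a warning)
df1$points <- suppressWarnings(as.numeric(df1$points))
class_after <- class(df1$points)

# Total per person, ignoring NA, then keep the names with totals above 126
totals <- tapply(df1$points, df1$person, sum, na.rm = TRUE)
over_126 <- names(totals)[totals > 126]
```

The point of overwriting `df1$points`, rather than storing the result in a separate vector, is that any later grouped operation on `df1` then sees the numeric column.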

Maybe you can try the code below

subset(aggregate(. ~ person, df1, sum), points > 126)

or

subset(df1, ave(points, person, FUN = sum) > 126)
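
The two base R calls return different things, which a small sketch may make clearer. The `toy` data frame is hypothetical, `points` is assumed to be numeric already, and the `na.rm = TRUE` wrapper in the second call is an addition to cope with missing values:

```r
# Hypothetical toy data with numeric points and one missing value
toy <- data.frame(
  person = c("abc", "abc", "abc1", "abc1"),
  points = c(100, 50, 1, NA)
)

# One row per person with the summed points
# (the formula interface drops rows with NA by default)
totals <- subset(aggregate(points ~ person, toy, sum), points > 126)

# The original rows of every person whose total exceeds 126
rows <- subset(toy, ave(points, person,
                        FUN = function(x) sum(x, na.rm = TRUE)) > 126)
```

Here `totals` has a single row per qualifying person, while `rows` keeps all of that person's original records.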
ThomasIsCoding