data frame columns: how can I use loops in this case?

Question

I have a data frame with 2 columns: person and points. In my actual dataset there are more than 1000 persons.

My goal: I need to find persons that have more than 126 points.

df1:

person      points
abc
abc        1
abc
abc        2
abc1    
abc1       1
abc1

I have used this code:

df1 <- read.csv("df1.csv")
points_to_numeric <- as.numeric(df1$points)

person_filtered <- df1 %>%
  group_by(person) %>%
  dplyr::filter(sum(points_to_numeric, na.rm = T) > 126) %>%
  distinct(person) %>%
  pull()

person_filtered

When I run this code, I get 800 unique persons. But if I change the condition to find persons with fewer than 126 points, I also get 800 unique persons. So the filter does not seem to work.

This is not reproducible without data for us to use. Use `dput(head(df1))` to generate a small test dataset. – emilliman5, Aug 21 '20 at 13:51
Your code doesn't work because `points_to_numeric` is a separate vector outside `df1`, so `group_by()` never splits it and `sum()` sees every row's points in each group. Convert the column in place with `df1$points <- as.numeric(df1$points)` and then use `...filter(sum(points...`. – emilliman5, Aug 21 '20 at 13:55
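
To illustrate the comment above, here is a minimal sketch of converting the column inside the data frame before grouping. The toy values are hypothetical (the asker's file was never posted), and the sketch assumes `points` arrives as text with blank entries.

```r
library(dplyr)

# Hypothetical toy data standing in for df1.csv; points read as text
df1 <- data.frame(
  person = c("abc", "abc", "abc", "abc1", "abc1"),
  points = c("100", "", "50", "20", ""),
  stringsAsFactors = FALSE
)

# Convert the column in place so group_by() splits it along with the rows;
# blank strings become NA
df1$points <- as.numeric(df1$points)

person_filtered <- df1 %>%
  group_by(person) %>%
  filter(sum(points, na.rm = TRUE) > 126) %>%  # per-person total
  distinct(person) %>%
  pull(person)

person_filtered
```

With a free-standing vector such as `points_to_numeric`, `sum()` returns the same grand total for every group, so the condition is either true for all persons or false for all of them.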

John · Answer 1 · 2020-08-21T12:23:08.080

2

Tidyverse solution. Returns a vector with the persons with more than 126 points.

library(tidyverse)

person_filtered <- df1 %>%
  group_by(person) %>%
  dplyr::filter(sum(points, na.rm = TRUE) > 126) %>%
  distinct(person) %>%
  pull()

edited Aug 21 '20 at 12:23

answered Aug 21 '20 at 12:02

John


I got this error: Error: Problem with `filter()` input `..1`. x invalid 'type' (character) of argument i Input `..1` is `sum(points, na.rm = T) > 126`. i The error occurred in group 1: person = " – San Aug 21 '20 at 12:17

Try dplyr::filter - there is a stats::filter that masks the dplyr function. I'll update the answer to reflect. – John Aug 21 '20 at 12:23

score 0 · Accepted Answer · answered Aug 21 '20 at 12:29


Use of summarise is more idiomatic for this use case.

library(tidyverse)

person_filtered <- df1 %>%
  group_by(person) %>%
  summarise(totalPoints = sum(points, na.rm = TRUE)) %>%
  filter(totalPoints > 126)

answered Aug 21 '20 at 12:29

emilliman5


Unfortunately, I got an error again. Error: Problem with `summarise()` input `totalpoints`. x invalid 'type' (character) of argument i Input `totalpoints` is `sum(points, na.rm = T)`. i The error occurred in group 1: person = " – San Aug 21 '20 at 12:36
looks like you need to coerce `points` to numeric with `as.numeric(points)` – emilliman5 Aug 21 '20 at 12:37
I have done this already, When I use class(df1$points), I got an answer that points is numeric. – San Aug 21 '20 at 12:44
Please update your question with a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – emilliman5 Aug 21 '20 at 12:48
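
Since the error message reports a character column, one way to track down the cause is to list the values that refuse to convert. This is only a sketch: the vector below is a hypothetical stand-in for `df1$points`, because the real data was never shared.

```r
# Hypothetical column contents; replace with df1$points on the real data
points_raw <- c("1", "", "2", "n/a", "7")

class(points_raw)

# Try the conversion quietly, then keep the entries that became NA
# even though they were not blank
converted <- suppressWarnings(as.numeric(points_raw))
bad <- points_raw[is.na(converted) & trimws(points_raw) != ""]

bad
```

Any value that shows up in `bad` is enough to make `read.csv()` treat the whole column as text.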

score 0 · Answer 3 · answered Aug 21 '20 at 12:33


Maybe you can try the code below

subset(aggregate(.~person,df1,sum), points > 126)

or

subset(df1, ave(points, person, FUN = sum) > 126)
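
A small sketch of how the two base R variants differ, using hypothetical toy data with missing points. Note the assumption that `points` is already numeric, and that the `na.rm` wrapper is my addition: with a plain `sum`, any person with an `NA` gets an `NA` total and is silently dropped by `subset()`.

```r
# Hypothetical toy data; points already numeric, with missing values
df1 <- data.frame(
  person = c("abc", "abc", "abc", "abc1", "abc1"),
  points = c(100, NA, 50, 20, NA)
)

# Variant 1: one row per person with the total;
# the formula method of aggregate() drops NA rows before summing
totals <- subset(aggregate(points ~ person, df1, sum), points > 126)

# Variant 2: the original rows of every qualifying person;
# na.rm = TRUE keeps an NA from turning the whole group sum into NA
rows <- subset(
  df1,
  ave(points, person, FUN = function(x) sum(x, na.rm = TRUE)) > 126
)

unique(rows$person)
```

The first form is convenient when the totals themselves are needed, the second when the matching rows should be kept for further work.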

answered Aug 21 '20 at 12:33

ThomasIsCoding

