Using R to count number of observations by group and condition

Question

I've been using Excel but want to transition over to R but am getting overwhelmed. In excel I use the countifs function a lot to count the instantizes that meet multiple conditions. I have a large CSV with each row containing information about properties (State, pasture area, soy area, corn area, etc.) I want to produce a new table of the counts of properties by state with different conditions. Below is an example of the data I have:

Prop_ID	State	Pasture	Soy	Corn
1	WI	20	45	75
2	MN	10	80	122
3	MN	152	0	15
4	IL	0	10	99
5	IL	75	38	0
6	WI	30	45	0
7	WI	68	55	0

I'm looking to produce a new table like this:

State	Cnt by state	Cnt >25 Pasture	Cnt >25 Soy	Cnt>25 Pasture AND Soy
WI	3	2	3	2
MN	2	1	1	0
IL	2	1	1	1

Which is the count per state of properties that have >25 pasture, >25 soy, >25 pasture and soy etc. I'm trying to use dplyr but maybe there is another package that's better for this? So on a side note is there a good class or site that can help me learn the basics of reorganizing and pulling simple statistics out of large tables like I am trying to do here?

[R for Data Science](https://r4ds.had.co.nz/) is a good book to get started on reorganizing, analyzing data and is free online — Kodiakflds, Feb 16 '22 at 19:11
`dat %>% group_by(State) %>% summarize(\`Cnt by state\`=n(), \`Cnt >25 Pasture\`=sum(Pasture>25), \`Cnt >25 Soy\`=sum(Soy>25), \`Cnt >25 Pasture AND Soy\` = sum(Pasture>25 & Soy>25))` — r2evans, Feb 16 '22 at 19:13
(I think that's the first time I've seen three identical answers in the same minute!) — Jon Spring, Feb 16 '22 at 19:14
@JonSpring Thanks for your reply! I have more more question if you don't mind. What would I add to this code if I wanted another column that, for example, was the sum of pasture but only for properties with >25 pasture, so the result for WI would be 98. I was playing around with the code and see if you removed the conditions is actually sums the values by each state but with the conditions added it returns the count. — SpookyDLX, Feb 16 '22 at 22:35
I think `dat %>% group_by(State) %>% summarize(new_thing = sum(Pasture * Pasture >25))`? — Jon Spring, Feb 16 '22 at 23:06

score 0 · Answer 1 · answered Feb 16 '22 at 19:13

df %>%
  group_by(State) %>%
  summarize(Cnt_by_state = n(),
            Cnt_gt25_Past = sum(Pasture > 25),
            Cnt_gt25_Soy = sum(Soy > 25),
            Cnt_gt25_Past_Soy = sum(Pasture > 25 & Soy >25))

Result is by default sorted by alphabetical group…

# A tibble: 3 x 5
  State Cnt_by_state Cnt_gt25_Past Cnt_gt25_Soy Cnt_gt25_Past_Soy
  <chr>        <int>         <int>        <int>             <int>
1 IL               2             1            1                 1
2 MN               2             1            1                 0
3 WI               3             2            3                 2

Prop_ID	State	Pasture	Soy	Corn
1	WI	20	45	75
2	MN	10	80	122
3	MN	152	0	15
4	IL	0	10	99
5	IL	75	38	0
6	WI	30	45	0
7	WI	68	55	0

Prop_ID	State	Pasture	Soy	Corn
1	WI	20	45	75
2	MN	10	80	122
3	MN	152	0	15
4	IL	0	10	99
5	IL	75	38	0
6	WI	30	45	0
7	WI	68	55	0

Using R to count number of observations by group and condition

1 Answers1

Prop_ID	State	Pasture	Soy	Corn
1	WI	20	45	75
2	MN	10	80	122
3	MN	152	0	15
4	IL	0	10	99
5	IL	75	38	0
6	WI	30	45	0
7	WI	68	55	0