1

I've been using Excel but want to transition over to R but am getting overwhelmed. In excel I use the countifs function a lot to count the instantizes that meet multiple conditions. I have a large CSV with each row containing information about properties (State, pasture area, soy area, corn area, etc.) I want to produce a new table of the counts of properties by state with different conditions. Below is an example of the data I have:

Prop_ID State Pasture Soy Corn
1 WI 20 45 75
2 MN 10 80 122
3 MN 152 0 15
4 IL 0 10 99
5 IL 75 38 0
6 WI 30 45 0
7 WI 68 55 0

I'm looking to produce a new table like this:

State Cnt by state Cnt >25 Pasture Cnt >25 Soy Cnt>25 Pasture AND Soy
WI 3 2 3 2
MN 2 1 1 0
IL 2 1 1 1

Which is the count per state of properties that have >25 pasture, >25 soy, >25 pasture and soy etc. I'm trying to use dplyr but maybe there is another package that's better for this? So on a side note is there a good class or site that can help me learn the basics of reorganizing and pulling simple statistics out of large tables like I am trying to do here?

r2evans
  • 141,215
  • 6
  • 77
  • 149
SpookyDLX
  • 79
  • 5
  • [R for Data Science](https://r4ds.had.co.nz/) is a good book to get started on reorganizing, analyzing data and is free online – Kodiakflds Feb 16 '22 at 19:11
  • `dat %>% group_by(State) %>% summarize(\`Cnt by state\`=n(), \`Cnt >25 Pasture\`=sum(Pasture>25), \`Cnt >25 Soy\`=sum(Soy>25), \`Cnt >25 Pasture AND Soy\` = sum(Pasture>25 & Soy>25))` – r2evans Feb 16 '22 at 19:13
  • 1
    (I think that's the first time I've seen three identical answers in the same minute!) – Jon Spring Feb 16 '22 at 19:14
  • @JonSpring Thanks for your reply! I have more more question if you don't mind. What would I add to this code if I wanted another column that, for example, was the sum of pasture but only for properties with >25 pasture, so the result for WI would be 98. I was playing around with the code and see if you removed the conditions is actually sums the values by each state but with the conditions added it returns the count. – SpookyDLX Feb 16 '22 at 22:35
  • I think `dat %>% group_by(State) %>% summarize(new_thing = sum(Pasture * Pasture >25))`? – Jon Spring Feb 16 '22 at 23:06

1 Answers1

0
df %>%
  group_by(State) %>%
  summarize(Cnt_by_state = n(),
            Cnt_gt25_Past = sum(Pasture > 25),
            Cnt_gt25_Soy = sum(Soy > 25),
            Cnt_gt25_Past_Soy = sum(Pasture > 25 & Soy >25))

Result is by default sorted by alphabetical group…

# A tibble: 3 x 5
  State Cnt_by_state Cnt_gt25_Past Cnt_gt25_Soy Cnt_gt25_Past_Soy
  <chr>        <int>         <int>        <int>             <int>
1 IL               2             1            1                 1
2 MN               2             1            1                 0
3 WI               3             2            3                 2
Jon Spring
  • 55,165
  • 4
  • 35
  • 53