How to write R function to find a specific condition in a dataframe

Question

I have a specific problem I'm trying to solve within a dataframe using R. Here are the columns and a few rows of sample data:

I want to write a function that finds ZIP where there is more than one FIPS and more than one PROMO. So to put it another way, ZIPS that cross FIPS with different PROMO. I don't need to worry about COUNTY, FACTOR_X and FACTOR_Y but I do need to keep them.

There are three possible PROMO levels (LOW, ZERO, VARIABLE). If a ZIP is present in more than one FIPS and the PROMO is not distinct within the subset, the STATUS is FAIL. If however, a ZIP is present in more than one FIPS and the PROMO is distinct (the same) within the subset, the STATUS is PASS.

Here are some sample output

ZIP,ST_COUNTY,FIPS,FACTOR_X,FACTOR_Y,PROMO,STATUS

8520,NJ MERCER,34021,XXX,YYY,VARIABLE,FAIL

8520,NJ MIDDLESEX,34023,XXX,YYY,ZERO,FAIL

8525,NJ XUNTERDON,34019,XXX,YYY,LOW,FAIL

8525,NJ MERCER,34021,XXX,YYY,VARIABLE,FAIL

8540,NJ MERCER,34021,XXX,YYY,VARIABLE,FAIL

8540,NJ MIDDLESEX,34023,XXX,YYY,ZERO,FAIL

8540,NJ SOMERSET,34035,XXX,YYY,ZERO,FAIL

8558,NJ XUNTERDON,34019,XXX,YYY,LOW,FAIL

8558,NJ MERCER,34021,XXX,YYY,VARIABLE,FAIL

8558,NJ SOMERSET,34035,XXX,YYY,ZERO,FAIL

23117,VA GOOCXLAND,51075,XXX,YYY,ZERO,FAIL

23117,VA LOUISA,51109,XXX,YYY,LOW,FAIL

23117,VA SPOTSYLVANIA,51177,XXX,YYY,LOW,FAIL

23117,VA SPOTSYLVANIA,51177,XXX,YYY,ZERO,FAIL

78015,TX BEXAR,48029,XXX,YYY,ZERO,FAIL

78015,TX COMAL,48091,XXX,YYY,LOW,FAIL

78015,TX COMAL,48091,XXX,YYY,ZERO,FAIL

78015,TX KENDALL,48259,XXX,YYY,LOW,FAIL

78015,TX KENDALL,48259,XXX,YYY,ZERO,FAIL

1906,MA ESSEX,25009,XXX,YYY,LOW,PASS

1906,MA MIDDLESEX,25017,XXX,YYY,LOW,PASS

1906,MA SUFFOLK,25025,XXX,YYY,LOW,PASS

16127,PA BUTLER,42019,XXX,YYY,ZERO,PASS

16127,PA LAWRENCE,42073,XXX,YYY,ZERO,PASS

16127,PA MERCER,42085,XXX,YYY,ZERO,PASS

16127,PA VENANGO,42121,XXX,YYY,ZERO,PASS

For output, I would like to split into two files with the original columns plus the new STATUS column. The first file would be ZIP_PASS with all STATUS == PASS and the second file would be ZIP_FAIL with all STATUS == FAIL

=======

You should explicitly provide expected output corresponding to your example. For details: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/28481250#28481250 — Frank, Sep 08 '17 at 14:13
It'd help if you add the STATUS column you expect to your sample data. Would it be FAIL for the first four rows (because they're all PROMO=LOW) and PASS for the last four rows? — Gabe, Sep 08 '17 at 15:22

moodymudskipper · Answer 1 · 2017-09-08T15:43:51.537

0

would this work ?

library(dplyr)
df2 <- df1 %>% group_by(ZIP) %>%
  mutate(STATUS=c("PASS","FAIL")[(n_distinct(PROMO)==n())+1]) %>%
  ungroup

df_pass <- filter(df2,STATUS=="PASS")
df_fail <- filter(df2,STATUS=="FAIL")

# # A tibble: 8 x 7
# # Groups:   ZIP [4]
# ZIP     ST_COUNTY  FIPS FACTOR_X FACTOR_Y PROMO STATUS
# <int>         <chr> <int>    <chr>    <chr> <chr>  <chr>
# 1  1431  MA MIDDLESEX 25017      XXX      YYY   LOW   PASS
# 2  1431  MA WORCESTER 25027      XXX      YYY   LOW   PASS
# 3  1434  MA MIDDLESEX 25017      XXX      YYY   LOW   PASS
# 4  1434  MA WORCESTER 25027      XXX      YYY   LOW   PASS
# 5  4009 ME CUMBERLAND 23005      XXX      YYY  ZERO   FAIL
# 6  4009     ME OXFORD 23017      XXX      YYY   LOW   FAIL
# 7  4029 ME CUMBERLAND 23005      XXX      YYY  ZERO   FAIL
# 8  4029     ME OXFORD 23017      XXX      YYY   LOW   FAIL

data

df1 <- read.table(text="ZIP,ST_COUNTY,FIPS,FACTOR_X,FACTOR_Y,PROMO
01431,MA MIDDLESEX,25017,XXX,YYY,LOW
01431,MA WORCESTER,25027,XXX,YYY,LOW
01434,MA MIDDLESEX,25017,XXX,YYY,LOW
01434,MA WORCESTER,25027,XXX,YYY,LOW
04009,ME CUMBERLAND,23005,XXX,YYY,ZERO
04009,ME OXFORD,23017,XXX,YYY,LOW
04029,ME CUMBERLAND,23005,XXX,YYY,ZERO
04029,ME OXFORD,23017,XXX,YYY,LOW",sep=",",header=T,stringsAsFactors=F)

edited Sep 08 '17 at 15:43

answered Sep 08 '17 at 15:33

moodymudskipper

46,417
11
121
167

This is great! To create the two tables and split it up could just be a filter on STATUS, correct? – Tony Sep 08 '17 at 15:38
yes, these are the tables `df_pass` and `df_fail` :) – moodymudskipper Sep 08 '17 at 15:43
I added an `ungroup` statement at the end of the first query to be safe – moodymudskipper Sep 08 '17 at 15:44
Hey mudskipper, I updated this problem to look for three levels of PROMO. The example you provided works for two level but not three. Any suggestions? – Tony Sep 09 '17 at 14:37
Maybe `(n_distinct(PROMO) >1)` instead of `(n_distinct(PROMO)==n())`? – moodymudskipper Sep 09 '17 at 16:59

How to write R function to find a specific condition in a dataframe

1 Answers1