1

I know how to do this in excel, but am trying to translate into R and create a new column. In R I have a data frame called CleanData. I want to see how many times the value in each row of column A shows up in all of column B. In excel it would read like this:

=COUNTIF(B:B,A2)>0,C="Purple")

The second portion would be a next if / and statement. It would look like this in excel:

=IF(AND(COUNTIF(B:B,A2)>0,C="Purple"),"Yes", "No") 

Anyone know where to start?

I have tried mutating and also this:

sum(CleanData$colA == CleanData$colB) 

and am getting no values

enter image description here

r2evans
  • 141,215
  • 6
  • 77
  • 149
lpw0806
  • 11
  • 2
  • 1
    Can you provide `dput(CleanData)`? – jrcalabrese Dec 06 '22 at 20:44
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Dec 06 '22 at 20:49
  • Also, you may want to have a look at the documentation of `dplyr::case_when()` - https://dplyr.tidyverse.org/reference/case_when.html – giocomai Dec 06 '22 at 20:58
  • I am trying to upload a sample dataset! @MrFlick – lpw0806 Dec 06 '22 at 21:01
  • It would be great if you could make your [reproducible example as minimal as possible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), so if `CleanData` is very large, please only share a small portion of it using `dput()`. – jrcalabrese Dec 06 '22 at 21:08
  • 1
    Please [do not post code or data in images](https://meta.stackoverflow.com/q/285551/2372064). See the link I provided on including reproducible examples in your question. Using a `dput()` would be a good way to share data. – MrFlick Dec 06 '22 at 21:08
  • @lpw0806, see how I provided sample data in my answer, that's generally (one of) the preferred ways to share sample data so that others can easily work on it. – r2evans Dec 06 '22 at 21:23

2 Answers2

2

You don't need any extra packages, here is a solution with the base R function ifelse, which is a frequently very useful function you should learn. An example:

set.seed(7*11*13)  
DF <- data.frame(cond=rnorm(100), X= sample(c("Yes","No"), 100, replace=TRUE))
with(DF, sum(ifelse( (cond>0)&(X=="Yes"), 1, 0))) 
kjetil b halvorsen
  • 1,206
  • 2
  • 18
  • 28
1

I think this will capture your if/countif scenario:

library(dplyr)
CleanData %>%
  mutate(YesOrNo = case_when(Color != "Purple" ~ "No", is.na(LABEL1) | !nzchar(LABEL1) ~ "No", !LABEL1 %in% LABEL2 ~ "No", TRUE ~ "Yes"))
#    LABEL1   LABEL2  Color YesOrNo
# 1   HELLO     <NA> Purple     Yes
# 2    <NA> HELLO!!!   Blue      No
# 3 HELLO$$     <NA> Purple     Yes
# 4    <NA>    HELLO   Blue      No
# 5 HELLOOO     <NA> Purple     Yes
# 6    <NA>     <NA> Purple      No
# 7    <NA>  HELLOOO   Blue      No
# 8    <NA>  HELLO$$   Blue      No
# 9    <NA>    HELLO Yellow      No

Data

CleanData <- structure(list(LABEL1 = c("HELLO", NA, "HELLO$$", NA, "HELLOOO", NA, NA, NA, NA), LABEL2 = c(NA, "HELLO!!!", NA, "HELLO", NA, NA, "HELLOOO", "HELLO$$", "HELLO"), Color = c("Purple", "Blue", "Purple", "Blue", "Purple", "Purple", "Blue", "Blue", "Yellow")), class = "data.frame", row.names = c(NA, -9L))

or programmatically,

CleanData <- data.frame(LABEL1=c("HELLO",NA,"HELLO$$",NA,"HELLOOO",NA,NA,NA,NA), LABEL2=c(NA,"HELLO!!!",NA,"HELLO",NA,NA,"HELLOOO","HELLO$$","HELLO"),Color=c("Purple","Blue","Purple","Blue","Purple","Purple","Blue","Blue","Yellow"))
r2evans
  • 141,215
  • 6
  • 77
  • 149