0

I have like below mentioned dataframe:

Records:

ID        Remarks         Value
1         ABC             10
1         AAB             12
1         ZZX             15
2         XYZ             12
2         ABB             14

By utilizing the above mentioned dataframe, I want to add new column Status in the existing dataframe.

Where if the Remarks is ABC, AAB or ABB than status would be TRUE and for XYZ and ZZX it should be FALSE.

I am using below mentioned method for that but it didn't work.

Records$Status<-ifelse(Records$Remarks %in% ("ABC","AAB","ABB"),"TRUE",
                             ifelse(Records$Remarks %in% 
                      ("XYZ","ZZX"),"FALSE"))

And, bases on the Status i want to derive following output:

ID     TRUE    FALSE    Sum
1       2       1        37
2       1       1        26
Sotos
  • 51,121
  • 6
  • 32
  • 66
Vector JX
  • 179
  • 4
  • 23

5 Answers5

2
Records$Status<-ifelse(Records$Remarks %in% c("ABC","AAB","ABB"),TRUE,
                        ifelse(Records$Remarks %in% 
                                   c("XYZ","ZZX"),FALSE, NA))

You need to enclose your lists of strings with c(), and add an "else" condition for the second ifelse (but see Roman's answer below for a better way of doing this with case_when). (Also note that here I changed the "TRUE" and "FALSE" (as character class) into TRUE and FALSE (the logical class).

For the summary (using dplyr):

Records %>% group_by(ID) %>% 
dplyr::summarise(trues=sum(Status), falses=sum(!Status), sum=sum(Value))

# A tibble: 2 x 4
     ID trues falses   sum
  <int> <int>  <int> <int>
1     1     2      1    37
2     2     1      1    26

Of course, if you don't really need the intermediate Status column but just want the summary table, you can skip the first step altogether:

Records %>% group_by(ID) %>% 
dplyr::summarise(trues=sum(Remarks %in% c("ABC","AAB","ABB")), 
  falses=sum(Remarks %in% c("XYZ","ZZX")), 
  sum=sum(Value))
iod
  • 7,412
  • 2
  • 17
  • 36
  • lol, you beat me to it. Was about the give the exact same response. :-) – Roman Oct 30 '18 at 12:54
  • Yeah, it's a pretty straight-forward solution. :) – iod Oct 30 '18 at 12:56
  • No need for an `ifelse` statement. A simple `df$Remarks %in% c("ABC","AAB","ABB")` would suffice. – Sotos Oct 30 '18 at 13:04
  • @iod ID column is not coming when i am running this code, I don;t know everything seems fine. – Vector JX Oct 30 '18 at 13:05
  • @Sotos: That depends on whether or not the OP wants to have anything that doesn't fall into either list be a FALSE or an NA. I took the more cautious route of assuming anything outside the lists provided by OP is NA. If anything outside `c("ABC","AAB","ABB")` should be FALSE, you are of course correct. – iod Oct 30 '18 at 13:06
  • @Vector JX - did you remember to include the `group_by(ID)`? – iod Oct 30 '18 at 13:07
  • @iod Yes....but don't know why this is happening. – Vector JX Oct 30 '18 at 13:07
  • 1
    Agreed. I'm just mentioning it because from OP's `ifelse` statement it seems that there are no other categories. Idk...there is something about declaring `TRUE` on a logical operation that bothers me...maybe my OCD :) – Sotos Oct 30 '18 at 13:08
  • @Vector JX Where doesn't it show? In the summary? You might have `plyr` loaded. Try typing `dplyr::summarise` to make sure. – iod Oct 30 '18 at 13:08
  • @iod Now `ID` is coming but now value of `TRUE` , `FALSE` and `SUM` are same for all `ID`. – Vector JX Oct 30 '18 at 13:14
  • run `conflicts()` and see if anything else from the code has a conflict (possibly group_by, though I don't know what it can conflict with). Meanwhile, make all the functions' search path explicit (i.e., `dplyr::group_by`) – iod Oct 30 '18 at 13:22
1

Since it makes sense to use dplyr for your second question (see @iod's answer) it is also a good opportunity to use the package's very straightforward case_when() function for the first part.

Records %>% 
    mutate(Status = case_when(Remarks %in% c("ABC", "AAB", "ABB") ~ TRUE,
                              Remarks %in% c("XYZ", "ZZX") ~ FALSE,
                              TRUE ~ NA))

  ID Remarks Value Status
1  1     ABC    10   TRUE
2  1     AAB    12   TRUE
3  1     ZZX    15  FALSE
4  2     XYZ    12  FALSE
5  2     ABB    14   TRUE
Roman
  • 4,744
  • 2
  • 16
  • 58
0

This approach will scale to a large number of remarks.

Load the data and prepare a matching data frame

The second data frame makes a matching between remarks and their TRUE or FALSE value.

library(readr)
library(dplyr)
library(tidyr)
dtf <- read_table("id        remarks         value
1         ABC             10
1         AAB             12
1         ZZX             15
2         XYZ             12
2         ABB             14")
truefalse <- data_frame(remarks = c("ABC", "AAB", "ABB", "ZZX", "XYZ"),
                        tf = c(TRUE, TRUE, TRUE, FALSE, FALSE))

Group by id and summarise

This is the format as asked in the question

dtf %>% 
    left_join(truefalse, by = "remarks") %>% 
    group_by(id) %>% 
    summarise(true = sum(tf),
              false = sum(!tf),
              value = sum(value)) 

# A tibble: 2 x 4
     id  true false value
  <int> <int> <int> <int>
1     1     2     1    37
2     2     1     1    26

Alternative proposal: group by id, tf and summarise

This option retains more details on the spread of value along the grouping variables id and tf.

    dtf %>% 
        left_join(truefalse, by = "remarks") %>% 
        group_by(id, tf) %>% 
        summarise(n = n(),
                  value = sum(value))
# A tibble: 4 x 4
# Groups:   id [?]
     id tf        n value
  <int> <lgl> <int> <int>
1     1 FALSE     1    15
2     1 TRUE      2    22
3     2 FALSE     1    12
4     2 TRUE      1    14
Paul Rougieux
  • 10,289
  • 4
  • 68
  • 110
  • left_join sounds like a far less efficient solution than @roman's suggestion of using `case_when`... Plus, OP didn't want to sum the value by `Status`, only by `ID`. – iod Oct 30 '18 at 13:33
  • OK I added that in my answer, I think the use of `left_join()` will scale better because you can provide a data frame of matching `remarks` from an external file or a database. – Paul Rougieux Oct 30 '18 at 13:42
0

In most cases, life is easier and lines are shorter without ifelse:

# short version
df$Status <- df$Remarks %in% c("ABC","AAB","ABB")  

This version is OK for most purposes but it has shortcomings. Status will be FALSE if Remarks is NA or, say "garbage" but one might want it to be NA in these cases and FALSE only if Remarks %in% c("XYZ", "ZZX"). So one can add and multiply the conditions and finally convert it to logical:

df$Status <- as.logical(with(df,
                  Remarks %in% c("ABC","AAB","ABB")  +
                  ! Remarks %in% c("XYZ","ZZX") ))

And the summary table with base R:

aggregate(df[,-(1:2)], df["ID"], function(x) if(is.numeric(x)) sum(x) else table(x))

Umm... perhaps some formatting would be useful:

t1 <- aggregate(df[,-(1:2)], df["ID"], function(x) if(is.numeric(x)) sum(x) else table(x))
t1 <- t1[, c(1,3,2)]
colnames(t1) <- c("ID", "", "Sum")
t1
#   ID FALSE TRUE Sum
# 1  1     1    2  37
# 2  2     1    1  26
lebatsnok
  • 6,329
  • 2
  • 21
  • 22
0

This one returns correct result, only if there are two mentioned groups ("ABC", "AAB", "ABB" vs "XYZ","ZZX", ...). For me @iod's solution, is more R-like, but I've tried to avoid ifelse, and do it another way:

Code:

library(tidyverse)

dt %>%
  group_by(ID, Status = grepl("^A[AB][CB]$", Remarks)) %>%
  summarise(N = n(), Sum = sum(Value)) %>%
  spread(Status, N) %>%
  summarize_all(sum, na.rm = T) %>%                       # data still groupped by ID
  select("ID", "TRUE", "FALSE", "Sum")

# A tibble: 2 x 4
     ID `TRUE` `FALSE`   Sum
  <int>  <int>   <int> <int>
1     1      2       1    37
2     2      1       1    26

Data:

dt <- structure(
  list(ID = c(1L, 1L, 1L, 2L, 2L), 
       Remarks = c("ABC", "AAB", "ZZX", "XYZ", "ABB"),
       Value = c(10L, 12L, 15L, 12L, 14L)), 
  .Names = c("ID", "Remarks", "Value"), class = "data.frame", row.names = c(NA, -5L)
  )
utubun
  • 4,400
  • 1
  • 14
  • 17