I want to fill an existing column based on conditions in two others.

The dataframe is called A.

If Box = 6 AND Document = 75, then Size should be set to "big".

I only need to populate the empty cells in Size; the existing entries in that column must remain unchanged.

Example data:

Box    Document    Size
6      75
6      75
7      23          big
7      23          big
7      25
8      13          big
8      13          big

Thank you

--

Dataset formatted for R (output of dput(A)):

A <- structure(
  list(
    Box = c(6, 6, 7, 7, 7, 8, 8),
    Document = c(75, 75, 23, 23, 25, 13, 13),
    Size = c("", "", "big", "big", "", "big", "big")),
  row.names = c(NA,-7L),
  class = c("tbl_df", "tbl", "data.frame")
)
    Welcome to SO, Carmel Hila! Please make this question *reproducible*. This includes sample code you've attempted (including listing non-base R packages, and any errors/warnings received), sample *unambiguous* data (e.g., `data.frame(x=...,y=...)` or the output from `dput(head(x))`), and intended output given that input. Refs: https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info. – r2evans Nov 07 '21 at 23:37

1 Answer

Without more details (i.e. a reproducible example) it is difficult to know whether this answers your question or not, but here is a potential solution:

set.seed(3)
A <- data.frame(box = sample(5:7, size = 50, replace = TRUE),
                document = sample(74:76, size = 50, replace = TRUE))

A$size <- ifelse(A$box == 6 & A$document == 75, "big", "other")
head(A, 10)
#>    box document  size
#> 1    5       76 other
#> 2    6       76 other
#> 3    7       75 other
#> 4    6       75   big
#> 5    7       74 other
#> 6    7       76 other
#> 7    6       75   big
#> 8    7       74 other
#> 9    5       75 other
#> 10   6       74 other

Another potential solution is to use case_when() from the dplyr package:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

A <- data.frame(box = sample(5:7, size = 50, replace = TRUE),
                document = sample(74:76, size = 50, replace = TRUE))
A %>%
  mutate(size = case_when(box == 6 & document == 75 ~ "big",
                          box < 6 & document < 75 ~ "small",
                          document < 75 ~ "medium",
                          TRUE ~ "other"))
#>    box document   size
#> 1    6       75    big
#> 2    6       74 medium
#> 3    7       76  other
#> 4    5       75  other
#> 5    6       76  other
#> 6    5       75  other
#> 7    6       76  other
#> 8    5       74  small
#> 9    5       74  small
#> 10   6       74 medium
#> ...

Created on 2021-11-08 by the reprex package (v2.0.1)
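
If the goal is to keep values that are already present, one option is to let the fallback branch of case_when() return the column itself instead of a fixed label. The following is only a sketch under assumptions: it uses the sample data posted in the question, treats "empty" as a zero-length string "", and stores the result in a hypothetical object named A_filled.

```r
library(dplyr)

# Sample data from the question (blank cells are assumed to be "")
A <- data.frame(
  Box      = c(6, 6, 7, 7, 7, 8, 8),
  Document = c(75, 75, 23, 23, 25, 13, 13),
  Size     = c("", "", "big", "big", "", "big", "big"),
  stringsAsFactors = FALSE
)

# A_filled is a hypothetical name used for illustration only
A_filled <- A %>%
  mutate(Size = case_when(
    # Fill a cell only when it is blank AND the row matches the condition
    Size == "" & Box == 6 & Document == 75 ~ "big",
    # Every other row keeps whatever value it already had
    TRUE ~ Size
  ))
```

Because the last branch returns Size, rows that do not match are passed through unchanged rather than being relabelled.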

Edit

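To address your comment about 'blanks': if the empty cells are empty strings ("") rather than NA, use the existing column as the fallback value so that the other rows keep their entries: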

A <- structure(
  list(
    Box = c(6, 6, 7, 7, 7, 8, 8),
    Document = c(75, 75, 23, 23, 25, 13, 13),
    Size = c("", "", "big", "big", "", "big", "big")),
  row.names = c(NA,-7L),
  class = c("tbl_df", "tbl", "data.frame")
)
A
#> # A tibble: 7 × 3
#>     Box Document Size 
#>   <dbl>    <dbl> <chr>
#> 1     6       75 ""   
#> 2     6       75 ""   
#> 3     7       23 "big"
#> 4     7       23 "big"
#> 5     7       25 ""   
#> 6     8       13 "big"
#> 7     8       13 "big"

# Fill only the blank ("") cells that match; every other row keeps its value
A$Size <- ifelse(A$Size == "" & A$Box == 6 & A$Document == 75, "big", A$Size)
A
#> # A tibble: 7 × 3
#>     Box Document Size 
#>   <dbl>    <dbl> <chr>
#> 1     6       75 "big"
#> 2     6       75 "big"
#> 3     7       23 "big"
#> 4     7       23 "big"
#> 5     7       25 ""   
#> 6     8       13 "big"
#> 7     8       13 "big"

Created on 2021-11-09 by the reprex package (v2.0.1)
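
If it is unclear whether the blanks are "" or NA, or whether Size is stored as a factor, a more defensive variant in base R may help. This is a sketch under assumptions, not a definitive solution: the NA in the sample data and the helper names is_empty and is_target are hypothetical, added only for illustration. Converting to character first is a precaution, since ifelse() drops factor attributes and can return integer codes, which might be what turned "big" into 1 in the earlier attempt.

```r
# Sample data from the question, with one NA added as an assumed extra case
A <- data.frame(
  Box      = c(6, 6, 7, 7, 7, 8, 8),
  Document = c(75, 75, 23, 23, 25, 13, 13),
  Size     = c("", NA, "big", "big", "", "big", "big"),
  stringsAsFactors = FALSE
)

# If Size were a factor, this conversion keeps the labels rather than the codes
A$Size <- as.character(A$Size)

# Rows where Size counts as empty: either NA or a zero-length string
is_empty <- is.na(A$Size) | A$Size == ""

# Rows that meet the condition from the question
is_target <- A$Box == 6 & A$Document == 75

# Assign only where both hold; all other cells are left untouched
A$Size[is_empty & is_target] <- "big"
```

Subset assignment changes only the selected cells, so there is no need to supply a fallback value for the remaining rows.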

  • The first option worked: `A$size <- ifelse(A$box == 6 & A$document == 75, "big", "other")`. But it replaced any existing values that were in the size column. Is there any way to run it so that a value is only assigned when the size column is empty? – Carmel Hilal Nov 08 '21 at 01:26
  • What do you mean "empty"? Do you mean "NA"? If so, perhaps `A$size <- ifelse(is.na(A$size) & A$box == 6 & A$document == 75, "big", A$size)` – jared_mamrot Nov 08 '21 at 02:28
  • I edited my post to include a sample of the data. This command, `A$size <- ifelse(is.na(A$size) & A$box == 6 & A$document == 75, "big", A$size)`, is changing the existing big entries into 1 – Carmel Hilal Nov 08 '21 at 15:31
  • I have edited my answer; does this solve your problem @CarmelHilal ? – jared_mamrot Nov 08 '21 at 22:18
  • Thank you Jared. It worked yes. – Carmel Hilal Nov 18 '21 at 18:58