I have created an RMarkdown document that checks a dataset for errors and prints statements that specify each error and which row numbers need to be corrected (in the example below, it checks the errors in df). I have created another dataframe (df.index in the example below) to track the rows that need to be corrected for each column in df. Essentially, I need to add a column that stores a list of the rows that need to be corrected for each column in df. Then, as I do more error checks, I will need to append to the list in a given row of df.index and add new lists to other rows in the rows column of the newly created summary dataframe.
I have looked through dozens of SO entries on lists but cannot find a good answer. Here is what I have tried, shown with a minimal example. The code works and gives me the output that I want. However, it is extremely verbose and will probably be hard for others on my project team to read and make sense of.
Minimal Example
Data
library(dplyr)
# Dataframe that contains the dataset that I'm checking for errors.
df <-
  structure(
    list(
      `1.1.` = c("Andrew", "Max", "Sylvia", NA, "1",
                 NA, NA, "Jason"),
      `1.2.` = c(1, 2, 2, NA, NA, 5, 3, NA),
      `1.3.` = c(
        "cool",
        "amazing",
        "wonderful",
        "okay",
        NA,
        "sweet",
        "chocolate",
        "fine"
      )
    ),
    class = "data.frame",
    row.names = c(NA, -8L)
  )
# Dataframe that contains the column numbers and names, which will be used to create a summary of what rows need to be changed for each column.
df.index <-
  structure(list(
    number = c("1.1.", "1.2.", "1.3."),
    name = c("name",
             "number", "category")
  ),
  class = "data.frame",
  row.names = c(NA, -3L))
What I have tried
obs <- "1.1."
na.index <- which(is.na(df$`1.1.`))
summary <- df.index %>%
  dplyr::mutate(rows = ifelse(number == obs, list(na.index), NA))
# Check to see if there are any numeric values in this character column. Adding 6 just to have a duplicate for this example.
na.index2 <-
  c(which(!is.na(as.numeric(
    as.character(df$`1.1.`)
  ))), 6)
# Append new list from na.index2 to the existing list in row 1 (or 1.1.), and keep only the unique values, excluding NAs.
summary <- summary %>%
  dplyr::mutate(rows = ifelse(number == obs, list(unique(na.omit(
    unlist(append(rows, list(na.index2)))
  ))), NA))
# Column 1.2. in df.
obs <- "1.2."
na.index3 <- which(df$`1.2.` > 2)
summary <- summary %>%
  dplyr::mutate(rows = ifelse(number == obs, list(na.index3), rows))
na.index4 <- which(df$`1.2.` == 2)
summary <- summary %>%
  dplyr::mutate(rows = ifelse(number == obs, list(unique(na.omit(
    unlist(append(rows[2], list(na.index4)))
  ))), rows))
# Column 1.3. in df.
obs <- "1.3."
na.index5 <- which(df$`1.3.` == "okay")
summary <- summary %>%
  dplyr::mutate(rows = ifelse(number == obs, list(na.index5), rows))
Output (which is also the expected output)
summary
  number     name       rows
1   1.1.     name 4, 6, 7, 5
2   1.2.   number 6, 7, 2, 3
3   1.3. category          4
I get all of the correct rows in the example above, but there has to be a much simpler way to do this, without having to create obs and without having to specify the row number (e.g., rows[2]) when appending a list.
As you can see, not every column has the same error checks. So, I'm hoping for an easy way to add a list to the rows column in summary as I go through similar checks for each category (like 1.2., 1.3., etc.), as well as a way to append additional lists (as shown here).
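To make the goal concrete, here is a minimal sketch of the kind of helper I have in mind. The function name add_rows and its arguments are hypothetical, and it uses base R rather than dplyr. It also assumes that empty entries in the rows column are stored as NULL instead of NA, which is a change from my code above.

```r
# Rebuild the example data so this sketch runs on its own.
df <- data.frame(
  `1.1.` = c("Andrew", "Max", "Sylvia", NA, "1", NA, NA, "Jason"),
  `1.2.` = c(1, 2, 2, NA, NA, 5, 3, NA),
  `1.3.` = c("cool", "amazing", "wonderful", "okay", NA,
             "sweet", "chocolate", "fine"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
df.index <- data.frame(
  number = c("1.1.", "1.2.", "1.3."),
  name = c("name", "number", "category"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not an existing function): merge new row numbers into
# the list-column entry that belongs to one column of df.
add_rows <- function(tracker, col, new_rows) {
  i <- match(col, tracker$number)
  stopifnot(!is.na(i))
  # Create the list column on first use; untouched entries stay NULL.
  if (!"rows" %in% names(tracker)) {
    tracker$rows <- vector("list", nrow(tracker))
  }
  rows <- tracker$rows
  # Keep first-seen order and drop duplicates.
  rows[[i]] <- unique(c(rows[[i]], as.integer(new_rows)))
  tracker$rows <- rows
  tracker
}

summary <- df.index
summary <- add_rows(summary, "1.1.", which(is.na(df$`1.1.`)))
# Values that parse as numbers in a character column; 6 is added as a duplicate.
numeric.like <- which(!is.na(suppressWarnings(as.numeric(df$`1.1.`))))
summary <- add_rows(summary, "1.1.", c(numeric.like, 6))
summary <- add_rows(summary, "1.2.", which(df$`1.2.` > 2))
summary <- add_rows(summary, "1.2.", which(df$`1.2.` == 2))
summary <- add_rows(summary, "1.3.", which(df$`1.3.` == "okay"))
```

With something like this, each check becomes a single call that names the column directly, so neither obs nor a hard-coded position such as rows[2] is needed.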