R coalesce down columns by identifer

Question

I have a long dataset with student grades and courses going across many semesters. It has many NAs and many rows for each student. I want it to have one long row per student to fill in those NAs but keep the same column names.

Here's a sample:

library(tidyverse)
sample <- tibble(student = c("Corey", "Corey", "Sibley", "Sibley"),
                 fall_course_1 = c("Math", NA, "Science", NA),
                 fall_course_2 = c(NA, "English", NA, NA),
                 fall_grade_1 = c(90, NA, 98, NA),
                 fall_grade_2 = c(NA, 60, NA, NA))

And here's what I'd like it to look like:

library(tidyverse)
answer <- tibble(student = c("Corey", "Sibley"),
                 fall_course_1 = c("Math", "Science"),
                 fall_course_2 = c("English", NA),
                 fall_grade_1 = c(90, 98),
                 fall_grade_2 = c(60, NA))

Some semesters, some students take many classes and some just one. I've tried using coalesce(), but I can't figure it out. Any help would be appreciated!

@camille, I tried grouping by the student column and using coalesce(). I then tried the suggestion on this link, but it goes across columns, not down them: https://community.rstudio.com/t/coalesce-rows-and-remove-nas-but-keep-all-non-na-rows-for-each-unique-group/15894. — J.Sabree, Mar 12 '20 at 00:06
Instead of continuing the struggle, you should probably reshape your data set into long format, where each grade is a row. This will make these sort of contortions unnecessary. See https://stackoverflow.com/questions/2185252/reshaping-data-frame-from-wide-to-long-format — Axeman, Mar 12 '20 at 00:07
I should have been more specific. Add your code attempts to the question, so folks can see the logic you're approaching it with and aren't suggesting things you've already tried — camille, Mar 12 '20 at 00:12

score 4 · Answer 1 · edited Mar 12 '20 at 00:19

This should do it, pivot the data long, remove the na's and then pivot it back to wide.

You need to convert the numeric values to character temporarily so they can go in the same column as the course labels, then type_convert() is a lazy way to put them back again.

library(dplyr)
library(tidyr)
library(readr)

reshaped <- sample %>%
  mutate_if(is.numeric, as.character) %>%
  pivot_longer(-student) %>% 
  drop_na() %>% 
  pivot_wider(student, names_from = name, values_from = value) %>% 
  type_convert()

score 2 · Answer 2 · answered Mar 12 '20 at 00:07

Using a custom colaesce function and dplyr:

coalesce_all_columns <- function(df) {
  return(coalesce(!!! as.list(df)))
}

library(dplyr)

sample %>%
  group_by(student) %>%
  summarise_all(coalesce_all_columns)

# A tibble: 2 x 5
  student fall_course_1 fall_course_2 fall_grade_1 fall_grade_2
  <chr>   <chr>         <chr>                <dbl>        <dbl>
1 Corey   Math          English                 90           60
2 Sibley  Science       NA                      98           NA

score 2 · Accepted Answer · answered Mar 12 '20 at 00:07

2

You could get the first non-NA value in each column for each student.

library(dplyr)
sample %> group_by(student) %>% summarise_all(~na.omit(.)[1])

# A tibble: 2 x 5
#  student fall_course_1 fall_course_2 fall_grade_1 fall_grade_2
#  <chr>   <chr>         <chr>                <dbl>        <dbl>
#1 Corey   Math          English                 90           60
#2 Sibley  Science       NA                      98           NA

This approach returns NA if there are all NA values in a group.

answered Mar 12 '20 at 00:07

Ronak Shah

377,200
20
156
213

this is great, thank you! Quick follow up, can you explain the summarise_all(~na.omit(.)[1]) aspect of your code? I can confirm that it works, but I'd love to know how to read that. Thank you! – J.Sabree Mar 12 '20 at 00:13
@J.Sabree As the name suggests, `na.omit` removes all the `NA` values and we select 1st value after removing `NA`'s using indexing `[1]`. When you have all `NA` values in the data, it returns `NA` by default. `na.omit(c(NA, NA, NA))[1]` – Ronak Shah Mar 12 '20 at 00:17

score 2 · Answer 4 · answered Mar 12 '20 at 00:15

You could also use data.table package as follows:

library(data.table)
setDT(sample)[, lapply(.SD, na.omit), student]

sample
# 1:   Corey          Math       English           90           60
# 2:  Sibley       Science          <NA>           98           NA

R coalesce down columns by identifer

4 Answers4