Multiples rows to one row in R

Question

In R, I have a data frame with several values. I would like to have a data frame that transforms the data frame into a data frame with just one row with all the values. I have a data frame like this:

df <- data.frame(A = c("time", "time", "time"),
                 B = c("place", "place", "place"),
                 C = c(NA, 1, NA),
                 D = c(NA, NA, 2),
                 E = c(3, NA, NA),
                 `F` = c(4,4, NA),
                 G = c(NA, 5, NA))

     A     B  C  D  E  F  G
1 time place NA NA  3  4 NA
2 time place  1 NA NA  4  5
3 time place NA  2 NA NA NA

And I want a dataframe like this:

     A     B C D E F G
1 time place 1 2 3 4 5

I tried using the function reshape like here: Turning one row into multiple rows in r

And tried the function unique but than I lose a lot of data: Selecting unique rows in matrix using R

The problem seems to be under-specified: Either we can just use `colMeans(..., na.rm=TRUE)` or there is something else going on. In that case it would make it much easier for others to help you if you provide a better [minimale reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). The MRE will make it easier for others to find and test a answer to your question. — dario, Feb 10 '21 at 14:36
Do columns A and B have different combinations of times and places? Do columns C-G always have the same value if non-NA, or can it vary? — Otto Kässi, Feb 10 '21 at 14:46
ok, what would be the desired output if column F has values 4 and 100? — Otto Kässi, Feb 10 '21 at 14:54

score 3 · Answer 1 · answered Feb 10 '21 at 15:29

Solution using colMeans to account for diverging values:

1.Create test data

df <- structure(list(
  A = c("time", "time", "time"), 
  B = c("place", "place", "place"), 
  C = c(NA, 1L, NA), 
  D = c(NA, NA, 2L), 
  E= c(3L, NA, NA), 
  F = c(4L, 4L, NA), 
  G = c(NA, 5L, NA)), 
  row.names = c(NA, -3L),
  class = c("data.table", "data.frame"))

2.Use unique for first two columns and colMeans for the rest, cast to data.frame:

cbind(unique(df[, 1:2]), as.data.frame.list(colMeans(df[,3:7], na.rm = TRUE)))

Returns:

      A     B C D E F G
1: time place 1 2 3 4 5

score 1 · Accepted Answer · answered Feb 10 '21 at 16:20

We can group by 'A', 'B' and select the first non-NA element across other columns

library(dplyr)
df1 %>%
     group_by(A, B) %>%
     summarise(across(everything(), ~ .[order(is.na(.))][1]), .groups = 'drop')

-output

# A tibble: 1 x 8
#  A     B         C     D     E     F     G H    
#  <chr> <chr> <int> <int> <int> <int> <int> <lgl>
#1 time  place     1     2     3     4     5 NA

Or with coalesce

library(purrr)
df1 %>%
    group_by(A, B) %>% 
    summarise(across(everything(), ~ reduce(., coalesce)), .groups = 'drop')

data

df1 <- structure(list(A = c("time", "time", "time"), B = c("place", 
"place", "place"), C = c(NA, 1L, NA), D = c(NA, NA, 2L), E = c(3L, 
NA, NA), F = c(4L, NA, NA), G = c(NA, 5L, NA), H = c(NA, NA, 
NA)), class = "data.frame", row.names = c(NA, -3L))

Otto Kässi · Answer 3 · 2021-02-10T14:59:18.710

You can use na.omit() with summarise, a la:

library(tidyverse)

df %>% group_by(A, B) %>% 
       summarise(C = mean(na.omit(C)),
                 D = mean(na.omit(D)),
                 E = mean(na.omit(E)),
                 F = mean(na.omit(F)),
                 G = mean(na.omit(G)))

Your example data has only unique values in each column C-G, so following your comment, I have used mean() to pick up the mean of non-NA observations.

data:

df <- structure(list(A = c("time", "time", "time"), B = c("place",
"place", "place"), C = c(NA, 1L, NA), D = c(NA, NA, 2L), E = c(3L,
NA, NA), F = c(4L, 4L, NA), G = c(NA, 5L, NA)), class = "data.frame", row.names = c(NA,
-3L))

Multiples rows to one row in R

3 Answers3

data

Linked