Combine rows of a dataframe based on common characters in a column in r

Question

I have a very large table with data and it looks like below:

    Name      Type     U1     U2     U3
1    A         Max     0      543     0
2    A         Min     -324    0     -876
3    B         Max     4536    0      0
4    B         Min     0     -56     -96
5    C         Max     0     543      0
6    C         Min     -32    0     -426
7    D         Max     4612   0       0
8    D         Min     0     -456    -86

For each Name (A,B,C,D) there are two values per column, one for max and one for min. Between the two one will always be zero. Either the max is zero and the min is negative or the min is zero and the max is positive. I would like to create a subset of the data.frame in which I can combine the rows corresponding to the same Name (A,B,C,D) and remove the cells that equal to zero.

I was not sure what would be the best way around this. Any guidance in appreciated!

Ronak Shah · Accepted Answer · 2020-07-02T04:05:44.480

2

You can group_by Name and select the non-zero value in the 3 columns

library(dplyr)
df %>%
  group_by(Name) %>%
  summarise(across(starts_with('U'), ~.[.!=0]))
  #In previous dplyr
  #summarise_at(vars(starts_with('U')), ~.[.!=0])

# A tibble: 4 x 4
#  Name     U1    U2    U3
#  <chr> <int> <int> <int>
#1 A      -324   543  -876
#2 B      4536   -56   -96
#3 C       -32   543  -426
#4 D      4612  -456   -86

You can also sum the columns if there are only 2 rows for each Name and one of them is always 0

df %>% group_by(Name) %>% summarise(across(starts_with('U'), sum))

Using aggregate in base R :

aggregate(cbind(U1, U2, U3)~Name, df, function(x) x[x!=0])

and with sum :

aggregate(cbind(U1, U2, U3)~Name, df, sum)

data

df <- structure(list(Name = c("A", "A", "B", "B", "C", "C", "D", "D"
), Type = c("Max", "Min", "Max", "Min", "Max", "Min", "Max", 
"Min"), U1 = c(0L, -324L, 4536L, 0L, 0L, -32L, 4612L, 0L), U2 = c(543L, 
0L, 0L, -56L, 543L, 0L, 0L, -456L), U3 = c(0L, -876L, 0L, -96L, 
0L, -426L, 0L, -86L)), class = "data.frame", row.names = c(NA, -8L))

edited Jul 02 '20 at 04:05

answered Jul 02 '20 at 03:37

Ronak Shah

377,200
20
156
213

Thank you for your response! I don't actually want to sum them across the column. As you show after grouping the data by the Name, in each column, I want to drop the cell that equals zero and keep the other one. When I follow the code you provided I end up with one column U instead of U1, U2, and U3. – Maral Dorri Jul 02 '20 at 03:58
1

@MaralDorri This code does not sum values across column. It is selecting non-zero value for each `Name` in columns that start with "U". What are the column names in your actual data? Do you have `plyr` loaded along with `dplyr`? Try with `dplyr:: summarise` instead. – Ronak Shah Jul 02 '20 at 04:02
Thank you that explains it! I was able to make a few adjustments to match my column names! – Maral Dorri Jul 02 '20 at 05:18

Combine rows of a dataframe based on common characters in a column in r

1 Answers1