Combining rows based on the string in one of the columns

Question

I have the data as below:

> dput(test_data)
structure(list(`10` = c(0, 0, 0, 0, 0, 1, 0.22123412423, 0.0123915695, 
0.0126915225, 0.4312, 1), `34` = c(0, 0, 0, 0, 0, 0.323256312, 
0.32423561, 0.44451256, 0.33623498, 0.21341411, 0.321223), `59` = c(0.232, 
0.57192, 0, 0, 0, 0.31312334, 0.2775713, 0.1311453, 0.63215713, 
0.4423123, 0.132212), `84` = c(1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 
0), `110` = c(0.55345123, 0.689875, 0.423415551, 1, 0.444532121, 
0, 0.01186404, 0.22132204, 0.21186404, 0, 0), `134` = c(0.234126, 
0.33347267, 0.35321234, 0.4123412, 0.333412666, 0, 0, 0, 0.3123, 
0, 0), `165` = c(0.1147189, 0.12343, 0.3155, 0.2755712, 0.123, 
0, 0, 0, 0, 0, 0), `199` = c(0.1236836, 0.0058933, 0, 0.1344607, 
0, 0, 0, 0, 0, 0, 0), Group = c("Train", "Bicycle", "Car", "Bicycle", 
"Bicycle", "Car", "Bicycle", "Train", "Car", "Train", "Train"
)), .Names = c("10", "34", "59", "84", "110", "134", "165", "199", 
"Group"), row.names = c("Mark_1", "Greg_1", "Tim_1", "Tom_1", 
"Sim_1", "Karl_1", "Moham_1", "Teraq_1", "Jake_1", "Sonya_1", 
"Monique_1"), class = "data.frame")

I would like to take rows from the same Group and combine them into one row. The row.names are not important in that case. As a row name the group should be used. Of course some groups have more members (especially in my real data) so the missing values in specific columns should be filled with NA's.

For example values for Greg, Tom, Sim, Moham should be in one row under the name Bicycle.

Expected output;

We have 4 members of the group Bicycle. Each of the member has 8 values = 8 columns. So as an output we should have a row.name=Bicycle and 32 columns with the values coming from those 4 members. We should do the same with all the groups.

EDIT:

So I expected something like Rechlay and Wietze314 posted. Is there any way to remove those NA values and keep the number of columns up to maximal number of members multiplied by 8. I do not care about colnames and from which/column value came from.

I would like to keep `0` values as `0` but I think that in my real data there shouldn't be any `0`. Just all the columns should be taken. — Shaxi Liver, Feb 02 '17 at 11:35
So, as I mention we take `Bicycle` group. We have 4 members of this group. Each of the member has 8 values = 8 columns. So as an output we should have a `row.name=Bicycle` and 32 columns with the values coming from those 4 members. We can name those columns using the name of the guy who "gave" this value. For example `Greg_1_10`, `Greg_1_34`,....., `Tom_1_10`,..., `Tom_1_110`. — Shaxi Liver, Feb 02 '17 at 11:43
Seems like some kind of [long to wide](http://stackoverflow.com/questions/5890584/how-to-reshape-data-from-long-to-wide-format) transformation. — Ronak Shah, Feb 02 '17 at 11:52
As I see I did not explain it correctly. Let me show you the expected output. Need 3 mins. — Shaxi Liver, Feb 02 '17 at 12:17

Wietze314 · Accepted Answer · 2017-02-02T13:16:43.203

Using the tidyr functions gather and spread:

library(tidyr)
library(dplyr)

result <- test_data %>% mutate(person = row.names(test_data)) %>% 
  gather(key, value, -Group, -person) %>%
  mutate(column = paste0(person,'_',key)) %>%
  select(column, Group, value) %>%
  spread(column, value)

But you will end up with a lot more columns than you anticipate, so maybe you can give us the expected output (as mentioned in the comments).

EDIT: To get 8 columns per person use:

result <- test_data %>% mutate(person = row.names(test_data)) %>% 
  gather(key, value, -Group, -person) %>%
  group_by(Group, person) %>%
  mutate(column = paste0(person,'_',row_number(key))) %>%
  ungroup() %>%
  select(column, Group, value) %>%
  spread(column, value)

Finally if you just want 32 columns without caring which value comes from which person you can use:

result <- test_data %>% mutate(person = row.names(test_data)) %>% 
  gather(key, value, -Group, -person) %>%
  group_by(Group) %>% arrange(person, key) %>%
  mutate(column = row_number()) %>%
  ungroup() %>%
  select(column, Group, value) %>%
  spread(column, value)

To remove NA values from the dataframe use:

result[is.na(result)] <- 0

Please take a look on my comment below `akrun's` answer. – Shaxi Liver Feb 02 '17 at 12:58 — Shaxi Liver, Feb 02 '17 at 12:58
The last one is exactly what I was looking for! – Shaxi Liver Feb 02 '17 at 14:35 — Shaxi Liver, Feb 02 '17 at 14:35

akrun · Answer 2 · 2017-02-02T13:03:21.960

2

Perhaps we can also use data.table

library(data.table)
dcast(melt(setDT(test_data, keep.rownames = TRUE), id.var= c("rn", "Group")), 
           Group~variable + rowid(variable), value.var = "value")

Update

dcast(melt(setDT(test_data, keep.rownames = TRUE), id.var= c("rn",
        "Group")), Group+rowid(variable) ~variable, value.var = "value")

edited Feb 02 '17 at 13:03

answered Feb 02 '17 at 12:49

akrun

874,273
37
540
662

Look at my new edit. Is it possible to make a table narrower ? Just store values below each other and extend it to minimum. In this specific case it would be 32 columns. Two rows with 32 columns full of numbers and one row with 24 values and 8 times `NA` at the end of table. – Shaxi Liver Feb 02 '17 at 12:55
@ShaxiLiver I updated the post, but I am not sure whether this is the structure you wanted – akrun Feb 02 '17 at 13:03

score 0 · Answer 3 · answered Feb 02 '17 at 12:25

0

test_data$row <- row.names(test_data) 
data <- reshape(test_data, idvar= "Group", timevar = "row", direction = "wide")

It will give you a long format of your data. The columns will be named as you wanted before an edit.

answered Feb 02 '17 at 12:25

Rechlay

1,457
2
12
19

Combining rows based on the string in one of the columns

3 Answers3

Update