Unique rows in data.frame

Question

I have this data.frame:

  V1     V2
1 RAB27A RAD21
2 RAB27A STAT1
3 ITGA4  RAD21
4 PANK3  SIX5
5 PANK3  SREBF1
6 PANK3  USF1

And I would like it to look like this:

  V1     V2    V3     V4
1 RAB27A RAD21 STAT1
2 ITGA4  RAD21
3 PANK3  SIX5  SREBF1 USF1

I'm a beginner. Help, please.

Jaap · Answer 1 · 2016-06-05T10:19:06.213

You can achieve this by combining aggregate, toString (or paste/paste0) and the cSplit function from the splitstackshape package:

library(splitstackshape)
newdata <- cSplit(aggregate(V2 ~ V1, mydf, toString), 'V2', sep=',', direction='wide')

which gives:

> newdata
       V1  V2_1   V2_2 V2_3
1:  ITGA4 RAD21     NA   NA
2:  PANK3  SIX5 SREBF1 USF1
3: RAB27A RAD21  STAT1   NA
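
If splitstackshape is not available, the same two steps (collapse per group, then split into columns) can be sketched in base R. This is only an illustration of the idea, not what cSplit does internally; the names collapsed, parts, width and wide are made up for this example, and the data are the ones from the question.

```r
# Data from the question
mydf <- data.frame(
  V1 = c("RAB27A", "RAB27A", "ITGA4", "PANK3", "PANK3", "PANK3"),
  V2 = c("RAD21", "STAT1", "RAD21", "SIX5", "SREBF1", "USF1"),
  stringsAsFactors = FALSE
)

# Step 1: collapse V2 into one comma-separated string per V1
collapsed <- aggregate(V2 ~ V1, mydf, toString)

# Step 2: split the strings again and pad shorter groups with NA
parts <- strsplit(collapsed$V2, ", ", fixed = TRUE)
width <- max(lengths(parts))
padded <- do.call(rbind, lapply(parts, function(p) {
  length(p) <- width  # extending a vector fills the new slots with NA
  p
}))

# Step 3: bind the key column and the padded values together
wide <- data.frame(V1 = collapsed$V1, padded, stringsAsFactors = FALSE)
names(wide)[-1] <- paste0("V2_", seq_len(width))
wide
```

As with aggregate in the answer above, the groups come out sorted by V1 rather than in their original order.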

Alternatively, you can use a combination of dplyr and tidyr:

library(dplyr)
library(tidyr)

newdf <- mydf %>% 
  group_by(V1) %>% 
  summarise(V2 = toString(V2)) %>% 
  separate(V2, paste0('V2_', 1:3), sep = ', ', fill = 'right')  # split on ', ' (what toString inserts) so values carry no leading space

which gives:

> newdf
Source: local data frame [3 x 4]

      V1  V2_1    V2_2  V2_3
  (fctr) (chr)   (chr) (chr)
1  ITGA4 RAD21      NA    NA
2  PANK3  SIX5  SREBF1  USF1
3 RAB27A RAD21   STAT1    NA
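
With tidyr 1.0.0 or later, pivot_wider can probably do the reshaping directly, without pasting the values together and splitting them again. A minimal sketch under that assumption; the helper column col and the result name newdf2 are made up for this example.

```r
library(dplyr)
library(tidyr)

# Data from the question
mydf <- data.frame(
  V1 = c("RAB27A", "RAB27A", "ITGA4", "PANK3", "PANK3", "PANK3"),
  V2 = c("RAD21", "STAT1", "RAD21", "SIX5", "SREBF1", "USF1"),
  stringsAsFactors = FALSE
)

newdf2 <- mydf %>%
  group_by(V1) %>%
  # number the rows within each V1 group to build the target column names
  mutate(col = paste0("V2_", row_number())) %>%
  ungroup() %>%
  # spread V2 across those columns; missing combinations become NA
  pivot_wider(names_from = col, values_from = V2)

newdf2
```

Unlike the separate approach, the number of output columns does not have to be known in advance here.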

Used data:

mydf <- read.table(text="V1     V2
                   1 RAB27A RAD21
                   2 RAB27A STAT1
                   3 ITGA4 RAD21
                   4 PANK3 SIX5
                   5 PANK3 SREBF1
                   6 PANK3 USF1", header=TRUE)

@akrun I know it was a dupe, but couldn't find a proper one at first. Your answer put me in the right direction ;-) — Jaap, Jun 05 '16 at 10:25

score 0 · Accepted Answer · answered Jun 05 '16 at 10:05

Here is another option with data.table

library(data.table)
setDT(df1)[, .(V2 = toString(V2)), V1][, paste0("V", 2:4) := tstrsplit(V2, ", ")][]
#       V1    V2     V3   V4
#1: RAB27A RAD21  STAT1   NA
#2:  ITGA4 RAD21     NA   NA
#3:  PANK3  SIX5 SREBF1 USF1

Or this can be done with just dcast

dcast(setDT(df1), V1~rowid(V1, prefix = "V"), value.var="V2")
#       V1    V1     V2   V3
#1:  ITGA4 RAD21     NA   NA
#2:  PANK3  SIX5 SREBF1 USF1
#3: RAB27A RAD21  STAT1   NA
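
The idea behind the dcast call (number the rows within each V1 group, then spread V2 over those numbers) can also be sketched in base R with ave and reshape. This is a rough equivalent, not a drop-in replacement; the column idx and the result name wide are made up for this example, and sep = "_" is chosen so that the value columns do not end up sharing the name V1 with the key column, as happens in the output above.

```r
# Data from the answer
df1 <- data.frame(
  V1 = c("RAB27A", "RAB27A", "ITGA4", "PANK3", "PANK3", "PANK3"),
  V2 = c("RAD21", "STAT1", "RAD21", "SIX5", "SREBF1", "USF1"),
  stringsAsFactors = FALSE
)

# Within-group row counter, playing the role of rowid(V1)
df1$idx <- ave(seq_along(df1$V1), df1$V1, FUN = seq_along)

# Long to wide: one row per V1, one V2_<idx> column per counter value
wide <- reshape(df1, idvar = "V1", timevar = "idx",
                direction = "wide", sep = "_")
wide
```

Here reshape keeps the groups in order of first appearance, whereas dcast sorts them by V1.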

data

df1 <- structure(list(V1 = c("RAB27A", "RAB27A", "ITGA4", "PANK3", "PANK3", 
"PANK3"), V2 = c("RAD21", "STAT1", "RAD21", "SIX5", "SREBF1", 
"USF1")), .Names = c("V1", "V2"), class = "data.frame", row.names = c(NA, -6L))

There is biased voting at play here, as I showed a compact option without any pasting. — akrun, Jun 05 '16 at 15:01
