Deleting duplicates in R, changing remainder

Question

I have a fairly straightforward question, but very new to R and struggling a little. Basically I need to delete duplicate rows and then change the remaining unique row based on the number of duplicates that were deleted.

In the original file I have directors and the company boards they sit on, with directors appearing as a new row for each company. I want to have each director appear only once, but with column that lists the number of their board seats (so 1 + the number of duplicates that were removed) and a column that lists the names of all companies on which they sit.

So I want to go from this:

enter image description here

To this

enter image description here

Bonus if I can also get the code to list the directors "home company" as the company on which she/he is an executive rather than outsider.

Thanks so very much in advance! N

What you want to do is `aggregate` by the director column and use `length` as the aggregating function (this will count how many rows each director has). — R. Schifini, May 17 '18 at 15:41
Related: [*Collapse / concatenate / aggregate a column to a single comma separated string within each group*](https://stackoverflow.com/q/15933958/2204410) and [*Count number of rows within each group*](https://stackoverflow.com/q/9809166/2204410) — Jaap, May 17 '18 at 15:57

Luis Vargas · Accepted Answer · 2018-05-17T16:57:29.740

You could use the ddply function from plyr package

#First I will enter a part of your original data frame

Name <- c('Abbot, F', 'Abdool-Samad, T', 'Abedian, I', 'Abrahams, F', 'Abrahams, F', 'Abrahams, F')
Position <- c('Executive Director', 'Outsider', 'Outsider', 'Executive Director','Outsider',  'Outsider')
Companies <- c('ARM', 'R', 'FREIT', 'FG', 'CG', 'LG')
NoBoards <- c(1,1,1,1,1,1)

df <- data.frame(Name, Position, Companies, NoBoards)

# Then you could concatenate the Positions and Companies for each Name
library(plyr)

sumPosition <- ddply(df, .(Name), summarize, Position = paste(Position, collapse=", "))
sumCompanies <- ddply(df, .(Name), summarize, Companies = paste(Companies, collapse=", "))

# Merge the results into a one data frame usin the name to join them
df2 <- merge(sumPosition, sumCompanies, by = 'Name')

# Summarize the number of oBoards of each Name
names_NoBoards <-  aggregate(df$NoBoards, by = list(df$Name), sum)
names(names_NoBoards) <- c('Name', 'NoBoards')

# Merge the result whit df2 
df3 <- merge(df2, names_NoBoards, by = 'Name')

You get something like this

                Name                               Position  Companies NoBoards
1        Abbot, F                     Executive Director        ARM        1
2 Abdool-Samad, T                               Outsider          R        1
3      Abedian, I                               Outsider      FREIT        1
4     Abrahams, F Executive Director, Outsider, Outsider FG, CG, LG        3

In order to get a list the directors "home company" as the company on which she/he is an executive rather than outsider. You could use the next code

ExecutiveDirector <- df[Position == 'Executive Director', c(1,3)]

df4 <- merge(df3, ExecutiveDirector, by = 'Name', all.x = TRUE)

You get the next data frame

             Name                               Position Companies.x NoBoards Companies.y
1        Abbot, F                     Executive Director         ARM        1         ARM
2 Abdool-Samad, T                               Outsider           R        1        <NA>
3      Abedian, I                               Outsider       FREIT        1        <NA>
4     Abrahams, F Executive Director, Outsider, Outsider  FG, CG, LG        3          FG

That looks like everything I need - thank you SO much! – nred May 17 '18 at 18:38 — nred, May 17 '18 at 18:38

Deleting duplicates in R, changing remainder

1 Answers1