Long to wide with duplicate row names

Question

I don't think this particular problem has appeared on the forum before, but please point me in the right direction if it's a duplicated question!

I have the following dataset and would like to reshape it from long to wide.

ID   variable                   value

1   number of students          1000
1   percentage on financial aid  28
1   acceptance rate              12
1   percentage on financial aid  35
2   number of students          2000
2   percentage on financial aid  1
2   percentage on financial aid  70

Note that the value percentage on financial aid appears twice for each id. I would like to keep just the second occurrence when reshaping from long to wide, since the first occurrence is the rank of the school by the "financial aid" measure, whereas the second occurrence is the actual value.

The variable name percentage on financial aid is exactly the same for both values, so I was wondering whether there's a way to tell R to overwrite the first occurrence with the second. Right now R seems to keep the first occurrence.

As you said, you only want to take the last appearence for dups. Why not filter them first? — mt1022, Jun 23 '17 at 11:03
Ah you're right @mt1022. I forgot I could do fromLast=TRUE and just keep the second instance. Thanks a ton! — gonnastop, Jun 23 '17 at 11:20

score 1 · Answer 1 · answered Jun 23 '17 at 11:17

zz = '
ID   variable                   value
1   number_of_students          1000
1   percentage_on_financial_aid  28
1   acceptance_rate              12
1   percentage_on_financial_aid  35
2   number_of_students          2000
2   percentage_on_financial_aid  1
2   percentage_on_financial_aid  70
'

df <- read.table(text = zz, header = TRUE)


ndf = apply(df, 2, rev)
ndf = as.data.frame(ndf)
nd = reshape(ndf, idvar = "ID", timevar = "variable", direction = "wide")
a = colnames(nd)
b = sub('.*\\.', '', a)
colnames(nd) = b
nd

  ID percentage_on_financial_aid number_of_students acceptance_rate
1  2                          70               2000            <NA>
4  1                          35               1000              12

If you do fromLast = T then:

nd = reshape(df, idvar = "ID", timevar = "variable", direction = "wide")
a = colnames(nd)
b = sub('.*\\.', '', a)
colnames(nd) = b
nd

score 0 · Answer 2 · answered Jun 23 '17 at 11:22

0

I figured this out thanks to the people who commented.

Solution:

df <- subset(df,duplicated(df[,1:2])|!duplicated(df[,1:2],fromLast=TRUE))

answered Jun 23 '17 at 11:22

gonnastop

111
1
1
6

Long to wide with duplicate row names

2 Answers2