R - Creating separate columns from distinct row values

Question

I'm attempting to create a data table with many columns, but cannot think of a way to do this succinctly (using dplyr or something else). Let's consider this data:

URL               TERM 
google.com        dog
yahoo.com         cat
bing.com          hamster
google.com        dog
google.com        cat
yahoo.com         cat
bing.com          dog
yahoo.com         cat

I would like to end with something like this:

URL          dog    cat    hamster
google.com   2      1      0
yahoo.com    0      3      0
bing.com     1      0      1

This is something that I can achieve using for loops... but I might as well not use R. Basically, I'd like to group by URL, create a new column for each unique TERM value, wherein each column contains a count of said TERM for each URL.

Any ideas?

`as.data.frame.matrix(table(df))` — ytk, Jun 27 '16 at 19:31

score 2 · Accepted Answer · answered Jun 27 '16 at 19:46

This can be seen as a problem of reshaping the data frame from long to wide, which can be achieved in a variety of ways in R. For more info check this link.

In your case, this will do:

library(reshape2)
dcast(df, URL ~ TERM)
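For a self-contained check without extra packages, the sample data from the question can be built by hand and cross-tabulated in base R with `table()`, as the comment under the question suggests. This is only a sketch: the name `df` is assumed to match the answers, and `counts` and `wide` are illustrative names, not part of the original post.

```r
# Sample data from the question; the name `df` is assumed to match the answers
df <- data.frame(
  URL  = c("google.com", "yahoo.com", "bing.com", "google.com",
           "google.com", "yahoo.com", "bing.com", "yahoo.com"),
  TERM = c("dog", "cat", "hamster", "dog", "cat", "cat", "dog", "cat")
)

# table() counts every URL/TERM combination; pairs that never occur get 0
counts <- as.data.frame.matrix(table(df))

# The URLs end up as row names; move them into a regular column
wide <- data.frame(URL = rownames(counts), counts, row.names = NULL)
wide
```

Rows and columns come out in alphabetical order rather than in order of first appearance, which may or may not matter for your use.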

thepule


score 1 · Answer 2 · answered Jun 27 '16 at 20:27

There are actually two operations going on here: (1) aggregating on both URL and TERM to produce a count of each such composite key, and (2) reshaping from long to wide format.

In pure base R, you can use a combination of aggregate() and reshape() to do this:

reshape(aggregate(num ~ ., cbind(df, num = 1L), sum), direction = 'wide', idvar = 'URL', timevar = 'TERM')
##          URL num.cat num.dog num.hamster
## 1 google.com       1       2          NA
## 2  yahoo.com       3      NA          NA
## 3   bing.com      NA       1           1
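The output above has `NA` where the question asks for 0, and the columns carry a `num.` prefix. One possible way to get closer to the requested layout is to split the two operations into separate steps and clean up afterwards. This is a sketch rather than part of the original answer: the names `long` and `wide` are illustrative, and `df` is rebuilt from the question's sample data.

```r
# Sample data from the question
df <- data.frame(
  URL  = c("google.com", "yahoo.com", "bing.com", "google.com",
           "google.com", "yahoo.com", "bing.com", "yahoo.com"),
  TERM = c("dog", "cat", "hamster", "dog", "cat", "cat", "dog", "cat")
)

# Step 1: aggregate. Add a constant column of 1s and sum it per URL/TERM pair
long <- aggregate(num ~ URL + TERM, data = cbind(df, num = 1L), FUN = sum)

# Step 2: reshape the per-pair counts from long to wide format
wide <- reshape(long, direction = "wide", idvar = "URL", timevar = "TERM")

# Pairs that never occur are NA after reshaping; replace them with 0
wide[is.na(wide)] <- 0L

# Strip the "num." prefix that reshape() adds to the new columns
names(wide) <- sub("^num\\.", "", names(wide))
wide
```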

Rudrani Angira · Answer 3 · 2016-06-28T13:42:42.007

Here is some very simple working code. It may not be the best, but it gives the right result, and I would appreciate suggestions for improving it. The output is shown below the code:

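    library(plyr)

    # stringsAsFactors = TRUE so that levels() works below
    # (the default has been FALSE since R 4.0.0)
    bevs <- data.frame(
      col1 = c("google.com", "yahoo.com", "bing.com", "google.com",
               "google.com", "yahoo.com", "bing.com", "yahoo.com"),
      col2 = c("dog", "cat", "hamster", "dog", "cat", "cat", "dog", "cat"),
      stringsAsFactors = TRUE
    )
    bevs

    # One row per (col1, col2) pair that occurs, with its frequency in `freq`
    tab <- count(bevs, c("col1", "col2"))

    # Result matrix: one row per URL, one column per term
    r <- matrix(NA, length(levels(tab$col1)), length(levels(tab$col2)))
    rownames(r) <- levels(tab$col1)
    colnames(r) <- levels(tab$col2)

    for (i in levels(tab$col1)) {
      for (j in levels(tab$col2)) {
        n <- tab$freq[tab$col1 == i & tab$col2 == j]
        # Pairs that never occur are absent from `tab`, so record them as 0
        r[i, j] <- if (length(n) == 0) 0 else n
      }
    }

    r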

Output:

           cat dog hamster
bing.com     0   1       1
google.com   1   2       0
yahoo.com    3   0       0

Find the code here http://www.r-fiddle.org/#/fiddle?id=BveQws3p&version=10
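As a possible improvement in brevity, the same count matrix can probably be produced without explicit loops by cross-tabulating the two columns directly. The sketch below uses base R's `xtabs()`. It is a suggestion, not part of the original answer, and it rebuilds `bevs` so that it runs on its own.

```r
# Same sample data as in the answer above
bevs <- data.frame(
  col1 = c("google.com", "yahoo.com", "bing.com", "google.com",
           "google.com", "yahoo.com", "bing.com", "yahoo.com"),
  col2 = c("dog", "cat", "hamster", "dog", "cat", "cat", "dog", "cat")
)

# xtabs() counts each col1/col2 combination; pairs that never occur are 0
r <- xtabs(~ col1 + col2, data = bevs)
r
```

The result is a contingency table that can be indexed like the matrix `r` in the loop version, for example `r["google.com", "dog"]`.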

Can someone please explain what the problem with my answer is? A comment would be helpful. Thanks. — Rudrani Angira, Jun 28 '16 at 13:16
Some people may not like the non-brevity of your code (compare it to some other answers here and in the vote-closed link). — Roman Luštrik, Jun 28 '16 at 13:25
Thank you for the feedback. I agree it is not concise. So should I remove it? — Rudrani Angira, Jun 28 '16 at 13:38
