0

I'm attempting to create a data table with many columns, but cannot think of a way to do this succinctly (using dplyr or something else). Let's consider this data:

URL               TERM 
google.com        dog
yahoo.com         cat
bing.com          hamster
google.com        dog
google.com        cat
yahoo.com         cat
bing.com          dog
yahoo.com         cat

I would like to end with something like this:

URL          dog    cat    hamster
google.com   2      1      0
yahoo.com    0      3      0
bing.com     1      0      1

This is something that I can achieve using for loops... but I might as well not use R. Basically, I'd like to group by URL, create a new column for each unique TERM value, wherein each column contains a count of said TERM for each URL.

Any ideas?

AmagicalFishy
  • 1,249
  • 1
  • 12
  • 36

3 Answers3

2

This can be seen as a problem of reshaping the data frame from long to wide, which can be achieved in a variety of ways in R. For more info check this link.

In your case this can do:

library(reshape2)
dcast(df, URL ~ TERM)
thepule
  • 1,721
  • 1
  • 12
  • 22
1

There are actually two operations going on here: (1) aggregating on both URL and TERM to produce a count of each such composite key, and (2) reshaping from long to wide format.

In pure base R, you can use a combination of aggregate() and reshape() to do this:

reshape(aggregate(num~.,cbind(df,num=1L),sum),dir='w',idvar='URL',timevar='TERM');
##          URL num.cat num.dog num.hamster
## 1 google.com       1       2          NA
## 2  yahoo.com       3      NA          NA
## 3   bing.com      NA       1           1
bgoldst
  • 34,190
  • 6
  • 38
  • 64
-1

A very simple working code . It might not be the best but it is giving the results. I would appreciate some improvement on it. Please find the output below:

     bevs <- data.frame(cbind( col1=c("google.com", "yahoo.com","bing.com","google.com","google.com","yahoo.com","bing.com","yahoo.com") ,col2= c("dog", "cat", "hamster", "dog","cat","cat","dog","cat")))
     bevs
     library(plyr)

     tab<-count(bevs, c("col1", "col2"))
     r=matrix(NA,length(levels(tab$col1)),length(levels(tab$col2)))
     rownames(r)=levels(tab$col1)
     colnames(r)=levels(tab$col2)

     for(i in levels(tab$col1))
     {
       for(j in levels(tab$col2))
       {

                if(length(tab$freq[tab$col1==i&tab$col2==j])==0)
                r[i,j]=0
                else
                r[i,j]=tab$freq[tab$col1==i&tab$col2==j]  
       }

     }

    r

Output:

          cat dog hamster
bing.com     0   1       1
google.com   1   2       0
yahoo.com    3   0       0

Find the code here http://www.r-fiddle.org/#/fiddle?id=BveQws3p&version=10

Rudrani Angira
  • 956
  • 2
  • 14
  • 28