r/sql for an operation between two data frames

Question

I have two data set

merge_paper_author:

author_id   paper_id
 731         136
 731         198
 778         241
 ........

paper_author:

paper_id  author_id
 136        731
 136        726
 136        728
 138        731
 138        726
 138        777
 138        798
 140        728
 140        726
 140        731

In the merge_paper_author for paper_id 136 author_id is 731.Now i need to take this value of paper_id=136 and author_id=731 and look into the paper_author table.For paper_id=136 i have 3 authors,i.e author_id=731,726 and 728.For this I will have 2 combinations of author_ids(considering 731 fixed)- 731, 726 and 731,728.

Now I need to look this combination across all papers in paper_author and find such combination and count them.For eg- for paper_id=138 i have 3 author_ids=731,777,726.In this there is 1 count of 731 and 726.In paper_id=140 there is combination of author_id=731,728 and 726 so count is 2 in this case(731,726 and 731 and 728).So total count is 2+1=3.i also need the no of times such occurences occur.

This i need to do for every such author_id , paper_id combination in merge_paper_author.I have written a code but its highly time consuming.I can do with R or sql query.My r code is:

count_multiple_sin_pap<-vector()
co_authored_comb_sin_pap<-vector()
 no_coauthors_comb<-vector()
data_papers<-merge_paper_author$paper_id
data_authors_sin<-merge_paper_author$author_id
data_i<-vector()
for(i in 1:length(data_papers))
{
 index<-which(paper_author$paper_id%in%data_papers[i])
 if(length(index)>1){
    authors<-paper_author$author_id[index]
    if(length(unique(authors))==1){data_i<-rbind(data_i,i)}
    else{
    index1<-which(paper_author$author_id%in%data_authors_sin[i])
    author<-paper_author$author_id[index1]
    index2<-which(authors==data_authors_sin[i])
    authors<-authors[-index2]
    index3<-which(paper_author$author_id%in%data_authors_sin[i])
    index4<-which(paper_author$paper_id%in%paper_author$paper_id[index3])
    index5<-which(paper_author$author_id[index4]%in%authors)
    dat<-paper_author$author_id[index4][index5]
    dat<-data.frame(dat)
    agg<-aggregate(list(numdup=rep(1,nrow(dat))), dat, length)
    count_multiple_sin_pap[i]<-length(which(agg$numdup>1))
    co_authored_comb_sin_pap[i]<-sum(agg$numdup)
    no_coauthors_comb[i]<-nrow(agg)
    }}else{count_multiple_sin_pap[i]<-0
      co_authored_comb_sin_pap[i]<-0
      no_coauthors_comb[i]<-0
     }
    }

In the above code I have taken two inputs data_papers and data_authors_sin

dput(data_papers):
 1696062L, 1696062L, 1701471L, 1701471L, 1707481L, 1707481L, 1722811L, 
 1722811L, 1754307L, 1754307L, 1768287L, 1768287L, 1791496L, 1791496L, 
 1795108L, 1795108L, 1816428L, 1816428L, 1844081L, 1844081L, 1850488L, 
 1856769L, 1856769L, 1931620L, 1931620L, 1947922L, 1955753L, 1955753L, 


 dput(data_authors_sin):
 2292903L, 2292903L, 2292903L, 2292903L, 2292903L, 2292903L, 2292903L, 
 2292903L, 2292903L, 2292903L, 2292903L, 2292903L, 2292903L, 2292903L, 
 2292903L, 2292903L, 2292903L, 2292903L, 2292903L, 2292903L, 2292903L, 
 2292903L, 2292903L, 2292903L, 2292903L, 2292903L, 2292903L, 2292903L,

Thanks

Please provide the result of `dput(...)` for each of your inputs and double check that running the code you provide on those inputs does provide meaningful output. — G. Grothendieck, Apr 07 '14 at 16:08
@G. Grothendieck yes its providing the result through co_authored_comb_sin_pap[i]<-sum(agg$numdup) — user3171906, Apr 07 '14 at 16:27
That is not the complete dput out. Please read this: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — G. Grothendieck, Apr 07 '14 at 17:43

r/sql for an operation between two data frames

0 Answers0