How can I use Mathematica and Google Scholar to find the number of papers a person published in 2011?
-
I do not understand the nature of your limitations. In the high-energy physics community you can use SPIRES, and that will give you the count directly. With Google, you would have to filter through concealed duplicates. I think you should make your question more precise. – Sasha May 24 '11 at 12:20
1 Answer
Google Scholar is not well suited to this task: as far as I know it has no formal API, and it doesn't return results in a structured format such as XML. So we have to resort to a quick (and very, very fragile!) text-pattern-matching hack like this:
searchGoogleScholarAuthor[author_String] :=
 First[StringCases[
   (* fetch the first results page for the author query *)
   Import["http://scholar.google.com/scholar?start=0&num=1&q=" <>
     StringDrop[
      StringJoin @@ ("author:" <> # <> "+" & /@
         StringSplit[author]), -1] <> "&hl=en&as_sdt=1,5"],
   (* capture the hit count that follows "of about" in the results header *)
   ___ ~~ "Results" ~~ ___ ~~ "of about" ~~ Shortest[___] ~~
     p : Longest[(DigitCharacter | ",") ..] ~~ ___ ~~ "." ~~ ___ ~~
     "(" ~~ ___ :> p]]
In[191]:= searchGoogleScholarAuthor["A Einstein"]
Out[191]= "6,400"
In[190]:= searchGoogleScholarAuthor["Einstein"]
Out[190]= "9,400"
In[192]:= searchGoogleScholarAuthor["Wizard"]
Out[192]= "197"
In[193]:= searchGoogleScholarAuthor["Vries"]
Out[193]= "70,700"
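If you don't like the string result, strip the thousands separators and apply ToExpression, e.g. ToExpression[StringReplace[result, "," -> ""]]; applying ToExpression directly to a string such as "6,400" fails because of the comma. If you want to restrict the publication years, you can add &as_ylo=2011&as_yhi=2011& to the search string and change the start and end years appropriately.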
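To make the year restriction concrete, here is a minimal sketch of a URL builder. The name scholarURL and its two year arguments are hypothetical and not part of the code above; the as_ylo and as_yhi parameters are the ones just mentioned, and whether Scholar still honors them is an assumption. The function only assembles a string and performs no network access.

```mathematica
(* Hypothetical helper: build the Scholar query URL for an author,
   restricted to publication years yearLow..yearHigh *)
scholarURL[author_String, yearLow_Integer, yearHigh_Integer] :=
 "http://scholar.google.com/scholar?start=0&num=1&q=" <>
  StringDrop[
   StringJoin @@ ("author:" <> # <> "+" & /@ StringSplit[author]), -1] <>
  "&hl=en&as_sdt=1,5" <>
  "&as_ylo=" <> ToString[yearLow] <> "&as_yhi=" <> ToString[yearHigh]

scholarURL["A Einstein", 2011, 2011]
(* "http://scholar.google.com/scholar?start=0&num=1&q=author:A+author:Einstein&hl=en&as_sdt=1,5&as_ylo=2011&as_yhi=2011" *)
```

Passing the resulting string to Import in place of the fixed URL in searchGoogleScholarAuthor would give a count for 2011 only, with all the caveats about fragility that apply to the original.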
Please note that authors with popular names will generate lots of spurious hits, as there is no way to uniquely identify a single author. Additionally, Scholar returns a variety of hits, including citations, books, reprints and more. So, really, this ain't very useful for counting.
A bit of explanation:
Scholar splits the initials and names of authors and co-authors over several author: fields combined with a +. The StringDrop[StringJoin @@ ("author:" <> # <> "+" & /@ StringSplit[author]), -1] part of the code takes care of that. The StringDrop removes the last +.
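The intermediate steps are easier to follow when evaluated one at a time. The sketch below uses only built-in functions; the variable names (author, parts, fields) are illustrative. The last expression is an alternative construction with Riffle that places the + only between fields, so no StringDrop is needed.

```mathematica
author = "A Einstein";

(* split the name into initials and surname *)
parts = StringSplit[author]
(* {"A", "Einstein"} *)

(* wrap each part in an author: field with a trailing + *)
fields = ("author:" <> # <> "+" &) /@ parts
(* {"author:A+", "author:Einstein+"} *)

(* join the fields and drop the final + *)
StringDrop[StringJoin @@ fields, -1]
(* "author:A+author:Einstein" *)

(* alternative: insert + only between the fields *)
StringJoin @@ Riffle[("author:" <> # &) /@ parts, "+"]
(* "author:A+author:Einstein" *)
```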
The StringCases part contains a large text pattern that searches for the text Scholar places at the top of each results page, which contains the number of hits. This number is then isolated and returned.
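As a rough illustration of the extraction step, the sketch below runs a simplified pattern against a made-up status line. The sample text is an assumption about what the results header looked like, not captured Scholar output, and the pattern is deliberately shorter than the one in the answer.

```mathematica
(* Made-up sample of a results header; the real page text may differ *)
sample = "Results 1 - 1 of about 6,400. (0.05 sec)";

(* capture the run of digits and commas that follows "of about " *)
hits = First[
  StringCases[sample, "of about " ~~ p : (DigitCharacter | ",") .. :> p]]
(* "6,400" *)

(* strip the thousands separators before converting to an integer *)
ToExpression[StringReplace[hits, "," -> ""]]
(* 6400 *)
```

Anchoring on the literal "of about " keeps the sketch short, but it is just as fragile as the original: any change in the wording of the header makes StringCases return an empty list, and First then fails.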

-
Look there, I've written 197 papers I didn't even know about. Now I need a grant so I can continue the work. – Mr.Wizard May 24 '11 at 17:22