
I was wondering the other day whether StackOverflow had an API I could access from Mathematica, and apparently it does (see "Saving plot annotations").

What's the best way to get data from StackOverflow into Mathematica? Sjoerd used the information to make a plot. I'm interested in adding SO-related notifications into a docked cell I keep in my notebooks, so I can tell when there are updates or responses without leaving Mathematica.

Brett Champion

2 Answers


By popular demand, here is the code that generates the top-10 SO answerers plot (minus the annotations) using the SO API. It's a pretty neat and complete API with lots of goodies, and it's easy to use too, as the code shows.

Update: I added an app key so the code cooperates better with the SO API (it gets a higher daily call cap). Please use this key only for this app.

April 2011: [plot of the top-10 answerers' reputation over time]

August 2011: [plot of the top-10 answerers' reputation over time]

MMA 8 version (the MMA 7 version is further down):

getRepChanges[userID_Integer] :=
 Module[{totalChanges},
  (* first call with pagesize=1: only the "total" count is needed *)
  totalChanges =
   "total" /.
    Import["http://api.stackoverflow.com/1.1/users/" <> ToString[userID] <>
      "/reputation?key=NgVJ4Y6vFkuF-oqI-eOvOw&fromdate=0&pagesize=1&page=1",
     "JSON"];
  (* then fetch every page of 100 and join the rep_changes lists *)
  Join @@
   Table[
    "rep_changes" /.
     Import["http://api.stackoverflow.com/1.1/users/" <> ToString[userID] <>
       "/reputation?key=NgVJ4Y6vFkuF-oqI-eOvOw&fromdate=0&pagesize=100&page=" <>
       ToString[page],
      "JSON"],
    {page, 1, Ceiling[totalChanges/100]}]
  ]
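Both Import calls in getRepChanges build the same URL and differ only in pagesize and page. A small helper makes that explicit. This is a minimal sketch: the name repURL is hypothetical, the key and parameters are copied from the code above, and user ID 615464 is the one used in the comment thread below. It only builds strings, so it runs without network access (the 1.1 endpoints themselves may no longer be served).

```mathematica
(* hypothetical helper: builds the v1.1 reputation URL used by getRepChanges;
   key and query parameters are copied from the calls above *)
repURL[userID_Integer, pageSize_Integer, page_Integer] :=
 "http://api.stackoverflow.com/1.1/users/" <> ToString[userID] <>
  "/reputation?key=NgVJ4Y6vFkuF-oqI-eOvOw&fromdate=0&pagesize=" <>
  ToString[pageSize] <> "&page=" <> ToString[page]

(* the cheap first call that only reads "total" *)
repURL[615464, 1, 1]
(* "http://api.stackoverflow.com/1.1/users/615464/reputation?key=NgVJ4Y6vFkuF-oqI-eOvOw&fromdate=0&pagesize=1&page=1" *)
```

With it, the two Import calls become Import[repURL[userID, 1, 1], "JSON"] and Import[repURL[userID, 100, page], "JSON"].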

topAnswerers = 
  ({"display_name","user_id", "email_hash"} /. #) & /@ 
     ("user" /. 
      ("top_users" /. 
        Import[
          "http://api.stackoverflow.com/1.1/tags/mathematica/top-answerers/all-time",    
          "JSON"
        ]
       )
      )

topAnswerers = {#, #2, 
    Import["http://www.gravatar.com/avatar/" <> #3 <> ".jpg?s=36&d=identicon"]
    } & @@@ topAnswerers

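repChangesTopUsers =
  Table[
   (* one {date, net rep} pair per rep_change record; Hold keeps the
      expression unevaluated until /. has substituted the field values *)
   repChange =
    Sort[
     ReleaseHold[
      (Hold[{DateList["on_date" + AbsoluteTime["January 1, 1970"]],
           "positive_rep" - "negative_rep"}] /. #) & /@ getRepChanges[userID]]];
   (* the running total of the net changes gives the reputation curve *)
   accRepChange = Transpose[{repChange[[All, 1]], Accumulate[repChange[[All, 2]]]}],
   {userID, topAnswerers[[All, 2]]}];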
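The date conversion and running total in repChangesTopUsers can be tried on made-up data. In this sketch the sample pairs and the helper name unixToDateList are hypothetical, and AbsoluteTime[{1970, 1, 1}] stands in for the string "January 1, 1970" to avoid date-string parsing. The pairs are deliberately out of order to show why the code sorts before accumulating.

```mathematica
(* made-up {on_date (Unix seconds), positive_rep - negative_rep} pairs *)
changes = {{86400, 15}, {0, 10}, {172800, -2}};
sorted = Sort[changes];

(* Unix seconds -> date list, shifting by the Mathematica time of the Unix epoch *)
unixToDateList[t_] := DateList[t + AbsoluteTime[{1970, 1, 1}]]

(* pair each date with the running total of the net changes *)
curve = Transpose[{unixToDateList /@ sorted[[All, 1]], Accumulate[sorted[[All, 2]]]}];

curve[[All, 2]]
(* {10, 25, 23} *)
```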

pl = DateListLogPlot[
  Tooltip @@@ 
   Take[({repChangesTopUsers, Row /@ topAnswerers[[All, {3, 1}]]}\[Transpose]), 
    10], Joined -> True, Mesh -> None, ImageSize -> 1000, 
  PlotRange -> {All, {10, All}}, 
  BaseStyle -> {FontFamily -> "Arial-Bold", FontSize -> 16}, 
  DateTicksFormat -> {"MonthNameShort", " ", "Year"}, 
  GridLines -> {True, None}, 
  FrameLabel -> (Style[#, FontSize -> 18] & /@ {"Date", "Reputation", 
      "Top-10 answerers", ""})]

EDIT
You can plot up to and including the top 20 by changing the number in the Take function. The plot gets busy pretty quickly, though.

I tried to make the code in the post's markup somewhat more readable. I'm afraid this may introduce some spurious spaces when you copy it.

EDIT
Page size is back to 100 elements per page, which means fewer API calls. Note that the first API call only determines how many reputation changes the user has. That total is returned regardless of the page size, so this call preferably uses a small page size (the code uses 1). The data is then fetched in successive pages of the maximum size (100) until the last page is reached. If you change the page size, make sure the page count in the loop, Ceiling[totalChanges/100], is adjusted accordingly.
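The page arithmetic can be sketched on its own (pagesNeeded is a hypothetical name): Ceiling rounds up, so a partly filled last page is still fetched, and a total of 0 gives an empty loop.

```mathematica
(* number of pages of size pageSize needed to cover total items *)
pagesNeeded[total_Integer, pageSize_Integer] := Ceiling[total/pageSize]

pagesNeeded[250, 100]   (* 3: pages 1 and 2 are full, page 3 holds 50 *)
pagesNeeded[200, 100]   (* 2 *)
pagesNeeded[0, 100]     (* 0: Table[..., {page, 1, 0}] makes no calls *)

(* the page numbers the loop would request for a total of 250 *)
Range[pagesNeeded[250, 100]]   (* {1, 2, 3} *)
```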

EDIT: better MMA 7 code (Fri Apr 22)

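MMA 7 can't import JSON, so this version does a "Text" import followed by a bare-bones JSON translation: ":" becomes "->" and square brackets become curly braces, after which ImportString[..., "NB"] reads the result as a Mathematica expression. Note that colons and brackets inside string values (post titles, for instance) get rewritten too. I've tested this version several times now (in MMA 8), and it seems to work without the errors I got yesterday.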

getRepChanges[userID_Integer] :=
 Module[{totalChanges},
  totalChanges = 
   "total" /. 
    ImportString[
     StringReplace[(Import[
        "http://api.stackoverflow.com/1.1/users/" <> 
         ToString[userID] <> 
         "/reputation?key=NgVJ4Y6vFkuF-oqI-eOvOw&fromdate=0&pagesize=1&page=1", "Text"]), {":" ->
         "->", "[" -> "{", "]" -> "}"}], "NB"];
  Join @@ 
   Table["rep_changes" /. 
     ImportString[
      StringReplace[
       Import["http://api.stackoverflow.com/1.1/users/" <> 
         ToString[userID] <> 
         "/reputation?key=NgVJ4Y6vFkuF-oqI-eOvOw&fromdate=0&pagesize=100&page=" <> ToString[page],
         "Text"], {":" -> "->", "[" -> "{", "]" -> "}"}], 
      "NB"], {page, 1, Ceiling[totalChanges/100]}]]

topAnswerers = ({"display_name", "user_id", 
      "email_hash"} /. #) & /@ ("user" /. ("top_users" /. 
      ImportString[
       StringReplace[
        " " <> Import[
          "http://api.stackoverflow.com/1.1/tags/mathematica/top-answerers/all-time", "Text"], {":" -> "->", "[" -> "{", "]" -> "}"}], 
       "NB"]))

topAnswerers = {#, #2, 
    Import["http://www.gravatar.com/avatar/" <> #3 <> 
      ".jpg?s=36&d=identicon"]} & @@@ topAnswerers

repChangesTopUsers = 
  Table[repChange = 
    ReleaseHold[(Hold[{DateList[
             "on_date" + AbsoluteTime["January 1, 1970"]], 
            "positive_rep" - "negative_rep"}] /. #) & /@ 
       getRepChanges[userID]] // Sort;
   accRepChange = {repChange[[All, 1]], 
      Accumulate[repChange[[All, 2]]]}\[Transpose], {userID, 
    topAnswerers[[All, 2]]}];

DateListLogPlot[
 Tooltip @@@ 
  Take[({repChangesTopUsers, 
      Row /@ topAnswerers[[All, {3, 1}]]}\[Transpose]), 10], 
 Joined -> True, Mesh -> None, ImageSize -> 1000, 
 PlotRange -> {All, {10, All}}, 
 BaseStyle -> {FontFamily -> "Arial-Bold", FontSize -> 16}, 
 DateTicksFormat -> {"MonthNameShort", " ", "Year"}, 
 GridLines -> {True, None}, 
 FrameLabel -> (Style[#, FontSize -> 18] & /@ {"Date", "Reputation", 
     "Top-10 answerers", ""})] 
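The MMA 7 translation can be checked on a literal string instead of a download. In this sketch the JSON snippet is made up, and ToExpression is used in place of ImportString[..., "NB"]. Keep in mind that JSON true, false and null would come through as plain, undefined symbols.

```mathematica
(* made-up JSON shaped like a reputation response *)
json = "{\"total\": 2, \"rep_changes\": [{\"post_id\": 101, \"positive_rep\": 10}]}";

(* the same bare-bones rewrite: objects stay as {...} lists of rules *)
mmaForm = StringReplace[json, {":" -> "->", "[" -> "{", "]" -> "}"}];
data = ToExpression[mmaForm];

"total" /. data                       (* 2 *)
"post_id" /. ("rep_changes" /. data)  (* {101} *)
```

The last line relies on /. with a list of rule lists returning one result per inner list, which is also why "rep_changes" /. ... works on whole pages in the code above.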

EDIT: auxiliary functions to filter on post tags. These functions can be used to filter reputation gains so that only gains for certain tags are counted. tagLookup takes a post ID (an integer) and returns that post's tags. getQuestionIDs, getAnswerIDsFromQuestionID and getAnswerIDsFromTag work in the other direction: given a tag, they find all question and answer IDs, so you can test with MemberQ whether a given post ID belongs to that tag. Both tagLookup and the answer-ID functions are slow, because they need many API calls. I couldn't test the last two functions, as either API access is down or my IP has been capped.

tagLookup[postID_Integer] :=
 Module[{im},
  im = Import["http://api.stackoverflow.com/1.1/questions/" <> ToString[postID],"JSON"];
  If[("questions" /. im) =!= {},
   First[("tags" /. ("questions" /. im))],
   im = Import["http://api.stackoverflow.com/1.1/answers/" <> ToString[postID],"JSON"];
   First[("tags" /. ("questions" /. Import["http://api.stackoverflow.com/1.1/questions/" <> 
          ToString[First["question_id" /. ("answers" /. im)]], "JSON"]))]
   ]
  ]

getQuestionIDs[tagName_String] := Module[{total},
  total = 
   "total" /. 
    Import["http://api.stackoverflow.com/1.1/questions?tagged=" <> 
      tagName <> "&pagesize=1", "JSON"];
  Join @@ 
   Table[("question_id" /. ("questions" /. 
        Import["http://api.stackoverflow.com/1.1/questions?key=NgVJ4Y6vFkuF-oqI-eOvOw&tagged=" <>
           tagName <> "&pagesize=100&page=" <> ToString[i], 
         "JSON"])), {i, 1, Ceiling[total/100]}]
  ]

getAnswerIDsFromQuestionID[questionID_Integer] :=
 Module[{total},
  total = 
   Import["http://api.stackoverflow.com/1.1/questions/" <> 
     ToString[questionID] <> "/answers?key=NgVJ4Y6vFkuF-oqI-eOvOw&pagesize=1", "JSON"];
  If[total === $Failed, Return[$Failed], total = "total" /. total]; 
  Join @@ Table[
    "answer_id" /. ("answers" /. 
       Import["http://api.stackoverflow.com/1.1/questions/" <> 
         ToString[questionID] <> "/answers?key=NgVJ4Y6vFkuF-oqI-eOvOw&pagesize=100&page=" <> 
         ToString[i], "JSON"]), {i, 1, Ceiling[total/100]}]
  ]

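getAnswerIDsFromTag[tagName_String] :=
 (* map over the question IDs first, then drop the questions whose answer
    lookup returned $Failed before joining the answer-ID lists *)
 Join @@ Cases[getAnswerIDsFromQuestionID /@ getQuestionIDs[tagName], Except[$Failed]]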
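With the ID lists from these functions, the MemberQ filter described above looks roughly like this. The records and the ID pool are made up; in practice tagPostIDs would come from getQuestionIDs and getAnswerIDsFromTag and the records from getRepChanges, and the "post_id" field name is an assumption about the 1.1 reputation response.

```mathematica
(* made-up records shaped like the rep_changes lists (field names assumed) *)
repChanges = {
   {"post_id" -> 101, "positive_rep" -> 10, "negative_rep" -> 0},
   {"post_id" -> 202, "positive_rep" -> 15, "negative_rep" -> 2},
   {"post_id" -> 303, "positive_rep" -> 25, "negative_rep" -> 0}};

(* made-up pool of question and answer IDs belonging to the tag *)
tagPostIDs = {101, 303, 404};

(* keep only the changes whose post is in the tag's pool *)
tagChanges = Select[repChanges, MemberQ[tagPostIDs, "post_id" /. #] &];

(* net reputation earned on the tag: (10 - 0) + (25 - 0) *)
netRep = Total[(("positive_rep" /. #) - ("negative_rep" /. #)) & /@ tagChanges]
(* 35 *)
```

MemberQ scans the pool linearly for every record; with a pool of a couple of thousand IDs that should be acceptable compared to the time the API calls take.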
Sjoerd C. de Vries
  • Anyone care to remove the other code instances? Apparently I can't do that. – Sjoerd C. de Vries Apr 21 '11 at 14:55
  • FWIW, I noticed dreeves got negative reputation. This is probably related to bounties or so. I guess this shouldn't be possible, but it might be due to earlier changes to the reputation formula. – Sjoerd C. de Vries Apr 21 '11 at 15:03
  • Sjoerd, that graph makes the two of us look rather obsessed! – Mr.Wizard Apr 21 '11 at 16:43
  • I updated the code to add gravatars to the tooltips on the curves. – Brett Champion Apr 21 '11 at 16:51
  • Mr.Wizard Obsession: yeah, we took off together it seems, though I feel a bit left in the dust now ;-) MMA 7: Is it the JSON Import? The docs don't say when that was enabled. – Sjoerd C. de Vries Apr 21 '11 at 18:50
  • @Brett Nice, better than boring names only. Now a few more of the guys here need to pick less boring gravatars. – Sjoerd C. de Vries Apr 21 '11 at 18:52
  • @Mr.Wizard JSON import is new in Mathematica V8. This might be useful for V7 users: http://stackoverflow.com/questions/2633003/parsing-and-generating-json (includes code and a link to at least one other implementation). – Brett Champion Apr 21 '11 at 19:06
  • @Sjoerd You forgot one of the pagesizes – Dr. belisarius Apr 21 '11 at 19:36
  • @Mr.Wizard The following seems to work for me. Could you test it on mma7? `ImportString[ StringReplace[ Import["http://api.stackoverflow.com/1.1/users/615464/reputation?\ fromdate=0"], {":" -> "->", "[" -> "{", "]" -> "}"}], "NB"]`. Please remove the backslash after reputation? – Sjoerd C. de Vries Apr 21 '11 at 19:41
  • @belisarius Already done, or did I miss another one? Working in Markup is no fun. – Sjoerd C. de Vries Apr 21 '11 at 20:19
  • @Mr.Wizard Please see my latest edit. I replaced the JSON Import part with the stuff in my comment above. Could you test it? It seems to work here, although I once had the impression that I got hit by the API cap Brett was talking about. – Sjoerd C. de Vries Apr 21 '11 at 20:49
  • I tried running the latest version and I got a mess of error messages and no plot. I will come back to it when I have the time, and try to fix it. – Mr.Wizard Apr 21 '11 at 21:36
  • @Mr.Wizard It looks like a server issue. Sometimes the last page apparently isn't served, and the Import returns `Null`, which the program doesn't handle well. It may be a time-out or an API usage cap. – Sjoerd C. de Vries Apr 21 '11 at 22:18
  • @Mr.Wizard I made several changes which didn't appear to work. Finally, I discovered that I got errors even on a standard import, but only on the last page. Changing the import type to "Text" removed the error. So, if you would be so kind as to test the newest version on MMA7 I would be obliged. – Sjoerd C. de Vries Apr 22 '11 at 07:19
  • Thanks. Don't try to figure out what time zone I am on, it will only confound you. – Mr.Wizard Apr 22 '11 at 20:47
  • @Sjoerd Applying your lessons: http://meta.stackexchange.com/questions/88673/how-many-users-got-the-x-badge-before-i-did/88680#88680 :) – Dr. belisarius Apr 25 '11 at 02:48
  • @belisarius Nice. The next step would be to run SO from within Mathematica. – Sjoerd C. de Vries Apr 25 '11 at 20:30
  • Sjoerd, is it possible to create a chart that shows upvotes on the mathematica tag rather than total rep? I suppose this could be answered by reading about the API, but I am feeling lazy. – Mr.Wizard Apr 26 '11 at 08:29
  • @Mr.Wizard I added a few auxiliary functions. It seemed trivial, but I ran into several difficulties. I haven't been able to test everything, because I seem to be API capped again. The additions use JSON import as before; the bare-bones JSON import for MMA 7 mentioned above should work for you. – Sjoerd C. de Vries Apr 26 '11 at 13:45
  • @Mr.Wizard The API is again up and running now. The functions find 726 questions tagged mathematica and 1707 answers. So there's a pool of 2433 mathematica-related posts that can be used to filter reputation. It's not something you want to do often as it's all pretty slow. To get the answer ids 726 individual calls to the API are necessary. This probably triggers an IP cap when you do it too often. It took 7 min on my PC. – Sjoerd C. de Vries Apr 26 '11 at 15:43
  • @Mr.Wizard BTW did you see my double bracket parsing response? – Sjoerd C. de Vries Apr 26 '11 at 20:45
  • I just skimmed the answers, but I am going to go through all of them soon and compare. Maybe I'll post a nice Graph a la Sjoerd. – Mr.Wizard Apr 26 '11 at 21:03
  • @Mr.Wiz Done. We're getting too close together. Have to think about a new format, perhaps rankings only. – Sjoerd C. de Vries Aug 16 '11 at 17:36

Brett, this is unrelated to the SO API, but you could use the RSS feed for the newest Mathematica-tagged questions. Here is my naive implementation:

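(* one Hyperlink per feed entry: the question title links to the question
   page, and the author's name is shown as a tooltip *)
QuestionHyperlink[data_] := 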
 Function[{name, title, link}, 
   Hyperlink[Tooltip[title, name], link]] @@ Join[
   Cases[data, 
    XMLElement[
      "author", _, {___, XMLElement["name", {}, {name_}], ___}] :> 
     name],
   Cases[data, XMLElement["title", _, {title_}] :> title],
   Cases[data, XMLElement["link", rules_, {}] :> ("href" /. rules)]]

Cases[Import[
  "http://stackoverflow.com/feeds/tag?tagnames=mathematica&sort=newest", "XML"], 
 XMLElement["entry", attrs_, data_] :> 
  QuestionHyperlink[data], Infinity]
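The XMLElement patterns can be tried offline on a tiny feed-like snippet. In this sketch the XML is made up (no Atom namespace and no whitespace between tags, to keep the symbolic XML simple), and each entry is reduced to a {title, href} pair with First, so every pair is built from a single entry.

```mathematica
(* made-up, namespace-free snippet standing in for the real feed *)
xml = ImportString[
   "<feed><entry><title>Q1</title><link href=\"http://example.com/q/1\"/>" <>
    "<author><name>alice</name></author></entry><entry><title>Q2</title>" <>
    "<link href=\"http://example.com/q/2\"/><author><name>bob</name></author></entry></feed>",
   "XML"];

(* for every entry, pull out its title text and its link's href attribute *)
entries = Cases[xml,
   XMLElement["entry", _, data_] :> {
     First[Cases[data, XMLElement["title", _, {t_}] :> t]],
     First[Cases[data, XMLElement["link", rules_, {}] :> ("href" /. rules)]]},
   Infinity]
(* {{"Q1", "http://example.com/q/1"}, {"Q2", "http://example.com/q/2"}} *)

(* author names sit one level deeper, inside <author> *)
Cases[xml, XMLElement["name", _, {n_}] :> n, Infinity]
(* {"alice", "bob"} *)
```

Hyperlink[#1, #2] & @@@ entries would then give a clickable list like the one above.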

[Screenshot: the resulting list of question hyperlinks]

Sasha
  • I recommend to use the following URL for RSS with all Wolfram-related questions: http://stackoverflow.com/feeds/tag/mathematica+or+wolfram+or+wolframalpha+or+mathematica-frontend+or+mathematica-8 – Alexey Popkov Apr 22 '11 at 00:46
  • @Sasha This is exactly the kind of thing I waste time doing. :) +1 for rolling your own feed. – telefunkenvf14 Apr 22 '11 at 17:54
  • @Alexey Given the relatively low influx of mma questions here and the horde of answerers waiting to devour anything that's thrown in their midst, I'd love to have a solution that's instantaneous instead of based on polling. – Sjoerd C. de Vries Apr 23 '11 at 12:24
  • @Sjoerd What do you mean? Is it related to [my handy RSS link](http://stackoverflow.com/feeds/tag/mathematica+or+wolfram+or+wolframalpha+or+mathematica-frontend+or+mathematica-8)? – Alexey Popkov Apr 23 '11 at 13:59
  • @Alexey My problem is, I have an RSS reader on my cellphone that I use like every other night to read about 20 feeds, but in the case of SO I would need to be pinged as soon as something arrives, otherwise the juicy question has already been thoroughly answered. – Sjoerd C. de Vries Apr 23 '11 at 17:44
  • @Sjoerd Now I understand. Sasha's solution is instantaneous but it can not be run on the cellphone. And it does not highlight new and unwatched questions. Probably the latter can be done by means of SO API. – Alexey Popkov Apr 23 '11 at 22:27