I'm hoping to find a corpus of play-by-play style chess commentary* for an NLP project that involves predicting game outcomes from such commentary.

I can't shake the feeling that, given the huge interest in chess within the A.I. community, there must have been some previous project that used chess commentary for a similar purpose, but I can't find one for the life of me.

I have found a few sites, like Chess Games, that claim to have written commentary for some of their games, but most of the games have none, and there appears to be no way to filter or sort games by this property.

*By 'play-by-play style commentary' I mean anything involving the game at hand and nothing more. E.g. everything from "Kasparov moved his queen to b3, taking Deep Blue's pawn" to "Kasparov's poor opening has left his Knight vulnerable" but not things like "Kasparov played a similar move back in his 1996 game" or "Kasparov's hair looks particularly pretty today".

zergylord
  • Could those who've voted to close this please explain how my question is off topic -- I'll try and rephrase it in a more SO friendly way if at all possible. – zergylord Sep 02 '11 at 09:14
  • try [boardgames.stackexchange.com](http://boardgames.stackexchange.com) – BlueRaja - Danny Pflughoeft Sep 02 '11 at 12:28
  • That site doesn't have anything vaguely related to my question about accessing chess commentary programmatically -- This question is about finding a corpus, not about the game of chess. Why is that off-topic when dozens of other questions about searching for corpora are fine? – zergylord Sep 02 '11 at 17:40
  • You didn't ask how to access such-and-such's chess commentary programmatically; you asked where to find chess commentary. Why you need it is irrelevant, and that's completely off-topic here. Try the boardgame SE. – BlueRaja - Danny Pflughoeft Sep 02 '11 at 18:03
  • Commentary that would be in a machine readable format preferably with syntactic tagging -- implied by using the term 'corpus' and stating my intent. Again, such requests for corpora are common on this site; are they all off-topic? My feeling is that if I had just asked for any old commonly used corpus, you wouldn't have a problem, but my *additional constraint* of it being about chess somehow has distracted you from its primary purpose. – zergylord Sep 02 '11 at 19:23
  • Compare to these on-topic questions: http://stackoverflow.com/q/1892802/821806, http://stackoverflow.com/questions/137380/nlp-building-small-corpora-or-where-to-get-lots-of-not-too-specialized-engli – zergylord Sep 02 '11 at 19:25
  • I agree that this is a valid question. I searched a bit for any annotated matches but did not find much. It might be worth posting on a chess forum- if you can get the transcripts, you can manually split them up with a few hours of work (as much as that sucks), and that should give useful data. Best I can offer is http://www.chesslab.com/0799/commentary.htm – nflacco Sep 04 '11 at 04:56
  • You might also add to the Area 51 proposal: http://area51.stackexchange.com/proposals/7551/chess – Iterator Sep 05 '11 at 03:11
  • You might include [analysis] in your search (i.e. not just commentary). In addition, as most such material is copyrighted, it may not be easy to find an aggregate corpus, but you might find material via contacting major publishers or in the chess sections of digitally accessible newspapers, such as the NYT or some British papers. – Iterator Sep 05 '11 at 03:13
  • I'm a grandmaster, and I don't think you can do this with any free site. They wouldn't let you anyway. You'd need to purchase a large database of commented games from Chessbase. You can't complete the project without paying. – ABCD Sep 05 '11 at 03:16
  • Wow, thanks for all the helpful comments guys! @Kinderschocolate I'm definitely willing to pay, so that sounds like the solution I was looking for. If this question ever gets reopened, post it as an answer and I'll accept it :) – zergylord Sep 05 '11 at 20:17
  • You know another thing you could do is get a list of top AI researchers, say look for all AI/NLP/Stat/Machine Learning publications in the last 10 years, and then compare that to a list of ranked chess players. The intersection of those two groups could probably help out. – nflacco Sep 08 '11 at 05:16

0 Answers