We've created batches of HITs using the Mechanical Turk web interface. Now all we want to do is download the results for a batch using the API, the same way you can download the results for a batch in the web interface using "Download CSV".

The documentation from Amazon says that downloading the results from the API is possible and I would be surprised if it isn't. But after a lot of programming hours and testing I have not been able to get the results of a batch.

http://docs.aws.amazon.com/AWSMechTurk/latest/AWSMturkAPI/ApiReference_OperationsArticle.html

Our problem is not getting the HIT data; that is easy with GetHIT. Nor is it getting the assignment data, which is easily done with GetAssignmentsForHIT. Our problem is figuring out which HIT IDs belong to a batch, so that we fetch only the results of that batch.

We thought we would be able to do this with GetHITsForQualificationType, but since we use the same HIT type ID for all batches this isn't possible. The only other operation I can see is SearchHITs, but it only lets you sort the results, not filter them by, e.g., batch ID.

If Amazon is a SOA company and they follow the "eat your own dog food" concept, then I wonder how they generate the results in "Download CSV" using their API?

Any hints would be greatly appreciated. Thank you!

UPDATE #1

I believe you could use SearchHITs to pull out all HITs, then grab the details for each HIT using GetHIT, and then filter the HITs by "RequesterAnnotation", which actually contains the batch ID, e.g. "BatchId:1234567;". This might be the only solution. It sounds a bit far-fetched, though.

user1493124
  • What is "Batch ID"? I've never seen this term in the API documentation. – Fredrick Brennan Apr 19 '13 at 18:08
  • There's probably no such thing, but when you create HITs using the web interface using a CSV file Amazon refers to those with the wording "batches". A batch consists of multiple HITs. The batch ID is nothing they promote, but you can see it in the URL when browsing the batches. – user1493124 Apr 19 '13 at 23:32

1 Answer

The workflow is exactly as you describe in your Update #1: (1) Use SearchHITs to get all of your HITs. (2) Get details with GetHIT (you can actually skip this step, because the "Requester Annotation" field comes with SearchHITs if you include the HITDetail response group). (3) Filter the results by the annotation field to get the HITs you want. (4) Use GetAssignmentsForHIT to retrieve the assignments for each of those HITs.
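
As a rough illustration of steps (1) and (3), here is a minimal Python sketch. It is not taken from the original thread: it assumes a boto3-style MTurk client whose `list_hits` call stands in for SearchHITs and returns pages of HIT dictionaries carrying a `RequesterAnnotation` field, and it assumes the annotation has the "BatchId:1234567;" shape mentioned in the question. The helper names (`batch_id_of`, `iter_all_hits`, `hit_ids_for_batch`) are made up for this example.

```python
import re

# Assumption: the web interface writes the batch as "BatchId:<digits>;"
# somewhere inside the RequesterAnnotation string.
BATCH_RE = re.compile(r"BatchId:(\d+);")


def batch_id_of(hit):
    """Return the batch ID embedded in a HIT's annotation, or None."""
    match = BATCH_RE.search(hit.get("RequesterAnnotation") or "")
    return match.group(1) if match else None


def iter_all_hits(client, page_size=100):
    """Yield every HIT of the requester, following pagination tokens.

    `client` is assumed to expose a boto3-style list_hits(MaxResults=...,
    NextToken=...) method returning {"HITs": [...], "NextToken": ...}.
    """
    token = None
    while True:
        kwargs = {"MaxResults": page_size}
        if token:
            kwargs["NextToken"] = token
        page = client.list_hits(**kwargs)
        for hit in page.get("HITs", []):
            yield hit
        token = page.get("NextToken")
        if not token:
            break


def hit_ids_for_batch(client, batch_id):
    """Collect the IDs of all HITs whose annotation names the given batch."""
    wanted = str(batch_id)
    return [h["HITId"] for h in iter_all_hits(client) if batch_id_of(h) == wanted]
```

Each returned HIT ID can then be passed to whatever assignment-fetching call you use (GetAssignmentsForHIT in the API discussed here). Note that this still walks every HIT on the account, since the filtering happens on the client side.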

The "batch id" is something that appears to only be accessible to Amazon for use on the Requester User Interface. (see some discussion on the MTurk Developer Forum)

And, of course, the API is going to give you results in XML, which you'll need to parse to turn them into a CSV.
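
A minimal sketch of that XML-to-CSV step, using only the Python standard library. It assumes each assignment is available as a dictionary with `HITId`, `AssignmentId`, `WorkerId`, and an `Answer` string holding a `QuestionFormAnswers` document; those key names, the `Answer.<question>` column naming, and the helper names are illustrative choices for this example, not something stated in the thread.

```python
import csv
import io
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def parse_answers(answer_xml):
    """Map question identifiers to answer text for one assignment."""
    answers = {}
    for answer in ET.fromstring(answer_xml):
        if _local(answer.tag) != "Answer":
            continue
        key, value = None, ""
        for child in answer:
            if _local(child.tag) == "QuestionIdentifier":
                key = (child.text or "").strip()
            else:
                # FreeText, SelectionIdentifier, etc. are all kept as text.
                value = (child.text or "").strip()
        if key is not None:
            answers[key] = value
    return answers


def assignments_to_csv(assignments):
    """Render assignment dictionaries as CSV text, one row per assignment."""
    parsed = [(a, parse_answers(a["Answer"])) for a in assignments]
    questions = sorted({q for _, answers in parsed for q in answers})
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["HITId", "AssignmentId", "WorkerId"]
                    + ["Answer." + q for q in questions])
    for a, answers in parsed:
        writer.writerow([a["HITId"], a["AssignmentId"], a["WorkerId"]]
                        + [answers.get(q, "") for q in questions])
    return out.getvalue()
```

Questions that a given worker did not answer come out as empty cells, so every row has the same columns.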

Thomas