
I'm using Michael Manoochehri's example (http://stackoverflow.com/a/10969900/1387380) to pipe data out of DataStore to Google Cloud Storage using Pipeline and Mapreduce APIs but my jobs are running forever and never complete. I have some jobs running for the past 7 days which I can't even stop from the MapperPipeline console interface.

How can I stop them manually or programmatically?

Charles
  • Hi Charles: First off, you can remove those lingering jobs by purging your task queue (probably the "default" queue) at Admin Console -> Task Queues -> [default] -> Purge. As for the long-running jobs issue, to help me debug: how many Datastore entities are you mapping over? Can you look in your App Engine error logs and see if there are any issues involving writing the results to Cloud Storage (https://appengine.google.com/logs?app_id=YOUR_APP_ID)? – Michael Manoochehri Jul 18 '12 at 18:25
  • Hi Michael, I've purged the task queue already, but the jobs keep showing up as running in the MapReduce dashboard. We are talking about very few entities (30 or fewer), and yes, I had issues writing to Cloud Storage: a Permission Denied error on GCS file creation, which I've since fixed, but I guess those jobs are still running somewhere. I was wondering how to kill them for good, since they are slowly filling up my quotas. Thanks for your help. – Charles Jul 18 '12 at 22:33
  • OK, this behavior might be due to a previous version of your app still running. Is this possible? – Michael Manoochehri Jul 19 '12 at 01:47
  • I guess so; the app was still running until I ran it again, with no bug this time. The previous jobs are still marked as running on the MapReduce dashboard and I can't do anything to clean them up, but according to my quota details and logs pages no processes are being executed, so it's all good in the end. – Charles Jul 19 '12 at 08:02
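
The queue purge Michael describes in the comments above can also be done programmatically from inside the app, which is handy if the Admin Console is awkward to reach. A minimal sketch, assuming the App Engine Python runtime and that the stuck tasks are on the "default" push queue:

```python
# Sketch: purge a push queue from inside an App Engine app.
# Requires the App Engine Python runtime; the queue name "default"
# is an assumption - substitute whichever queue your MapReduce jobs use.
from google.appengine.api import taskqueue

def purge_stuck_mapreduce_tasks(queue_name='default'):
    # Removes all pending tasks from the queue. Note that this only
    # drops the tasks themselves; it does not update the job status
    # records shown in the MapReduce dashboard.
    taskqueue.Queue(queue_name).purge()
```

Note that `purge()` can take a short while to take effect, and tasks enqueued immediately afterwards may also be dropped, so wait briefly before restarting the pipeline.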

1 Answer


I think this behavior is due to a bug in how the current version of the App Engine MapReduce library handles Cloud Storage output writer errors. If this happens, as I mentioned above, check the GAE logs for permission or API errors involving Cloud Storage (or whichever output writer you are currently using).

There should be improvements in our next iteration of the library, but currently, if there are issues like this, the quick workaround is to purge your task queue, correct the problem causing the errors, and kick off the pipeline again.
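
If the stale records are Pipeline API jobs rather than raw tasks, the Pipeline library also exposes an abort hook that can be called programmatically. A rough sketch, assuming the appengine-pipeline library bundled with the MapReduce lib and that you have the pipeline ID from the MapperPipeline status page (the import path varies between library versions, so treat it as an assumption):

```python
# Sketch: abort a running pipeline by its ID from inside the app.
# The import path below is an assumption - in some versions the pipeline
# package is bundled as mapreduce.lib.pipeline, in others it is standalone.
from mapreduce.lib.pipeline import pipeline

def abort_stuck_pipeline(pipeline_id):
    # Look up the pipeline record by the ID shown in the status UI
    # (or returned as pipeline_id when the MapperPipeline was started).
    job = pipeline.Pipeline.from_id(pipeline_id)
    if job is not None:
        job.abort(abort_message='Aborted manually after output writer errors')
```

This marks the pipeline as aborted in its status records, which should also clear the "running" entry from the dashboard once the abort propagates.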

Michael Manoochehri