
I'm a computer science student, and as a personal project, I'm interested in building software that can watch and produce useful information about Super Nintendo games being run on a local emulator. This might be things like current health, current score, etc. (anything legible on the screen). The emulator runs in windowed mode (I'm using SNES9x), so I wouldn't need to capture every pixel on the screen, and I'd only have to capture about 30 FPS.

I've looked into some libraries like FFMPEG and OpenCV, but so far what I've seen leads me to believe I have to have pre-recorded renderings of the game.

At some point, I'd like to explore the capacity for developing a somewhat heuristic AI that might be able to play Super Metroid, but to do so, it would need to be interpreting live gameplay. The algorithms and data structures needed for something like this are within my realms of study; video processing is not, and I'm something of a noob. Any pointers would be awesome (pardon the lame computer science pun).

For those who might point out that it would be simpler to scrape the game memory rather than screen grab data -- yes, it would be. My interest is in developing something that is only given the information a human player would have, i.e., the visuals on the screen, so this is the approach I'm interested in for the time being. Thanks!
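For concreteness, here is roughly the kind of capture loop I'm imagining, sketched with the third-party mss library (pip install mss). The window coordinates below are placeholders, since I would still have to locate the SNES9x window first:

    import cv2
    import numpy as np
    from mss import mss                     # third-party screen-capture library

    # Placeholder region: the real coordinates of the SNES9x window would have
    # to be located first (or the window kept at a fixed, known position).
    REGION = { 'left': 0, 'top': 0, 'width': 512, 'height': 448 }

    with mss() as sct:
        while True:
            raw   = np.array( sct.grab( REGION ) )          # BGRA pixels of the region
            frame = cv2.cvtColor( raw, cv2.COLOR_BGRA2BGR ) # drop alpha for OpenCV work
            # ... feed <frame> into whatever recognition step comes next ...
            cv2.imshow( 'snes-watch', frame )
            if cv2.waitKey( 33 ) != -1:                     # ~30 FPS pacing; any key quits
                break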

growling_egg
  • As a side note, you _still_ might want to scrape the emulator's memory (or patch the emulator) to deliver you a frame of graphics each time the frame buffer flips, instead of trying to guess what frequency to watch at, or maybe even to grab the video output at a lower level. (Unless you're really trying to model human vision, which it doesn't sound like you are.) – abarnert May 08 '15 at 23:23
  • Meanwhile, what makes you think you need pre-recorded renderings? By the time you're feeding a stream into FFMPEG, it has no idea where that stream came from. (If you're also asking it to parse the stream as a _file_, rather than a transport stream or just a pipe of raw frame data, then it obviously makes a difference, but just don't do that.) – abarnert May 08 '15 at 23:25
  • But finally, I think this is just way too broad for StackOverflow. Unless you're just looking for recommendations for libraries (which is off-topic for a different reason), you're looking for pointers on how to do something very general. That's certainly a good question, but it's not a question StackOverflow can answer well. (The help explains why.) You probably want something more discussion-oriented, like a mailing list or forum. – abarnert May 08 '15 at 23:26
  • All perfectly valid points -- if there's a simpler way to get each frame of graphics, I'm all for it. And yes, I'm not looking for a complete "how do I make this" explanation so much as "does anyone have a library they suggest for getting the frame data?" If there was something like: "import framegetter", then "frame = framegetter.getframe" I'd be so happy. EDIT: I've tried formatting the last part of this post to not look like a jumbled mess, but to no avail. – growling_egg May 08 '15 at 23:31
  • OK, if it really is a library-shopping question, I voted to close for the wrong reason… but it's still off-topic for StackOverflow. There's even a standard close reason specifically for that class of problems. I think this question could easily be turned into one that describes the problem exactly as that close reason suggests it should… but then it would be too broad for StackOverflow, for the reasons I explained in my last comment. – abarnert May 08 '15 at 23:34
  • Fair. Well, I appreciate your input! – growling_egg May 08 '15 at 23:36
  • PS, if by "the last part of this post" you mean the comment… yeah, comments at SO throw away a lot of formatting, including newlines and any other run of whitespace, and they don't have a WYSIWYG preview, so there are a lot of things you just can't put in a comment. But one thing you _can_ do is use backticks for short code fragments like \`frame = framegetter.getframe()\` for `frame = framegetter.getframe()`. – abarnert May 08 '15 at 23:36
  • I don't think this will be a very successful path for creating AI agents to play Metroid ... parsing the screen image will be far too slow, I think ... you need to find a game that provides hooks to inspect the objects (hex-decoding files ... or better yet something like python-mario, in which you can interact directly) – Joran Beasley May 08 '15 at 23:49
  • possible duplicate? http://stackoverflow.com/q/24129253/541038 – Joran Beasley May 08 '15 at 23:51
  • You may want to **read whathaveyoutried.com & show some respect** to the StackOverflow Community, which strongly encourages posting high-quality questions together with an MCVE (**a Minimum-Complete-Verifiable Example of code**) **showing what you have tried so far**. You may want to update your post so as to meet this minimum reasonable level of quality & to show your will to respect other StackOverflow contributing members. They are professionals who love to answer good questions on MCVE-related issues. **Enjoy being a StackOverflow Contributing Member & do support this Community Netiquette** – user3666197 May 10 '15 at 16:28

1 Answer


A: Yes, Python can grab & process any scene via a USB-input device

The design issues in real-time image processing ( single frames, not a stream ... ) revolve around the overall RT-loop performance, mainly the image transformations & processing, not just the static image size or the acquisition method per se.

Anyway, your code has to be carefully designed and pre-measured in [usec, nsec] ( yes, there are Python tools available that let you benchmark your code's timing down to about 25-nsec resolution ) so as to keep the whole RT-loop feasible within your overall image-processing architecture. On top of that, you will struggle with both resource management & error handling, each of which causes a lot of problems in RT-scheduling.
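As a minimal illustration of that kind of measurement ( standard-library only; frame stands for whatever your acquisition step has just delivered ), time.perf_counter_ns() in Python 3.7+ gives a nanosecond-scale counter you can wrap around each stage of the loop:

    import time
    import cv2

    t0         = time.perf_counter_ns()                 # [ns] Python >= 3.7
    aGrayFRAME = cv2.cvtColor( frame, cv2.COLOR_BGR2GRAY )
    t1         = time.perf_counter_ns()                 # [ns]

    print( "cv2.cvtColor() took", ( t1 - t0 ), "[ns] on this frame" )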

How? Take the following as an inspiration to start from.

A sample, shown just for the initial image-capture idea, from a medical-imaging PoC Python prototype:

import cv2                                  # OpenCV Python bindings

# NOTE: onMouse, clicked, reprocessIntoFalseCOLORs, aFalseCLUT and aFalseCLUT_T
#       come from the surrounding prototype and are assumed to be defined elsewhere.

def demoCLUT():
    cameraCapture = cv2.VideoCapture( 0 )   # acquire frames from the first video device

    cv2.namedWindow(        'msLIB:ComputerVision.IN' )
    cv2.setMouseCallback(   'msLIB:ComputerVision.IN', onMouse )

    cv2.namedWindow(        'msLIB:ComputerVision.OUT-0' )
    cv2.namedWindow(        'msLIB:ComputerVision.OUT-1' )
    cv2.namedWindow(        'msLIB:ComputerVision.OUT-2' )

    success, frame = cameraCapture.read()

    if success:

        while success and cv2.waitKey( 10 ) == -1 and not clicked:          # [msec]

            aGrayFRAME  = cv2.cvtColor( frame, cv2.COLOR_BGR2GRAY )

            cv2.imshow( 'msLIB:ComputerVision.IN',      frame )
            cv2.imshow( 'msLIB:ComputerVision.OUT-0',   aGrayFRAME )
            cv2.imshow( 'msLIB:ComputerVision.OUT-1',   reprocessIntoFalseCOLORs( aGrayFRAME, frame, aFalseCLUT   ) )    # <frame>-destructive
            cv2.imshow( 'msLIB:ComputerVision.OUT-2',   reprocessIntoFalseCOLORs( aGrayFRAME, frame, aFalseCLUT_T ) )    # <frame>-destructive

            success, frame = cameraCapture.read()
    else:
        print( "OpenCV.CLUT.DEMO: cameraCapture.read() failed to serve a success/frame ..." )
    # ------------------------------------------------------------------<RELEASE-a-Resource>
    cameraCapture.release()                 # release the capture device, not just re-bind the name
    print( 30 * ">", "call clearWIN() to release & tidy up resources..." )
    # ------------------------------------------------------------------<RELEASE-a-Resource>

Are pre-recorded sequences a must, or just a nice-to-have?

As your motivation was expressed, your prototype will take a lot of development time. Pre-recorded sequences may help you focus on the dev/test side, so that your concentration is not split in halves between playing the game and writing the Python code; however, they are not a must-have.
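A helpful detail here: cv2.VideoCapture reads a pre-recorded file and a live device through the same interface, so switching between the two during development is a one-line change ( the filename below is just a placeholder ):

    import cv2

    # cap = cv2.VideoCapture( 0 )                        # live: first video device
    cap = cv2.VideoCapture( 'gameplay-session.avi' )     # placeholder: a pre-recorded run

    success, frame = cap.read()
    while success:
        # ... identical processing code works for both sources ...
        success, frame = cap.read()
    cap.release()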

A remark on FPS: you are building an AI against a Human-Player

Having said this, your initial AI-engine may start at something as low as 10-15 FPS; there is no need to get yourself into an unsolvable RT-loop puzzle just because of an artificially high FPS rate.
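A minimal sketch of such a self-throttled loop, assuming hypothetical grab_frame() and process() stand in for your acquisition and AI steps:

    import time

    TARGET_FPS   = 12                                    # anywhere in the 10-15 FPS band
    FRAME_BUDGET = 1.0 / TARGET_FPS                      # [sec] per RT-loop iteration

    while True:
        t0    = time.perf_counter()
        frame = grab_frame()                             # hypothetical acquisition call
        process( frame )                                 # hypothetical AI / vision step
        spare = FRAME_BUDGET - ( time.perf_counter() - t0 )
        if spare > 0.0:
            time.sleep( spare )                          # spend spare budget idling, not over-capturing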

Our human eye / brain tandem gets an illusion of motion somewhere near the analog-TV refresh rate, where roughly two dozen frames per second were for many decades enough for people ( not for dogs ... thus the marketing companies focused on influencing humans, measuring their advertising campaigns' impact with people-meters rather than dog-meters, as our best friends did not like at all to watch that strange flashing static on TV screens ).

So do not over-design the AI-engine to be developed; it shall aim at beating Human-Players, not dogs, shan't it?

user3666197