
I'm working on an emulator as a side/fun project, but I'm having some performance issues and failing to figure out where they are coming from.

The application is mainly composed of a GLKView for display and a separate thread with an infinite loop for the CPU emulation. Here's a sample, with all the actual emulation code taken out, that still exhibits the problem:

@implementation ViewController

- (void)viewDidLoad {
    [super viewDidLoad];

    GLKView *glView = [[GLKView alloc] initWithFrame:self.view.bounds];
    glView.delegate = self;
    glView.context = [[EAGLContext alloc] initWithAPI:kEAGLRenderingAPIOpenGLES2];
    [EAGLContext setCurrentContext:glView.context];
    [self.view addSubview:glView];
    glView.enableSetNeedsDisplay = NO;
    CADisplayLink* displayLink = [CADisplayLink displayLinkWithTarget:glView selector:@selector(display)];
    [displayLink addToRunLoop:[NSRunLoop currentRunLoop] forMode:NSDefaultRunLoopMode];

    dispatch_async(dispatch_queue_create("yeah", DISPATCH_QUEUE_SERIAL), ^{
        CFTimeInterval lastTime = 0;
        CFTimeInterval time = 0;
        int instructions = 0;
        while (1) {
            // here be cpu emulation
            if (lastTime == 0) {
                lastTime = CACurrentMediaTime();
            } else {
                CFTimeInterval newTime = CACurrentMediaTime();
                time += newTime - lastTime;
                lastTime = newTime;
            }
            if (++instructions == 1000) {
                // every 1000 iterations, print the average loop frequency in MHz
                // (1000 iterations / time seconds / 1e6 = 1 / (time * 1000))
                printf("%f\n", 1 / (time * 1000));
                time = 0;
                instructions = 0;
            }
        }
    });
}
- (void)glkView:(GLKView *)view drawInRect:(CGRect)rect
{
    glClearColor(0.0, 0.0, 0.0, 1.0);
    glClear(GL_COLOR_BUFFER_BIT);
    // Here be graphics
}

@end

Like this, the infinite loop is basically just counting its iterations and printing out its frequency in MHz.

So, the problem is: when the app starts, the loop runs at about 9-15 MHz (on an iPhone 6), and if I look at the GPU Report in Xcode's Debug Navigator, I can see that the CPU frame time is 0.2 ms. Then, after running for a couple of seconds, the loop drops to 1-5 MHz and the CPU frame time increases to 0.6 ms.

If I disable the GLKView updates, the loop never slows down.
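
(By "disabling the updates" I mean stopping the display link from firing, e.g.:)

displayLink.paused = YES;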

I also tried using different threading APIs (GCD, NSThread, pthread), but that doesn't seem to have any impact.
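
For reference, the NSThread variant was along these lines (emulationLoop is just an illustrative name for a method containing the same while(1) body as above):

[NSThread detachNewThreadSelector:@selector(emulationLoop) toTarget:self withObject:nil];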

My question is: am I doing something wrong somewhere? Is it just a case of the GLKView not being fully initialised for the first couple of seconds, using less CPU than normal, so that I get a temporary speed boost? Are there other ways I could structure the code to get the most performance out of the loop?

Update I did some more testing and noticed that the problem is also present when using a CAEAGLLayer instead of a GLKView, and that it doesn't happen in the Simulator, only on a device. I also tried an OS X application with an NSOpenGLView, and it doesn't happen there either...

Update 2 I tried starting the thread after a delay instead of immediately, and if the delay is longer than the time it usually takes for the drop to occur, the thread starts already slowed down... Not really sure what to make of it...
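
(The delayed start was roughly this, wrapping the same dispatch_async as above; the 5-second delay is just an example:)

dispatch_after(dispatch_time(DISPATCH_TIME_NOW, (int64_t)(5 * NSEC_PER_SEC)),
               dispatch_get_main_queue(), ^{
    // ... same dispatch_async(dispatch_queue_create(...), ^{ while (1) { ... } }) as above
});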

Metal Update I tried using Metal instead of OpenGL, simply using the stock template from Xcode, and it's happening there too...

Fabio Ritrovato
  • You are not supposed to create an infinite loop on any thread. You should schedule a task that runs once every frame and performs its work only once per frame. There are many ways to do so, but first check if there's something built into GLKit before using something like display link updates/delegate or NSTimer. Possible pointers: http://stackoverflow.com/questions/13653113/multithreading-glkview-drawing – CodeSmile Jan 02 '15 at 09:53
    Sorry, but I do need an infinite loop, that's how emulators work. Performing a task every frame would give me at most 60 "operations" every second, and I need significantly more than that (about 1,000,000 times that or so...). For screen updates that's fine, but not for CPU emulation – Fabio Ritrovato Jan 02 '15 at 11:37
    I think you need to read a lot more about how emulation, multithreading and synchronization work. Say your emulated machine's CPU runs at 1 MHz, and you have a queue of commands, each of which takes a known number of CPU cycles to execute. You would then take the, say, 1500 commands the CPU could execute within a given time frame and emulate them on the host machine. Then you start a new cycle. This is a very simplified model. You will not need an infinite loop, and you will have to have synchronization points (typically at the vsync rate of the emulated machine). – CodeSmile Jan 03 '15 at 13:31
  • That's one way to do it, but it's not the only one. Having an infinite loop is definitely a viable alternative (for example [link](https://code.google.com/p/virtualc64/source/browse/trunk/C64/C64.cpp), [link](http://www.cebix.net/viewcvs/cebix/Frodo4/Src/C64_WIN32.h?view=markup)). Ultimately it's not really the point of the question but more of an architectural choice, and while I could probably easily change my code to run the other way, I'm not ready to drop it on a dogmatic "don't do it just because", unless there some actual reason why it's causing the issue (which I don't think it is) – Fabio Ritrovato Jan 03 '15 at 19:54
  • Well, the code you linked to is for desktop (Win32). Consider that you are developing for mobile, where the OS takes measures to prevent apps from consuming too much CPU power (draining the battery), so your infinite loop could very well be the problem, because from the perspective of the OS it looks like a thread that locked up and may need to be throttled or terminated. Since you observed this happening only on iOS, it may confirm my point (more a hunch). At least try to yield the thread every once in a while (i.e. every 1000 iterations) to test whether it changes the observed runtime behavior – CodeSmile Jan 04 '15 at 11:11
  • It probably also matters whether your test device has a single- or dual-core CPU – CodeSmile Jan 04 '15 at 11:12
  • I'm using an iPhone 6, so it's dual core. I tried doing some yielding (usleep, sched_yield and NSThread sleepForTimeInterval; see the sketch below this thread), and the problem persisted; the only difference was that the time it took for the speed to drop decreased as I increased the yield time. It feels weird to me that the OS would throttle active working threads instead of those that aren't doing much – Fabio Ritrovato Jan 04 '15 at 18:28
  • @LearnCocos2D I tried your suggestion of doing batches of emulation cycles on frame updates, like **- (void)glkView:(GLKView *)view drawInRect:(CGRect)rect { int temp = 0; for (int i = 0; i < 17095 * 4; ++i) { temp += arc4random(); } glClearColor(0.0, 0.0, 0.0, 1.0); glClear(GL_COLOR_BUFFER_BIT); }**, but the issue is presenting itself again, with CPU time per frame jumping from 9 ms to 14 ms after a couple of seconds... Weirdly, if I lower the number of operations per cycle, it doesn't seem to happen anymore... – Fabio Ritrovato Jan 07 '15 at 23:43
  • What do you mean? It doesn't happen with fewer instructions but it does with an almost empty loop like in the example? – Jerem Jan 08 '15 at 08:18
  • @JeremyLaumon I mean it happens in both cases (real cpu emulation or empty loop), but only after a certain amount of iterations per update cycle... – Fabio Ritrovato Jan 08 '15 at 10:31
  • I know this is an old post, but have you figured out a way to fix the problem? Does it seem to be coming from the GLKView? – Philippe Paré Jul 13 '16 at 12:26
  • I had a chat with some Apple engineers at last year's WWDC, and if I remember correctly the gist of it was that the CPU was throttling itself down because it "detected" that it wasn't doing any particularly useful work. So it didn't seem related to OpenGL itself, and my solution below involving CALayer probably convinced the CPU not to throttle itself... – Fabio Ritrovato Jul 13 '16 at 12:45
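
(The yielding test mentioned in the comments above was roughly this, with varying sleep lengths:)

if (++instructions == 1000) {
    printf("%f\n", 1 / (time * 1000));
    time = 0;
    instructions = 0;
    usleep(100); // also tried sched_yield() and [NSThread sleepForTimeInterval:]
}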

2 Answers


The CPU frequency can be lowered by the OS to consume less energy / save battery. If your thread is not using much CPU power, the OS will think it's a good time to lower the frequency. On a desktop computer, on the other hand, there are many other threads/processes running (and the thresholds are probably very different); that's probably why it seems to work fine in the Simulator and in a desktop app.

There are several possible reasons why your thread might be detected as not consuming much CPU time. One is that you call printf, which probably has some sort of lock inside that makes your thread wait (maybe CACurrentMediaTime does too). Another is probably linked to the GLKView updates, although I'm not sure how.
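
One way to test the printf hypothesis (a sketch, reusing the variables from your loop): record the MHz samples in a preallocated array and only print after the run, so the hot loop makes no potentially blocking calls:

enum { kMaxSamples = 1000 }; // hypothetical sample count
double samples[kMaxSamples];
int sampleCount = 0;
// in the loop, replace printf with:
if (sampleCount < kMaxSamples) {
    samples[sampleCount++] = 1 / (time * 1000);
}
// after the run (or once the buffer is full), dump everything at once:
for (int i = 0; i < sampleCount; i++) {
    printf("%f\n", samples[i]);
}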

Jerem
  • It feels weird to me that it would throttle a thread that is definitely not idle or wasting cycles, but doing actual work. I thought about your first hypothesis as well, but my actual app has neither printf nor CACurrentMediaTime (I added them for debugging), and as far as I can tell no system calls, since it's all custom code. I still think it may have to do with OpenGL, since if I disable the updates it runs fine... – Fabio Ritrovato Jan 04 '15 at 18:33
  • Hum, yeah, that would be strange indeed. Does that also happen when the OpenGL code is just a clear, like in your example? – Jerem Jan 04 '15 at 18:46
  • If the OpenGL driver is preempting your thread, you may be able to see it with Xcode's profiler. It seems that it also contains an energy diagnostics feature, which could be useful too. https://developer.apple.com/library/mac/documentation/DeveloperTools/Conceptual/InstrumentsUserGuide/AnalysingCPUUsageinYourOSXApp/AnalysingCPUUsageinYourOSXApp.html – Jerem Jan 04 '15 at 19:22
  • I tried running some Instruments, but it doesn't look like I'm getting much from them... From the energy one http://i.imgur.com/eyaNBDK.png it does look like graphics usage doubles after a bit of time (I think the first peak is due to the app starting...). From the OpenGL one http://i.imgur.com/VZZPZEa.png I don't know; it looks like the app is not doing anything, as expected, just a bunch of redundant clear calls... – Fabio Ritrovato Jan 06 '15 at 21:04
  • What about the multicore trace thing? – Jerem Jan 07 '15 at 19:23
  • Not available on device – Fabio Ritrovato Jan 07 '15 at 19:49
0

So, I still haven't figured out why it's happening, but I managed to find a workaround: using a CALayer backed by a CGBitmapContext instead of OpenGL, inspired by https://github.com/lmmenge/MeSNEmu:

@interface GraphicLayer : CALayer
{
    CGContextRef _context;
}
@end

@implementation GraphicLayer

-(id)init
{
    self = [super init];
    if (self) {
        CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();
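        // offscreen 418x263 RGBA bitmap (8 bits per component, 418 * 4 bytes per row)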
        _context = CGBitmapContextCreate(NULL, 418, 263, 8, 418 * 4, colorSpace, (CGBitmapInfo)kCGImageAlphaPremultipliedLast);
        CFRelease(colorSpace);
    }
    return self;
}

- (void)display
{
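    // snapshot the bitmap context and hand it to Core Animation as the layer contents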
    CGImageRef CGImage = CGBitmapContextCreateImage(_context);
    self.contents = (__bridge id)(CGImage);
    CGImageRelease(CGImage);
}

@end

@interface GraphicView : UIView
@end

@implementation GraphicView

+ (Class)layerClass
{
    return [GraphicLayer class];
}

- (void)drawRect:(CGRect)rect
{
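    // intentionally empty: the layer provides its contents in -display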
}

@end

Using this, the loop doesn't slow down (whether running an infinite loop or doing a bunch of operations on every frame), but I'm not entirely sure why...
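
For completeness, a minimal sketch of how this can be driven (same display-link pattern as in the question, now targeting the layer's -display; the emulation thread writes its pixels into _context):

GraphicView *graphicView = [[GraphicView alloc] initWithFrame:self.view.bounds];
[self.view addSubview:graphicView];
CADisplayLink *displayLink = [CADisplayLink displayLinkWithTarget:graphicView.layer
                                                         selector:@selector(display)];
[displayLink addToRunLoop:[NSRunLoop mainRunLoop] forMode:NSDefaultRunLoopMode];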

Fabio Ritrovato
  • Did you test on another type of iPhone? Maybe the iPhone 6 has a special feature that reduces the clock speed of the CPU when the GPU is running stuff, and the bitmap context doesn't trigger it. That's far-fetched, but I've seen devices where enabling WiFi automatically reduced the GPU frequency, for example. – Jerem Jan 11 '15 at 13:24