
I followed this tutorial from NeHe in order to render a video in OpenGL. I'm trying to build it for x64, but compilation fails with an error pointing at the flipIt(void* buffer) function. Is the function badly written, or do I need to import a library?

void Video::flipIt(void* buffer)
{
    void* b = buffer;
    __asm
    {
        mov ecx, 256 * 256
        mov ebx, b
    label:
        mov al, [ebx + 0]
        mov ah, [ebx + 2]
        mov [ebx + 2], al
        mov [ebx + 0], ah

        add ebx, 3
        dec ecx
        jnz label
    }
}
  • What is the error? – hlscalon Nov 05 '19 at 19:17
  • Error C4235 nonstandard extension used: '__asm' keyword not supported on this architecture. – Oliver Hoover Nov 05 '19 at 20:09
  • [Inline assembler is not supported for x64](https://stackoverflow.com/questions/6166437/64bit-applications-and-inline-assembly) – BDL Nov 06 '19 at 08:56
  • It looks like Ross edited your question because you were trying to morph it into something else. Don't ask followup questions in your original Question. Ask your new question in a new Question... maybe with a link back to this one for context. – Mark Storer Nov 06 '19 at 17:56

1 Answer


On Windows, 24-bit bitmaps (as documented on the Microsoft Developer Network, MSDN) store their pixel data in reverse order, so what you have in memory is actually BGR rather than RGB. In OpenGL, by contrast, RGB is just RGB.

The solution NeHe uses is to swap the bytes in assembly, which in my opinion is a bad idea: Visual C++ does not support inline assembly for x64, so you can't swap the bytes with __asm. What you can do instead is modify the texture upload code to use GL_BGR instead of GL_RGB. But be careful: some OpenGL drivers have problems with GL_BGR.

So remove the __asm function and change GL_RGB to GL_BGR in the glTexSubImage2D(...) call:

glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 256, 256, GL_BGR, GL_UNSIGNED_BYTE, data);
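If you do hit one of the drivers that mishandles GL_BGR, the asm loop can instead be replaced with a portable C++ swap. This is a minimal sketch, assuming the same tightly packed 256×256, 24-bit buffer as the tutorial; the `pixelCount` parameter is my addition, not part of the original code:

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>

// Portable replacement for the tutorial's inline-asm loop: swap the first
// and third byte of every 24-bit pixel, converting RGB to BGR in place.
void flipIt(void* buffer, std::size_t pixelCount = 256 * 256)
{
    auto* p = static_cast<std::uint8_t*>(buffer);
    for (std::size_t i = 0; i < pixelCount; ++i, p += 3)
        std::swap(p[0], p[2]);  // swap R and B, leave G in place
}
```

Modern compilers optimize this loop well, and it builds for both x86 and x64 with no changes.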
  • Also importantly, *that* inline asm looks slow compared to what you can do with SIMD intrinsics, especially SSSE3 `_mm_shuffle_epi8` to do a byte shuffle on 16 bytes at once. Or even with pure C you could get a compiler to emit scalar code better than that. – Peter Cordes Nov 06 '19 at 01:28
  • In fairness to nehe, that code was written over 20 years ago. Back at that time (under Visual C 6), the asm routine was a big performance improvement over the code VC6 would emit. GL_BGR didn't come along until OpenGL 1.2, hence the need to flip the bytes manually. The joys of ancient tutorials ;) – robthebloke Nov 06 '19 at 04:03
  • @robthebloke: that seems surprising. I guess you're talking about tuning for P5 Pentium? IDK why a compiler would have trouble emitting a sane loop if you used `char*`. Apparently P5 had slow `movzx`, so maybe you mean VC6 would always insist on loading single bytes with `movzx`? This asm loop doesn't seem that well optimized for P5, though; `add` and `dec` could each pair with a `mov` in the other pipe. (https://www.agner.org/optimize/). At least this asm doesn't cause partial-register stalls on PPro, but using AH and AL will cause a false dependency on K7/K8 (and later AMD). – Peter Cordes Nov 06 '19 at 06:45
  • Unrolling by 2 could let you reduce loop overhead, and cover 6 bytes in 2 loads (of dword + word). A shift (1 cycle, pairs in the U pipe) doesn't save total instructions but fewer of them have to be memory so they can pair better and come closer to achieving 1 byte-store per clock like a modern OoO exec CPU could. (Shifting bytes between dword registers with `shld` wouldn't be profitable on P5) – Peter Cordes Nov 06 '19 at 06:51
  • Thank you guys. I removed the ASM code, but it's still not working. I changed GL_RGB to GL_BGR when I update the texture. So far so good, but why, when I run the code with the video NeHe uses as an example, does it show a black window? And when I try a different video downloaded online, it fails to open the AVI frame. I debugged the code and found that AVIStreamGetFrameOpen is returning NULL; why can't it read my video? – Oliver Hoover Nov 06 '19 at 16:45
  • First you have to encode the video in cinepak format. VFW supports these formats only: **msrle**, **msvideo1**, **cinepak** or **indeo 3.2**. So, [ffmpeg for Windows](https://ffmpeg.zeranoe.com/builds/) can be used to convert the video into cinepak or msvideo1 format. Use the following command line: `ffmpeg -i your_video.xxx -vcodec cinepak output.avi` – Cata M. Nov 06 '19 at 17:08
  • @OliverHoover: note that those codecs are all old and inefficient compared to h.264. In a new application you wouldn't actually want to use any of them; bad quality per bitrate. Use h.264 (with the x264 encoder: `ffmpeg -i foo.mp4 -c:v libx264 -preset slower -crf 23 -movflags +faststart output.mp4`), or royalty-free VP8 or VP9. Modern video cards typically have hardware decode for VP9, but somewhat older HW might only have VP8 or h.264. The fact that the old VFW API doesn't support these codecs means you should use something newer. – Peter Cordes Nov 07 '19 at 03:18
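The SSSE3 `_mm_shuffle_epi8` approach Peter Cordes mentions in the comments can be sketched roughly as follows. This is a hypothetical illustration, not code from the tutorial or from the comments: each 16-byte vector covers 5 whole pixels (15 bytes), so one shuffle reverses five RGB triples at once; byte 15 is copied through unchanged and gets rewritten by the next iteration, which is why the pointer advances by 15. A scalar tail handles the last pixels (and serves as a full fallback when SSSE3 isn't enabled at compile time):

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>

#if defined(__SSSE3__)
#include <tmmintrin.h>  // _mm_shuffle_epi8
#endif

// Swap R and B in a tightly packed 24-bit RGB buffer, in place.
void flipItSimd(std::uint8_t* buf, std::size_t pixelCount)
{
    std::size_t i = 0;
    const std::size_t n = pixelCount * 3;
#if defined(__SSSE3__)
    // Reverse each 3-byte triple; lane 15 passes through unchanged and is
    // handled by the next iteration's load.
    const __m128i mask = _mm_setr_epi8(2, 1, 0, 5, 4, 3, 8, 7, 6,
                                       11, 10, 9, 14, 13, 12, 15);
    // The unaligned load/store touch 16 bytes, so stop while 16 remain.
    for (; i + 16 <= n; i += 15) {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(buf + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(buf + i),
                         _mm_shuffle_epi8(v, mask));
    }
#endif
    // Scalar tail / portable fallback.
    for (; i + 3 <= n; i += 3)
        std::swap(buf[i], buf[i + 2]);
}
```

With MSVC the intrinsic is available on both x86 and x64 without extra flags; with GCC/Clang you'd compile with `-mssse3` (or rely on the scalar fallback).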