9

I am a little confused about the NV12 format. I am looking at this page to understand the format. My current understanding is that for an image or video frame of 640 x 480, the Y plane has 640 x 480 bytes, and U and V together share a single plane of 640/2 x 480/2 entries. That is, there is not one 640/2 x 480/2 plane for U and another 640/2 x 480/2 plane for V; there is only one 640/2 x 480/2 UV plane, in which each entry takes two bytes. So the total number of bytes in the buffer array is given by the expression below, where (640/2) * (480/2) is multiplied by 2 because each UV entry takes two bytes.

byte[] myArray = new byte[(640 * 480) + (2 * (640 / 2) * (480 / 2))];

So my question is: is my understanding correct, and am I sizing the byte array the way the NV12 format specifies?

Rotem
Madu

3 Answers

31

The NV12 format uses 4:2:0 chroma subsampling.


The total size of a frame is W x H x 3 / 2 bytes, where W is the width and H is the height.

One frame at VGA resolution (640x480) is 460800 bytes, where

  • the Y part is 640*480 = 307200 bytes
  • the Cb part is 640*480/4 = 76800 bytes
  • the Cr part is 640*480/4 = 76800 bytes
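The arithmetic above can be sketched in Java as follows. This is only an illustration: the class and method names are hypothetical, and it assumes 8-bit samples, even width and height, and no row padding (stride equal to width), which real capture or decoder buffers do not always guarantee.

```java
// Hypothetical helper, not part of any library: sizes of an unpadded NV12 frame.
final class Nv12FrameSize {
    /** Bytes in the full-resolution Y plane: one byte per pixel. */
    static int lumaBytes(int width, int height) {
        return width * height;
    }

    /** Bytes of chroma: one U byte and one V byte per 2x2 block of pixels. */
    static int chromaBytes(int width, int height) {
        // Assumption: width and height are even, as 4:2:0 layouts normally require.
        return 2 * (width / 2) * (height / 2);
    }

    /** Total frame size, equal to width * height * 3 / 2 for even dimensions. */
    static int frameBytes(int width, int height) {
        return lumaBytes(width, height) + chromaBytes(width, height);
    }

    public static void main(String[] args) {
        int w = 640, h = 480;
        byte[] frame = new byte[frameBytes(w, h)];
        System.out.println("Y bytes:     " + lumaBytes(w, h));   // 307200
        System.out.println("UV bytes:    " + chromaBytes(w, h)); // 153600
        System.out.println("Frame bytes: " + frame.length);      // 460800
    }
}
```

If a buffer comes from hardware with padded rows, the stride reported by that API would have to replace `width` in these products.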

Hope this answers your question...

Fredrik Pihl
2

To add to the first answer, the NV12 format interleaves the U and V chroma data.

For a 640x480 image frame, the NV12 representation consists of 720 rows of 640 bytes:

  • the first 480 rows each contain 640 luminance (Y) values.

  • the last 240 rows each contain 320 tuples of (U,V) values.
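As a rough sketch of what this row layout means for addressing, the Java below computes where the Y, U and V bytes of a given pixel sit in the buffer. The class and method names are made up for illustration, and the code assumes an unpadded buffer (each row exactly `width` bytes) with even dimensions.

```java
// Hypothetical helper: byte offsets into an unpadded NV12 buffer.
final class Nv12Index {
    /** Offset of the Y sample for pixel (x, y). */
    static int yIndex(int width, int x, int y) {
        return y * width + x;
    }

    /** Offset of the U sample shared by the 2x2 block containing pixel (x, y). */
    static int uIndex(int width, int height, int x, int y) {
        int uvPlaneStart = width * height; // the UV rows start right after the Y rows
        int uvRow = y / 2;                 // one UV row per two pixel rows
        int uvPair = x / 2;                // one (U, V) pair per two pixel columns
        return uvPlaneStart + uvRow * width + 2 * uvPair;
    }

    /** Offset of the V sample: it directly follows its U sample. */
    static int vIndex(int width, int height, int x, int y) {
        return uIndex(width, height, x, y) + 1;
    }

    public static void main(String[] args) {
        int w = 640, h = 480;
        System.out.println(yIndex(w, 0, 0));          // 0
        System.out.println(uIndex(w, h, 0, 0));       // 307200
        System.out.println(vIndex(w, h, 639, 479));   // 460799, the last byte
    }
}
```

Note that a UV row is `width` bytes long, the same as a Y row, because it holds `width / 2` pairs of two bytes each.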

Hugues
0

Yes, your understanding is correct.

The YUV NV12 format has one Y plane of size height * width, followed by a second, half-size plane of sub-sampled, interleaved U and V values (height / 2 * width / 2 pairs of U and V). The total size, as you correctly calculated, is height * width + 2 * (height / 2 * width / 2) = 3 / 2 * height * width bytes.

NV12 has 12 bits per pixel.

==============

As described on the Microsoft website (see the NV12 section):

YUV Sampling

Chroma channels can have a lower sampling rate than the luma channel, without any dramatic loss of perceptual quality. A notation called the "A:B:C" notation is used to describe how often U and V are sampled relative to Y:

  • 4:4:4 means no downsampling of the chroma channels.
  • 4:2:2 means 2:1 horizontal downsampling, with no vertical downsampling. Every scan line contains four Y samples for every two U or V samples.
  • 4:2:0 means 2:1 horizontal downsampling, with 2:1 vertical downsampling.
  • 4:1:1 means 4:1 horizontal downsampling, with no vertical downsampling. Every scan line contains four Y samples for each U and V sample. 4:1:1 sampling is less common than other formats, and is not discussed in detail in this article.
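To make the four cases in this list concrete, here is a small Java sketch that maps each notation to the size of one chroma channel and to the resulting bits per pixel. It is an illustration only: the class and method names are hypothetical, 8-bit samples are assumed, and the dimensions are assumed to divide evenly.

```java
// Hypothetical helper: chroma dimensions implied by the sampling notation.
final class ChromaSubsampling {
    /** Returns {chromaWidth, chromaHeight} for ONE chroma channel (U or V). */
    static int[] chromaSize(String scheme, int width, int height) {
        switch (scheme) {
            case "4:4:4": return new int[] {width, height};         // no downsampling
            case "4:2:2": return new int[] {width / 2, height};     // 2:1 horizontal
            case "4:2:0": return new int[] {width / 2, height / 2}; // 2:1 both ways
            case "4:1:1": return new int[] {width / 4, height};     // 4:1 horizontal
            default: throw new IllegalArgumentException("Unknown scheme: " + scheme);
        }
    }

    /** Average bits per pixel, assuming 8-bit Y, U and V samples. */
    static int bitsPerPixel(String scheme) {
        int w = 4, h = 2; // smallest block that all four schemes divide evenly
        int[] c = chromaSize(scheme, w, h);
        int samples = w * h + 2 * c[0] * c[1]; // Y samples plus U and V samples
        return 8 * samples / (w * h);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"4:4:4", "4:2:2", "4:2:0", "4:1:1"}) {
            int[] c = chromaSize(s, 640, 480);
            System.out.println(s + ": chroma " + c[0] + "x" + c[1]
                    + ", " + bitsPerPixel(s) + " bits per pixel");
        }
    }
}
```

For 4:2:0 at 640x480 this gives a 320x240 chroma channel and 12 bits per pixel.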

The following diagram shows how chroma is sampled for each of the downsampling rates. Luma samples are represented by a cross, and chroma samples are represented by a circle.

Figure 1: Sub-sampling of Pixel Data

NV12

All of the Y samples appear first in memory as an array of unsigned char values with an even number of lines. The Y plane is followed immediately by an array of unsigned char values that contains packed U (Cb) and V (Cr) samples. When the combined U-V array is addressed as an array of little-endian WORD values, the LSBs contain the U values, and the MSBs contain the V values. NV12 is the preferred 4:2:0 pixel format for DirectX VA. It is expected to be an intermediate-term requirement for DirectX VA accelerators supporting 4:2:0 video. The following illustration shows the Y plane and the array that contains packed U and V samples.

Figure 2: The Y plane followed by the array of packed U and V samples
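The little-endian WORD description can be checked with a short Java sketch. The names here are hypothetical and the tiny 2x2 frame is filled with made-up values; the point is only that the low byte of each 16-bit word is U and the high byte is V, which matches U coming first in memory.

```java
// Hypothetical helper: reading one packed (U, V) entry as a little-endian 16-bit word.
final class Nv12UvWord {
    /** Combines two consecutive bytes into a little-endian 16-bit value. */
    static int readWord(byte[] frame, int offset) {
        int low = frame[offset] & 0xFF;      // first byte in memory = least significant
        int high = frame[offset + 1] & 0xFF; // second byte in memory = most significant
        return low | (high << 8);
    }

    /** U is held in the least significant byte of the word. */
    static int uOf(int word) {
        return word & 0xFF;
    }

    /** V is held in the most significant byte of the word. */
    static int vOf(int word) {
        return (word >>> 8) & 0xFF;
    }

    public static void main(String[] args) {
        // A 2x2 frame: 4 Y bytes followed by a single (U, V) pair.
        byte[] frame = new byte[2 * 2 + 2];
        frame[4] = 0x10;        // U (made-up value)
        frame[5] = (byte) 0xF0; // V (made-up value)

        int word = readWord(frame, 4);
        System.out.println(Integer.toHexString(word)); // f010
        System.out.println("U = " + uOf(word) + ", V = " + vOf(word)); // U = 16, V = 240
    }
}
```

The `& 0xFF` masks matter in Java because `byte` is signed, while the samples are unsigned values in the range 0 to 255.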