The n attribute is the zero-based index of the fragment, incremented by 1 for each new fragment. It is just a counter with no further meaning: 0, 1, 2, 3, 4, ...
The r attribute indicates that r more fragments with the same duration follow the current fragment. It allows you to replace this:
<c t="1000" d="1000" />
<c t="2000" d="1000" />
<c t="3000" d="1000" />
<c t="4000" d="1000" />
With this much more compact representation:
<c t="1000" d="1000" r="3" />
You can think of it as simply duplicating the XML element r times.
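To make the equivalence concrete, here is a minimal Python sketch that converts between a list of individual fragments and compact <c> elements in both directions. It follows the convention described above, where r counts the fragments that follow the first one (so four equal fragments become r="3"), and it assumes every element carries explicit t and d values. The function names compact and expand are hypothetical helpers for illustration, not part of any player or server API.

```python
import xml.etree.ElementTree as ET

def compact(fragments):
    """Collapse (t, d) fragments with equal duration into <c> elements.

    Assumption: r counts the fragments after the first, as described above.
    """
    elements = []
    for t, d in fragments:
        if elements:
            last = elements[-1]
            start, dur = int(last.get("t")), int(last.get("d"))
            repeats = int(last.get("r", "0"))
            # Merge only if the duration matches and this fragment starts
            # exactly where the current run ends.
            if dur == d and t == start + dur * (repeats + 1):
                last.set("r", str(repeats + 1))
                continue
        elements.append(ET.Element("c", t=str(t), d=str(d)))
    return elements

def expand(elements):
    """Yield one (n, t, d) tuple per fragment, i.e. repeat each <c> r extra times.

    n is read from the element if present, otherwise counted on from the
    previous fragment (an assumption made for this sketch).
    """
    next_n = 0
    for c in elements:
        t, d = int(c.get("t")), int(c.get("d"))
        repeats = int(c.get("r", "0"))
        n = int(c.get("n", next_n))
        for i in range(repeats + 1):
            yield n + i, t + i * d, d
        next_n = n + repeats + 1

fragments = [(1000, 1000), (2000, 1000), (3000, 1000), (4000, 1000)]
compacted = compact(fragments)
print(ET.tostring(compacted[0], encoding="unicode"))  # <c t="1000" d="1000" r="3" />
print(list(expand(compacted)))
```

Since compact only merges a fragment into a run when it starts exactly where that run ends, expanding the result reproduces the original timestamps.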
Edit: Based on the comment, I now understand the source of the confusion: the question is not really about what these attributes are, but about why, in a live stream, only n changes over time.
To understand this, you need to understand how a live video is represented conceptually and how that differs from an on-demand video. An on-demand video has a definite beginning and end, with a fixed number of fragments in between:
(start)123456789(end)
A live video, by contrast, has no end by definition: there may be a "last fragment", but new fragments are continually appended, so whatever is currently the last fragment changes over time:
(start)1234
(start)12345
(start)123456
This works fine, but you have probably noticed a problem. Adaptive streaming technologies let you play any fragment of a video. If your video goes on essentially forever, the origin server would have to store an ever-growing, effectively unbounded number of fragments. That is not practical.
To solve this problem, adaptive streaming technologies introduce the concept of a DVR window: a sliding window over the video that contains all the data players can view. Any data that slides out of this window can be discarded.
(start)[1]
(start)[12]
(start)[123]
(start)1[234]
(start)12[345]
(start)123[456]
(start)1234[567]
(start)12345[678]
(start)123456[789]
Let's discard the fragments we no longer need and see how that looks. If your sliding window has size 3, the fragments visible to players progress over time like this:
1
12
123
234
345
456
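The progression above can be reproduced with a bounded queue: Python's collections.deque with maxlen drops the oldest item automatically when a new one is appended, which mirrors how fragments slide out of the DVR window. The window size of 3 and the function name visible_windows are illustrative choices, not values taken from any streaming server.

```python
from collections import deque

def visible_windows(num_fragments, window_size=3):
    """Return the fragments visible to players after each new fragment arrives."""
    window = deque(maxlen=window_size)  # oldest fragment is evicted when full
    states = []
    for fragment in range(1, num_fragments + 1):  # numbered from 1, as in the diagrams
        window.append(fragment)  # a new fragment arrives at the live edge
        states.append("".join(str(f) for f in window))
    return states

for state in visible_windows(6):
    print(state)
```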
Notice that the size of the sliding window stays constant once enough fragments are available to fill it, and that the index of the first fragment plus the window size is enough to describe the entire window.
There you have it: r tells you the size of the sliding window and n is the index of the first fragment in it. As the stream advances, the window slides forward but keeps its size, so only n changes. This is not the only way to represent live video, but it is very compact: the manifest entry stays the same small size no matter how long the stream has been running.
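A minimal sketch of that idea, assuming a constant fragment duration: the hypothetical LiveWindow class below stores only the index of the first fragment (zero-based, like n) and the window size, and rebuilds the full list of fragment indices from those two numbers. The helper advance is likewise a made-up name for illustration. Once the window is full, appending a fragment changes only first, while size stays fixed.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LiveWindow:
    """Hypothetical model of a live DVR window: two numbers describe it fully."""
    first: int  # index (n) of the oldest fragment still available
    size: int   # number of fragments currently in the window

    def fragments(self):
        # The whole window is recoverable from first and size alone.
        return list(range(self.first, self.first + self.size))

def advance(window: LiveWindow, capacity: int) -> LiveWindow:
    """Add one fragment at the live edge, evicting the oldest when full."""
    if window.size < capacity:
        return LiveWindow(window.first, window.size + 1)  # window still filling
    return LiveWindow(window.first + 1, window.size)  # full: only first moves

w = LiveWindow(first=0, size=1)
for _ in range(5):
    print(w.first, w.size, w.fragments())
    w = advance(w, capacity=3)
```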