
I was discussing a piece of code with some friends, and we got onto the `memset` function in C. What is the order, in big-O notation, of this function if we use it to initialize an array of size N?

franvergara66
  • Is there any reason to believe it'd be anything other than `O(n)`? – Mysticial Jul 26 '12 at 06:04
  • The C standard doesn't give any complexity guarantees (like the C++ standard does), but as @Mysticial implies, it's hard to imagine how it would be anything other than O(N). It obviously can't be any lower than that, and you'd have to do extra work to have higher complexity. – Jerry Coffin Jul 26 '12 at 06:06
  • @Mysticial I made a piece of code that measures the time taken to initialize an array with values, and memset is much faster than a for loop. What is the reason? Hence my question about complexity. – franvergara66 Jul 26 '12 at 06:07
  • @JerryCoffin Technically, it "could" be lower, if the compiler optimizes the whole thing out. Likewise, larger datasets touch higher-levels of cache which are slower. But now we're getting pedantic. – Mysticial Jul 26 '12 at 06:07
  • @Melkhiah66 That's just a difference in the big-O constant. The compiler's `memset()` is probably much more optimized than your loop. – Mysticial Jul 26 '12 at 06:08
  • @Melkhiah66: the difference in speed is more likely a result of careful optimization than anything affecting the actual complexity. Complexity governs behavior as the size of data approaches infinity, ignoring factors that can be quite significant in real life. – Jerry Coffin Jul 26 '12 at 06:10
  • I should probably also add: when you're dealing with a large array (i.e., something much larger than cache) your loop will probably come a lot closer to keeping up. The bandwidth to memory quickly becomes a limiting factor, so even poor code can (nearly) saturate the bus. – Jerry Coffin Jul 26 '12 at 06:12

4 Answers


On a system where you have direct access to the page tables and they're stored in a hierarchical way, memset could be implemented in O(log n) by replacing the whole virtual address mapping with copy-on-write references to a single page filled with the given byte value. Note however that if you're going to do any future modifications to the object, the normal O(n) cost of memset would just be deferred to the page fault to instantiate separate copies of the pages when they're modified.
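The all-zeros special case of this deferred-cost idea is visible on ordinary POSIX systems: a fresh anonymous mapping behaves like memory that has been memset to zero, yet the kernel only installs copy-on-write references to a single shared zero page and pays the O(n) cost as pages are first written. A minimal sketch (assumes Linux-style `MAP_ANONYMOUS`):

```c
#include <stdio.h>
#include <sys/mman.h>

int main(void) {
    size_t n = (size_t)1 << 30;  /* 1 GiB of "zeroed" memory */

    /* Returns almost immediately: no 1 GiB write happens here.  Every page
     * is a copy-on-write reference to the kernel's shared zero page. */
    unsigned char *p = mmap(NULL, n, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    printf("p[12345] = %d\n", p[12345]);  /* reads 0 without any bulk memset */
    p[12345] = 7;  /* the page fault here instantiates one private page */

    munmap(p, n);
    return 0;
}
```

This is why "memset to zero" of freshly mapped memory can appear to be free; a hypothetical O(log n) memset for arbitrary byte values would need the page-table tricks described above.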

R.. GitHub STOP HELPING ICE
  • I'll upvote this if you can show me an existing C implementation that does that. :) – Thomas Padron-McCarthy Jul 26 '12 at 07:07
  • This is a cute idea, but it is still O(n) to undo the previous address mapping, unless it is also in this replicated form (as some untouched initially-zero memory might be), and it only counts the time from the start of the memset call to its return. The fact that the actual memory modification occurs in the future does not mean it is not part of the computation cost. Also, hierarchical page tables do not have O(log n) time to change. The hierarchy means there are O(log n) levels, but the number of page table entries is still roughly proportional to the number of pages. – Eric Postpischil Jul 26 '12 at 12:41
  • @Eric: About being proportional to the number of pages, that's not the case if you cheat and make each tree level use the same physical storage for both its left and right child. :-) – R.. GitHub STOP HELPING ICE Jul 27 '12 at 00:11
  • This is a theoretical possibility. I don't think any user-mode runtime does this. – ThirdOne Jul 27 '12 at 15:14
  • Of course not. The whole implementation I described was whimsical. I wrote it more as a counter-point to the claim that `memset` "must" be `O(n)` or worse. And I'm both surprised and a bit disturbed that it's the accepted answer... :-) – R.. GitHub STOP HELPING ICE Jul 27 '12 at 22:35

You asked about complexity, but you probably intended to ask about performance.

Complexity, expressed with notation such as O(n), describes how the number of operations an algorithm must perform grows as the problem size grows. O(n) means that at most some number of steps proportional to the problem size must be performed; it does not say what that proportion is. memset is O(n). O(n²) means at most some number of steps proportional to n² must be performed. memset is better than O(n²) because setting 2n bytes takes only twice as much work as setting n bytes, not four times as much, in general.
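You can see the linearity for yourself with a rough timing sketch (approximate only: `clock()` is coarse and caching adds noise):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

static double time_memset(unsigned char *buf, size_t n) {
    clock_t t0 = clock();
    memset(buf, 1, n);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

int main(void) {
    size_t n = 500u * 1000 * 1000;
    unsigned char *buf = malloc(2 * n);
    if (!buf) return 1;

    memset(buf, 0, 2 * n);  /* touch all pages up front so page-fault costs don't skew the comparison */
    printf(" n bytes: %.3f s\n", time_memset(buf, n));
    printf("2n bytes: %.3f s\n", time_memset(buf, 2 * n));  /* expect roughly double */

    free(buf);
    return 0;
}
```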

You are likely more interested in the performance of memset, because the library version of memset performs much more quickly than a simple C version you might write.
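"A simple C version" here means something like the following byte-at-a-time loop (`naive_memset` is just an illustrative name):

```c
#include <stddef.h>

/* The obvious one-byte-per-iteration implementation. */
void naive_memset(void *dst, int c, size_t n) {
    unsigned char *p = dst;
    while (n-- > 0)
        *p++ = (unsigned char)c;
}
```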

The library version performs much more quickly because it uses specialized instructions. Most common modern processors have instructions that allow them to write 16 bytes to memory in one instruction. The library implementors write critical functions like memset in assembly language or something close to it, so they have access to all these instructions.

When you write in C, it is difficult for the compiler to take advantage of these instructions. For example, the pointer to the memory you are setting might not be aligned to a multiple of 16 bytes. The memset authors write code that tests the pointer and branches to different code for each case, with the goal of setting some bytes individually until the pointer is aligned, so that the fast instructions that store 16 bytes at a time can be used. This is only one of a number of complications that library implementors deal with when writing routines like memset.
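To make the alignment dance concrete, here is a simplified sketch of the structure such code takes (illustrative only: real library versions use 16/32/64-byte vector stores and much more case analysis, and this sketch glosses over strict-aliasing formalities):

```c
#include <stddef.h>
#include <stdint.h>

void *toy_memset(void *dst, int c, size_t n) {
    unsigned char *p = dst;
    unsigned char byte = (unsigned char)c;

    /* Prologue: set bytes one at a time until p is 8-byte aligned. */
    while (n > 0 && ((uintptr_t)p & 7) != 0) {
        *p++ = byte;
        n--;
    }

    /* Main loop: one aligned 8-byte store per iteration. */
    uint64_t pattern = byte * UINT64_C(0x0101010101010101);
    uint64_t *w = (uint64_t *)p;
    while (n >= 8) {
        *w++ = pattern;
        n -= 8;
    }

    /* Epilogue: the remaining 0-7 bytes. */
    p = (unsigned char *)w;
    while (n-- > 0)
        *p++ = byte;
    return dst;
}
```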

Because of those complications, the compiler cannot easily take your C implementation of memset and turn it into the fast code that experts write. When the compiler sees, in C code, a loop that writes one byte at a time, it typically generates assembly language that writes one byte at a time. Optimizers are getting smarter, but the complications limit how much they are allowed to do and how much they can do without generating a lot of code to handle cases that might rarely occur.

Eric Postpischil

The complexity is O(n). This is basic stuff.

ThirdOne
  • I made a piece of code that measures the time taken to initialize an array with values, and memset is much faster than a for loop. What is the reason? Hence my question about complexity. – franvergara66 Jul 26 '12 at 06:08
  • Well, your loop code just has a bigger constant hidden in the O. Both your code and memset are O(N) for the same N. – wildplasser Jul 26 '12 at 06:11
  • O(n) just means that the number of operations grows linearly with the input size. It says nothing about the absolute time taken to execute. Both your loop and the memset are O(n). – ThirdOne Jul 26 '12 at 06:14
  • If one method takes N microseconds for N elements, and another takes N years, they're still both O(N). – Keith Thompson Jul 26 '12 at 06:14
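As the comments above note, both versions are O(n); only the constant factor differs. A rough sketch to observe the gap (the `volatile` keeps the compiler from recognising the loop and replacing it with a call to `memset` itself; treat the timings as approximate):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main(void) {
    size_t n = 500u * 1000 * 1000;
    unsigned char *buf = malloc(n);
    if (!buf) return 1;

    clock_t t0 = clock();
    memset(buf, 0xAA, n);
    clock_t t1 = clock();

    volatile unsigned char *p = buf;  /* defeats the loop-to-memset optimization */
    for (size_t i = 0; i < n; i++)
        p[i] = 0x55;
    clock_t t2 = clock();

    printf("memset: %.3f s, byte loop: %.3f s\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC);
    free(buf);
    return 0;
}
```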

Some C libraries provide vectorised versions of memset(). Unless your compiler does automatic vectorisation and loop unrolling, your for loop will be way slower than a vectorised memset(). Vectorised or not, memset() is limited by the memory bandwidth: the minimum time is proportional to the array size divided by the memory bandwidth, i.e. it is an O(n) operation, since memory bandwidth is constant. For example, at a sustained bandwidth of 10 GB/s, setting a 1 GiB array cannot take much less than about 0.1 s, no matter how clever the code.

On NUMA machines memsetting very large arrays could be threaded to achieve speedup of the order of the number of NUMA nodes. See this answer for some benchmarks.
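A minimal sketch of the threading idea (a hypothetical `parallel_memset` using OpenMP; a production NUMA-aware version would also pin threads and align chunks to page boundaries so first-touch placement keeps each chunk near its thread):

```c
#include <stddef.h>
#include <string.h>
#include <omp.h>

void parallel_memset(void *dst, int c, size_t n) {
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        int nthreads = omp_get_num_threads();
        size_t chunk = n / nthreads;
        size_t begin = (size_t)tid * chunk;
        /* The last thread also takes the remainder when n isn't divisible. */
        size_t len = (tid == nthreads - 1) ? n - begin : chunk;
        memset((char *)dst + begin, c, len);
    }
}
```

Compile with `-fopenmp` (GCC/Clang). Each thread writes its own contiguous chunk through the memory controller nearest to it, which is where the NUMA speedup comes from.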

Hristo Iliev