3

I need to set a default value for all entries in a very large array. It takes quite a long time (110-120 ms), and I suspect this is because of misses in memory.

I use memset/std::fill to set the default value. Is there a way to make sure that the array resides in memory before the memset/fill?
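One way to check the page-fault suspicion is to fill the same buffer twice and compare the timings. The sketch below is only an illustration: `now_seconds` and `time_two_fills` are hypothetical helper names, and it assumes a POSIX system where `clock_gettime` with `CLOCK_MONOTONIC` is available.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: monotonic wall-clock time in seconds. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hypothetical helper: fill `buf` twice with the same byte and report how
 * long each pass took. On a freshly allocated buffer the first pass also
 * pays for faulting the pages in; the second pass touches pages that are
 * already mapped. */
static void time_two_fills(unsigned char *buf, size_t len,
                           double *first, double *second)
{
    double t0 = now_seconds();
    memset(buf, 0xFF, len);
    double t1 = now_seconds();
    memset(buf, 0xFF, len);
    double t2 = now_seconds();

    *first = t1 - t0;
    *second = t2 - t1;
}
```

If the first pass is much slower than the second on a freshly allocated buffer, page faults are a likely contributor; if both take about the same time, the cost is more likely plain memory bandwidth.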

bdonlan
Erik Sapir
  • 3
    What do you mean by 'in memory'? Resident? Mapped into the page tables? In cache? In L1? – bdonlan Jul 07 '11 at 20:37
  • Also, what OS are you targeting? – bdonlan Jul 07 '11 at 20:38
  • By "misses in memory", do you mean page faults? – hammar Jul 07 '11 at 20:38
  • 1
    Yes, page faults. I am developing on Mac OS – Erik Sapir Jul 07 '11 at 20:39
  • @bdonlan: a very large array in L1? :) – Karoly Horvath Jul 07 '11 at 20:39
  • You're venturing into platform-specific territory. Please say which OS you're targeting. (If you're targeting multiple platforms, ask a separate question for each one. The answers will be different.) – Rob Kennedy Jul 07 '11 at 20:39
  • This is too vague to answer. Perhaps a code snippet to illustrate and some clarification. – AJG85 Jul 07 '11 at 20:40
  • 3
    Chances are that the tricks to force stuff to reside in memory (such as mlock() ) have to fault in the memory pages just as "slowly" as your memset(). – nos Jul 07 '11 at 20:48
  • 2
    `mlock` will do this, handle with care - http://stackoverflow.com/questions/3211063/force-allocating-real-memory – Steve Townsend Jul 07 '11 at 20:48
  • I'll bet the comment made by nos is right, but it still might be worth a test. – Michael Burr Jul 07 '11 at 20:56
  • An `unsigned int*` is _not_ "an array variable". However, it may point to one or more `unsigned int`s somewhere else in memory. – Lightness Races in Orbit Jul 07 '11 at 20:59
  • I've just read an article from Intel that mentions they have optimized libraries for such functions as memset. Mac OS X uses libc.so which is not optimized. The optimised version of memset is in libirc.a and is called _intel_fast_memset http://software.intel.com/en-us/articles/optimizing-without-breaking-a-sweat/ – yan bellavance Jul 07 '11 at 22:38

3 Answers

1

Assuming this is a large memory-mapped file, you can use the madvise() libc call with the MADV_WILLNEED argument to hint to the OS that you'll be wanting to access the region mentioned soon.

However YMMV, as the array needs to be large enough that the benefit of the resulting syscall isn't outweighed by the cost of making the call.
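As a rough sketch of the call, assuming a POSIX system: the hypothetical helper `map_hint_and_fill` below uses an anonymous mapping instead of a file so that it is self-contained, which is a departure from the memory-mapped-file case described above. The hint is advisory, so its return value is deliberately ignored.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: map `len` bytes, hint that they will be needed
 * soon, then fill them with `value`. Returns the mapping, or NULL if
 * mmap fails. The caller releases it with munmap(ptr, len). */
static unsigned char *map_hint_and_fill(size_t len, int value)
{
    /* Assumption: anonymous memory stands in for the mapped file. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    /* Advisory only: the kernel may ignore the hint, so a failure here
     * is not treated as an error. */
    (void)madvise(p, len, MADV_WILLNEED);

    memset(p, value, len);
    return p;
}
```

Whether the hint helps at all depends on the OS and on what backs the mapping, so it is worth timing the fill with and without the `madvise` call.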

DaveR
0

You can lock memory at per-page granularity using mlock, though only up to a fixed amount (I'm not sure what the limit is on OS X, but you can check it using getrlimit with RLIMIT_MEMLOCK).
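A minimal sketch of that check, assuming a POSIX system that defines `RLIMIT_MEMLOCK`; the helper names `memlock_soft_limit` and `lock_if_allowed` are made up for illustration:

```c
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>

/* Hypothetical helper: store the soft RLIMIT_MEMLOCK limit in *out.
 * Returns 0 on success, -1 if getrlimit fails. */
static int memlock_soft_limit(rlim_t *out)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;
    *out = rl.rlim_cur;
    return 0;
}

/* Hypothetical helper: lock [addr, addr + len) only if the request fits
 * under the soft limit. Returns 1 if the region is now locked, 0 if not. */
static int lock_if_allowed(void *addr, size_t len)
{
    rlim_t limit;
    if (memlock_soft_limit(&limit) != 0)
        return 0;
    if (limit != RLIM_INFINITY && (rlim_t)len > limit)
        return 0;
    /* mlock can still fail, for example when other locked regions have
     * already used up the limit. */
    return mlock(addr, len) == 0;
}
```

A successful lock should be paired with `munlock(addr, len)` once the region no longer needs to stay resident.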

Jack Lloyd
0

Most likely you have a multi-core processor, and functions like memset can actually degrade in performance when used on anything other than a single-core CPU. It's possible that mutex locking is causing the slowdown. Try allocating the memory on the stack instead of dynamically. Since it's a very large array, I would experiment with writing my own memory manager and storing segments of the array in multiple threads (but that's just an idea I had after skimming an article). A standard way of doing this would be to use one memory allocator per thread. In any case, I would look into something other than memset.

Maybe the following article would help

yan bellavance
  • On any modern OS (and certainly on OS X as per OP) there is nothing special about writing to the stack vs. the heap - it's all just pages in (virtual) memory - so there won't be any difference in speed calling memset() on stack memory vs. heap memory. – DaveR Jul 07 '11 at 21:14
  • Also note that there's zero locking in OS X's generic [memset](http://www.opensource.apple.com/source/Libc/Libc-594.9.5/string/memset-fbsd.c) routine or in the [x86-64](http://www.opensource.apple.com/source/Libc/Libc-594.9.5/x86_64/string/memset.s) one. – DaveR Jul 07 '11 at 21:17
  • @Dave Even though threads are autonomous, there is still only one memory allocator, which creates conflicts between them. – yan bellavance Jul 07 '11 at 21:20
  • Locking in a memory allocator only comes into play when the memory is allocated (or deallocated). The use case the OP describes is just a *single* allocation which is being written to in order to initialise it, so locking is not relevant. – DaveR Jul 07 '11 at 21:22
  • @Dave Mac OS X version 10.6.6 still uses an old version of memset dating from 1993. – yan bellavance Jul 07 '11 at 21:30
  • Do you have a source for that date? The copyright notice for memset.s in the LibC from OS X 10.6.6 ([i386](http://www.opensource.apple.com/source/Libc/Libc-594.9.4/i386/string/memset.s), [x86-64](http://www.opensource.apple.com/source/Libc/Libc-594.9.4/x86_64/string/memset.s)) clearly states 2005, and as you can see (if you read the source) there are no syscalls in there. Even if the code was from 1993, what does that signify regarding efficiency? – DaveR Jul 07 '11 at 21:35
  • I got it from the man pages but I could be wrong: http://developer.apple.com/library/mac/#documentation/Darwin/Reference/ManPages/man3/memset.3.html . The point is I am pretty sure the OP's bottleneck is in the function call and should consider another approach. – yan bellavance Jul 07 '11 at 22:08
  • @yanbellavance let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/1244/discussion-between-dave-rigby-and-yan-bellavance) – DaveR Jul 07 '11 at 22:10