
My question is essentially this: when and why should I not pass a pointer? For example, I know that if I want to modify some data elsewhere using a function, I'm going to need to pass a pointer so I can actually change that chunk of memory. So is the standard practice to pass pointers only when I need to modify out-of-scope variables, and not otherwise? What about structs, which can potentially be quite large chunks of memory? Would it be preferable to pass an address-sized pointer (most often 8 bytes) as a parameter, or my 50-byte struct? Does it even matter?

TL;DR: what do I need to keep in mind (besides the obvious, i.e. what I need to do with the function) when defining the parameters of my functions?
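For concreteness, here is a minimal sketch of the two calling styles the question is about (the struct and function names are made up for illustration):

```c
#include <stdio.h>

/* Hypothetical ~50-byte struct, purely for illustration. */
struct record {
    char   name[40];
    int    id;
    double score;
};

/* By value: the whole struct is copied; the callee works on its own copy,
 * so the caller's data cannot be changed. */
double doubled_score(struct record r)
{
    return r.score * 2.0;
}

/* By pointer: only an address (4 or 8 bytes) is copied; the callee can
 * modify the caller's struct through it. */
void bump_score(struct record *r)
{
    r->score += 1.0;
}

int main(void)
{
    struct record r = { "example", 1, 10.0 };
    printf("%f\n", doubled_score(r)); /* r is untouched */
    bump_score(&r);                   /* r.score is now 11.0 */
    printf("%f\n", r.score);
    return 0;
}
```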

Sulthan
  • Possible duplicate: https://stackoverflow.com/q/373419/10871073 ? – Adrian Mole Jun 28 '20 at 12:12
  • @Adrian Mole No, read the TL;DR. I know the difference between pass by reference and pass by value (also the top answer on that post seems to have forgotten about C :P) –  Jun 28 '20 at 12:16
  • Example: programming on embedded device, passing a struct with a big array throughout lots of functions. You only want to allocate it on the stack once, so use a pointer or every function call will build a big array on the stack. – Bayleef Jun 28 '20 at 12:28
  • The size of the array shouldn't matter, right? It's just a pointer. Also this stuff is going on the heap, I'm going to have thousands and thousands of these structs at the same time. –  Jun 28 '20 at 13:13

2 Answers


As you say, a pointer will only be 4 or 8 bytes, so passing a pointer means less memory has to be copied. Of course, a pointer access costs some extra time, so it is hard to say which would be faster for structs of relatively small size (and 50 bytes is still fairly small for modern machines). When structs get a lot bigger, a pointer would definitely be better.

All in all, in most cases you will not notice the performance difference unless you have structs of a larger size (let's say starting from 100+ bytes).

Of course, when you need to modify a struct, you need to pass a pointer anyway.

Edit: Also, before I get comments: the 100+ bytes figure is a total guess; it might well be a lot more, or passing a pointer might already pay off at 50.
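To make the trade-off concrete, here is a small sketch (not from the original answer; struct and function names are invented) of the usual idiom: pass large structs by const pointer when the function only reads them, and by non-const pointer when it must modify them:

```c
#include <stddef.h>

/* Hypothetical "large" struct, for illustration only. */
struct big {
    double samples[64];   /* 512 bytes: copying this on every call adds up */
    size_t count;
};

/* Read-only access: a const pointer copies just an address and documents
 * that the callee won't change the caller's data. */
double big_sum(const struct big *b)
{
    double s = 0.0;
    for (size_t i = 0; i < b->count; ++i)
        s += b->samples[i];
    return s;
}

/* Mutating access: a non-const pointer is needed anyway. */
void big_reset(struct big *b)
{
    b->count = 0;
}
```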

Koen
  • I didn't think about pointer access time. Also good point about structs, because none of mine are *particularly* massive, although some may contain arrays of structs which also contain arrays of structs and so on... I could potentially have structs of kilobytes (maybe hundreds of kilobytes for the craziest). Would that start to seriously impact performance? Also would passing by reference even make a difference at that point? (Adrian made the point that compilers may just make a local copy to simplify frequent dereferencing) –  Jun 28 '20 at 12:27
  • If your struct holds those arrays of structs through pointers, it does not matter, because only the pointer is copied as part of the struct; the nested arrays do not increase the struct size – Koen Jun 28 '20 at 13:06
  • Yeah... I don't know what I was thinking. It's late –  Jun 28 '20 at 13:11
  • Of course a compiler will do very well at optimizations (like the local copy), which is why the difference will usually not be noticeable anyway. And yes, a struct of kilobytes might hurt performance, though I doubt that happens often; in that case a pointer would definitely be better (although I wouldn't be surprised if the compiler optimized that for us as well) – Koen Jun 28 '20 at 13:12

In the early days, K&R C didn't even let you pass structs by value. The only way to create a copy was via memcpy() (or your own implementation of it). ISO C then defined struct copying and assignment, but conventional wisdom was that you really wanted to avoid copying data around: memory access is expensive.

That principle is even "truer" today, but the conclusions we draw from it have been turned on their head: we sometimes copy data explicitly precisely to avoid unnecessary implicit "read-throughs" all the way to RAM. The reason is that modern cores have caches which can be accessed with much lower latency than RAM proper, and computation has become very cheap. We want to make sure the cache contents are independent and do not need expensive refreshes/write-throughs.

With parallel processing and multi-core CPUs, it turns out that aliasing (accessing the same memory through different identifiers, which is what you do with a pointer) prevents independent operations on local cached copies of that memory, because a different thread or core may have written to it. Synchronizing the caches carries a comparatively large cost.

If each core or thread can operate on its own local copy of the data, it doesn't need to wait and can focus on the task at hand, so to speak. The benefit of this independence tends to outweigh the cost of the initial, explicit copying to an astonishing degree. Functional languages, which essentially move copies of data around, have received more attention recently exactly because their paradigm makes a program appear as a collection of data-independent tasks, which simplifies parallelization to the point of even allowing automatic parallelization.
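As a rough illustration of that idea (a hypothetical sketch, not code from the answer), each thread below copies its input into a local variable, works only on that copy, and writes a single result back, so the hot loop never touches memory shared with other threads:

```c
/* Compile with: cc -pthread example.c */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

struct task {
    double input;   /* read once into a local copy */
    double result;  /* written back once at the end */
};

static void *worker(void *arg)
{
    struct task *t = arg;
    double local = t->input;   /* private copy: the loop only uses 'local' */
    for (int i = 0; i < 1000000; ++i)
        local *= 1.0000001;
    t->result = local;         /* single write-back to shared memory */
    return NULL;
}

int main(void)
{
    struct task tasks[NTHREADS] = { {1.0, 0.0}, {2.0, 0.0}, {3.0, 0.0}, {4.0, 0.0} };
    pthread_t tid[NTHREADS];

    for (int i = 0; i < NTHREADS; ++i)
        pthread_create(&tid[i], NULL, worker, &tasks[i]);
    for (int i = 0; i < NTHREADS; ++i)
        pthread_join(tid[i], NULL);

    for (int i = 0; i < NTHREADS; ++i)
        printf("%f\n", tasks[i].result);
    return 0;
}
```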

Bottom line: Even in a single-threaded program written in an imperative language like C, working on copies of data may allow the compiler to generate more efficient code which may outweigh the penalty for the explicit copying in the first place.
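A minimal single-threaded sketch of that bottom line (function names are made up): in the first version the compiler has to assume the int pointers may alias each other, so it must re-read *n and *src on every iteration; copying them into locals first removes that dependency and lets the values stay in registers:

```c
/* The compiler must assume dst, src and n may point into the same memory,
 * so *n and *src have to be re-read on every iteration of the loop. */
void add_aliased(int *dst, const int *src, const int *n)
{
    for (int i = 0; i < *n; ++i)
        dst[i] += *src;
}

/* Explicit local copies break that dependency: count and value can live
 * in registers for the whole loop. */
void add_copied(int *dst, const int *src, const int *n)
{
    int count = *n;
    int value = *src;
    for (int i = 0; i < count; ++i)
        dst[i] += value;
}
```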

Peter - Reinstate Monica