221

I'm reading the documentation and I am constantly shaking my head at some of the design decisions of the language. But the thing that really got me puzzled is how arrays are handled.

I rushed to the playground and tried these out. You can try them too. So the first example:

var a = [1, 2, 3]
var b = a
a[1] = 42
a
b

Here a and b are both [1, 42, 3], which I can accept. Arrays are referenced - OK!

Now see this example:

var c = [1, 2, 3]
var d = c
c.append(42)
c
d

c is [1, 2, 3, 42] BUT d is [1, 2, 3]. That is, d saw the change in the last example but doesn't see it in this one. The documentation says that's because the length changed.

Now, how about this one:

var e = [1, 2, 3]
var f = e
e[0..2] = [4, 5]
e
f

e is [4, 5, 3], which is cool. It's nice to have a multi-index replacement, but f STILL doesn't see the change even though the length has not changed.

So to sum it up, common references to an array see changes if you change 1 element, but if you change multiple elements or append items, a copy is made.

This seems like a very poor design to me. Am I right in thinking this? Is there a reason I don't see why arrays should act like this?

EDIT: Arrays have changed and now have value semantics. Much more sane!

Cthutu
  • 2
    Swift array semantics are quite confusing; they are saying it's value typed. Here is a discussion on that: https://devforums.apple.com/message/975661#975661 – Anil Varghese Jun 06 '14 at 12:07
  • 95
    For the record, I don't think this question should be closed. Swift is a new language, so there are going to be questions like this for a while as we all learn. I find this question very interesting and I hope that someone will have a compelling case on the defense. – Joel Berger Jun 06 '14 at 12:11
  • Just to clarify. The core design decision being made here to have `.append` **"modify"** the array rather than (more sensibly imho) **return a new sequence**. Since arrays are immutable (?) in swift this causes nasty implicit copying. And that's what the OP has an issue with, not arrays per se, but that there's something wrong with the `append` implementation having its mutable/immutable cake and eating it? – Nathan Cooper Jun 06 '14 at 16:26
  • 4
    @Joel Fine, ask it on programmers, Stack Overflow is for specific unopinionated programming problems. – bjb568 Jun 06 '14 at 18:06
  • 21
    @bjb568: It's not opinion, though. This question should be answerable with facts. If some Swift developer comes and answers "We did it like that for X, Y, and Z," then that's straight up fact. You may not agree with X, Y, and Z, but if a decision was made for X, Y, and Z then that's just a historical fact of the language's design. Much like when I asked why `std::shared_ptr` doesn't have a non-atomic version, [there was an answer based on fact, not opinion](http://stackoverflow.com/a/15140227/1287251) (the fact is that the committee considered it but didn't want it for various reasons). – Cornstalks Jun 06 '14 at 20:58
  • @Cornstalks Yeah, fine, too broad then. – bjb568 Jun 06 '14 at 21:01
  • @Cornstalks That isn't fact, that is still opinion. The opinion of the creators of the language. The question is if this design is bad or not, which is opinion. That isn't to say that opinions on this subject aren't valuable, but that they are not a good fit for StackOverflow. – JasonMArcher Jun 06 '14 at 23:27
  • 7
    @JasonMArcher: Only the very last paragraph is based on opinion (which yeah, maybe should be removed). The actual title of the question (which I take as the actual question itself) is answerable with facts. There *is* a reason the arrays were designed to work the way they work. – Cornstalks Jun 07 '14 at 00:05
  • 1
    This is the dumbest thing I've ever heard - there's no excuse for it. *facepalm*. Arrays should just be proper objects. I guess a workaround would be to just use Obj-C arrays? NSArray and NSMutableArray should still work. Implementing an "it might be a copy, but maybe not" array strategy is adding complication for no reason. – n13 Jun 07 '14 at 00:55
  • I guess a workaround is to wrap any array in proper objects when passing them around, rather than directly setting another variable to an array (which might end in a copy eventually and is therefore unpredictable). – n13 Jun 07 '14 at 03:45
  • They have copy on structure mutation but not on content mutation? Way to go making everything confusing. They should go one way or the other. (If `a[1]=42` also copied if shared, that would be entirely supportable, if perhaps a little trickier to make fast. It's just wrapping a mutable view over a basic immutable model.) – Donal Fellows Jun 07 '14 at 06:26
  • I would suggest that the last paragraph be adjusted slightly, to "This behavior seems ad hoc; is there some principle which determines when the array is copied and when it isn't?" – supercat Jun 07 '14 at 15:39
  • 1
    I wonder why `e[0..2] = [4, 5]` doesn't change the length?? – TaW Jun 07 '14 at 17:16
  • How I explain it in Java: Somehow Swift arrays can reassign themselves, like `this = new A()` – Ming-Tang Jun 08 '14 at 02:13
  • This is usually called "Copy-on-Write" btw, just that it was implemented half-assed (a[1]=6 would have to create a copy too) – API-Beast Jun 08 '14 at 14:18
  • 7
    Yes, as API-Beast said, this is usually called "Copy-on-Half-Assed-Language-Design". – R. Martinho Fernandes Jun 10 '14 at 23:56
  • Java and C# do not have modifiable length for arrays (which is very positive imo). C realloc is similar to creating a new array and copying into it, so practically the same as Java/C#. Changing the length of an array may require changing the reference value, which explains the odd behavior - i.e. accessing `e[0..2]` may result in a different value of the pointer to the array held in `e`. To put it simply, there is no way to implement sane arrays with modifiable length other than pointer to pointer - i.e. wrapped arrays. Just to make sure: the design is truly dumb. This is only an explanation of why the result looks that way. – bestsss Jun 15 '14 at 11:47
  • 1
    @R.MartinhoFernandes COW is a nice concept, I'm looking forward to see COHALD mentioned on Swift introductory books from now on. – NiñoScript Jun 28 '14 at 20:58
  • Currently (probably from Swift 2, if I had to guess) arrays *always* use COW, no ifs ands or buts. This question no longer applies because any modification -- basically, any `mutating` method -- to one instance of a shared array will trigger a copy of that array. – BallpointBen Jul 14 '17 at 18:17

10 Answers

111

Note that array semantics and syntax were changed in Xcode 6 beta 3 (blog post), so the question no longer applies. The following answer applied to beta 2:


It's for performance reasons. Basically, they try to avoid copying arrays as long as they can (and claim "C-like performance"). To quote the language book:

For arrays, copying only takes place when you perform an action that has the potential to modify the length of the array. This includes appending, inserting, or removing items, or using a ranged subscript to replace a range of items in the array.

I agree that this is a bit confusing, but at least there is a clear and simple description of how it works.

That section also includes information on how to make sure an array is uniquely referenced, how to force-copy arrays, and how to check whether two arrays share storage.

Peter Mortensen
Lukas
  • 61
    I find the fact that you have both unshare and copy a BIG red flag in design. – Cthutu Jun 06 '14 at 13:04
  • 9
    This is correct. An engineer described to me that for language design this is not desirable, and is something they hope to "fix" in upcoming updates to Swift. Vote with radars. – Erik Kerber Jun 06 '14 at 18:41
  • 2
    It's just something like copy-on-write (COW) in Linux child process memory management, right? Perhaps we can call it copy-on-length-alteration (COLA). I see this as a positive design. – justhalf Jun 11 '14 at 03:58
  • 3
    @justhalf I can predict a bunch of confused newbies coming to SO and asking why their arrays were / were not shared (just in a less clear way). – John Dvorak Jun 11 '14 at 05:27
  • Hmm, yeah, considering that Swift is from Apple, user friendliness is important. I come from Linux background, so what matters is efficiency :). So my point is, this is a positive design on a language focusing on speed. – justhalf Jun 11 '14 at 06:49
  • 11
    @justhalf: COW is a pessimization in the modern world anyway, and secondly, COW is an implementation-only technique, and this COLA stuff leads to totally random sharing and unsharing. – Puppy Jun 11 '14 at 13:54
  • 1
    @DeadMG: Apparently I still have lots to learn. People agree with you :) – justhalf Jun 13 '14 at 02:35
  • 2
    @justhalf, the design is sickening. Personally if I want a copy I'd like to tell explicitly to make a new one. On a flip note - regardless if the mass disagrees with you it still doesn't imply you're wrong per se. Yet, in this case the design makes no sense for me - it's just ridiculous. – bestsss Jun 15 '14 at 11:28
  • "If I want a copy I'd like to tell explicitly to make a new one" - I thought this is what is being achieved through COLA. I agree with your stand as well. Apparently what people dislike about COLA is the automatic copying on length alteration. I was focusing on the arrays being not copied when you don't change the length, which I just realized is already present in most programming languages. But on the copying when the length is changed, I guess Apple wants to guard against overflowing while maintaining efficiency. – justhalf Jun 15 '14 at 11:47
  • @justhalf, I mean I want an explicit method called `copy(newLength)` not an automatic one -- "maybe copy". I mentioned as comment at the original question - you can't have it both ways: sane, modifiable array length and no indirection ("performance"). C/Java/C#/ObjC can't modify the length of an array, if you want to do so - copy it. "Overflowing" is most likely a programming error (just as integer overflows), handle it with an exception or create a type that allows that 'explicitly'. – bestsss Jun 15 '14 at 12:53
  • Yes, sorry for not making myself clear. I did say that it was an enlightenment for me that what people dislike from COLA is the "there is a chance to copy" in the "maybe copy" principle, while for me, I was actually saying that the "there is a chance of not copying" part of "maybe copy" is good, but then I said that I realized it's actually the most basic feature of most programming languages – justhalf Jun 15 '14 at 12:58
  • unshare and copy have both been removed from Swift in Xcode 6 beta 3. Yay! – Raymond Law Jul 08 '14 at 15:47
  • "blog post" link is dead, as is where it supposedly moved to – OrangeDog Feb 10 '17 at 14:04
  • "blog post" can be found here: https://owensd.io/2014/07/01/swift-arrays-are-fixed/ – schmidiii May 15 '17 at 14:04
25

From the official documentation of the Swift language:

Note that the array is not copied when you set a new value with subscript syntax, because setting a single value with subscript syntax does not have the potential to change the array’s length. However, if you append a new item to array, you do modify the array’s length. This prompts Swift to create a new copy of the array at the point that you append the new value. Henceforth, a is a separate, independent copy of the array.....

Read the whole section "Assignment and Copy Behavior for Arrays" in this documentation. You will find that when you replace a range of items in the array, the array takes a copy of itself for all items.

iPatel
  • 4
    Thanks. I referred to that text vaguely in my question. But I showed an example where changing a subscript range didn't change the length and it still copied. So if you don't want a copy you have to change it one element at a time. – Cthutu Jun 06 '14 at 13:06
21

The behavior has changed with Xcode 6 beta 3. Arrays are no longer reference types and have a copy-on-write mechanism, meaning as soon as you change an array's content from one or the other variable, the array will be copied and only the one copy will be changed.
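
A minimal sketch of what copy-on-write looks like in practice, written against current Swift rather than beta 3. The helper `storageAddress` is a name made up for this illustration, and comparing buffer addresses leans on implementation behavior rather than a documented guarantee, so treat the sharing check as a peek under the hood only.

```swift
// Hypothetical helper: address of the array's element storage, for illustration only.
func storageAddress(_ array: [Int]) -> UnsafePointer<Int>? {
    return array.withUnsafeBufferPointer { $0.baseAddress }
}

var a = [1, 2, 3]
let b = a                      // assignment alone is not expected to copy the elements

// Do both variables currently point at the same storage?
let sharedBefore = storageAddress(a) == storageAddress(b)

a[1] = 42                      // the first write through `a` is what forces the copy

let sharedAfter = storageAddress(a) == storageAddress(b)

print("shared before write:", sharedBefore)
print("shared after write:", sharedAfter)
print(a, b)                    // [1, 42, 3] [1, 2, 3]
```

On toolchains I would expect the first check to report shared storage and the second to report separate storage. The observable values are the same either way: only `a` changes.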


Old answer:

As others have pointed out, Swift tries to avoid copying arrays if possible, including when changing values for single indexes at a time.

If you want to be sure that an array variable (!) is unique, i.e. not shared with another variable, you can call the unshare method. This copies the array unless it already only has one reference. Of course you can also call the copy method, which will always make a copy, but unshare is preferred to make sure no other variable holds on to the same array.

var a = [1, 2, 3]
var b = a
b.unshare()
a[1] = 42
a               // [1, 42, 3]
b               // [1, 2, 3]
Pascal
12

The behavior is extremely similar to the Array.Resize method in .NET. To understand what's going on, it may be helpful to look at the history of the . token in C, C++, Java, C#, and Swift.

In C, a structure is nothing more than an aggregation of variables. Applying the . to a variable of structure type will access a variable stored within the structure. Pointers to objects do not hold aggregations of variables, but identify them. If one has a pointer which identifies a structure, the -> operator may be used to access a variable stored within the structure identified by the pointer.

In C++, structures and classes not only aggregate variables, but can also attach code to them. Using . to invoke a method on a variable will ask that method to act upon the contents of the variable itself; using -> on a variable which identifies an object will ask that method to act upon the object identified by the variable.

In Java, all custom variable types simply identify objects, and invoking a method upon a variable will tell the method what object is identified by the variable. Variables cannot hold any kind of composite data type directly, nor is there any means by which a method can access a variable upon which it is invoked. These restrictions, although semantically limiting, greatly simplify the runtime, and facilitate bytecode validation; such simplifications reduced the resource overhead of Java at a time when the market was sensitive to such issues, and thus helped it gain traction in the marketplace. They also meant that there was no need for a token equivalent to the . used in C or C++. Although Java could have used -> in the same way as C and C++, the creators opted to use single-character . since it was not needed for any other purpose.

In C# and other .NET languages, variables can either identify objects or hold composite data types directly. When used on a variable of a composite data type, . acts upon the contents of the variable; when used on a variable of reference type, . acts upon the object identified by it. For some kinds of operations, the semantic distinction isn't particularly important, but for others it is. The most problematic situations are those in which a composite data type's method that would modify the variable upon which it is invoked is invoked on a read-only variable. If an attempt is made to invoke a method on a read-only value or variable, compilers will generally copy the variable, let the method act upon the copy, and then discard the copy. This is generally safe with methods that only read the variable, but not safe with methods that write to it. Unfortunately, .NET does not as yet have any means of indicating which methods can safely be used with such substitution and which can't.

In Swift, methods on aggregates can expressly indicate whether they will modify the variable upon which they are invoked, and the compiler will forbid the use of mutating methods upon read-only variables (rather than having them mutate temporary copies of the variable which will then get discarded). Because of this distinction, using the . token to call methods that modify the variables upon which they are invoked is much safer in Swift than in .NET. Unfortunately, the fact that the same . token is used for that purpose as to act upon an external object identified by a variable means the possibility for confusion remains.
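
A small sketch of that distinction, assuming a made-up `Counter` struct (not from any library): the `mutating` keyword marks the method that modifies the variable it is invoked on, and the compiler rejects calling it on a `let` constant instead of silently mutating a temporary copy.

```swift
// Hypothetical value type used only for this illustration.
struct Counter {
    var count = 0

    // `mutating` declares that this method modifies the variable it is invoked on.
    mutating func increment() {
        count += 1
    }

    // A non-mutating method may only read the value.
    func doubled() -> Int {
        return count * 2
    }
}

var variable = Counter()
variable.increment()           // allowed: `variable` is declared with `var`

let constant = Counter()
// constant.increment()        // rejected at compile time: `constant` is a `let`

print(variable.count, constant.doubled())  // 1 0
```

Note that both calls still use the same . token, which is the source of confusion described above.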

If one had a time machine and went back to the creation of C# and/or Swift, one could retroactively avoid much of the confusion surrounding such issues by having languages use the . and -> tokens in a fashion much closer to the C++ usage. Methods of both aggregates and reference types could use . to act upon the variable upon which they were invoked, and -> to act upon a value (for composites) or the thing identified thereby (for reference types). Neither language is designed that way, however.

In C#, the normal practice for a method to modify a variable upon which it is invoked is to pass the variable as a ref parameter to a method. Thus calling Array.Resize(ref someArray, 23); when someArray identifies an array of 20 elements will cause someArray to identify a new array of 23 elements, without affecting the original array. The use of ref makes clear that the method should be expected to modify the variable upon which it is invoked. In many cases, it's advantageous to be able to modify variables without having to use static methods; Swift addresses that need by using . syntax. The disadvantage is that it loses clarity as to which methods act upon variables and which methods act upon values.
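
For comparison, Swift does have a rough counterpart to `ref` in `inout` parameters, where the call site must write `&`. The sketch below uses a hypothetical free function `resize` (my own name, not a standard library API) shaped like the `Array.Resize` call above, next to the method form, which carries no marker at the call site.

```swift
// Hypothetical helper shaped like C#'s Array.Resize(ref someArray, 23).
func resize(_ array: inout [Int], to newCount: Int, filler: Int = 0) {
    precondition(newCount >= 0, "newCount must not be negative")
    if newCount < array.count {
        // Shrink by dropping elements from the end.
        array.removeLast(array.count - newCount)
    } else {
        // Grow by padding with the filler value.
        array.append(contentsOf: repeatElement(filler, count: newCount - array.count))
    }
}

var someArray = Array(repeating: 7, count: 20)
let original = someArray       // independent value under current Swift semantics

resize(&someArray, to: 23)     // `&` makes it visible that the variable may be modified
someArray.append(0)            // method form: nothing at the call site signals mutation

print(someArray.count, original.count)  // 24 20
```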

supercat
5

What I've found is: The array will be a mutable copy of the referenced one if and only if the operation has the potential to change the array's length. In your last example, e[0..2] = [4, 5] assigns through a range of indices, and such an operation has the potential to change the array's length (it might be that duplicates are not allowed), so the array gets copied.

var e = [1, 2, 3]
var f = e
e[0..2] = [4, 5]
e // 4,5,3
f // 1,2,3


var e1 = [1, 2, 3]
var f1 = e1

e1[0] = 4
e1[1] = 5

e1 //  - 4,5,3
f1 // - 4,5,3
Peter Mortensen
Kumar KL
  • 8
    "treated as length has changed" I can grasp that it would get copied iff the length is changed, but in combination with the quote above, I think this is a really worrisome "feature" and one which I think many people will get wrong – Joel Berger Jun 06 '14 at 12:16
  • 25
    Just because a language is new doesn't mean it's okay for it to contain glaring internal contradictions. – Lightness Races in Orbit Jun 06 '14 at 16:58
  • This has been "fixed" in beta 3, `var` arrays are now completely mutable and `let` arrays are completely immutable. – Pascal Jul 07 '14 at 22:10
5

To me this makes more sense if you first replace your constants with variables:

a[i] = 42            // (1)
e[i..j] = [4, 5]     // (2)

The first line never needs to change the size of a. In particular, it never needs to do any memory allocation. Regardless of the value of i, this is a lightweight operation. If you imagine that under the hood a is a pointer, it can be a constant pointer.

The second line may be much more complicated. Depending on the values of i and j, you may need to do memory management. If you imagine that e is a pointer that points to the contents of the array, you can no longer assume that it is a constant pointer; you may need to allocate a new block of memory, copy data from the old memory block to the new memory block, and change the pointer.

It seems that the language designers have tried to keep (1) as lightweight as possible. As (2) may involve copying anyway, they have resorted to the solution that it always acts as if you did a copy.

This is complicated, but I am happy that they did not make it even more complicated with e.g. special cases such as "if in (2) i and j are compile-time constants and the compiler can infer that the size of e is not going to change, then we do not copy".
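
To make the difference between (1) and (2) concrete, here is a sketch in current Swift syntax (the half-open range is spelled `0..<2` today rather than `0..2`). It only illustrates which operations can change the element count; it says nothing about how beta 2 implemented them.

```swift
// (1) Single-element assignment: the count can never change.
var a = [1, 2, 3]
let i = 1
a[i] = 42
print(a, a.count)              // [1, 42, 3] 3

// (2) Range replacement with a same-length replacement: the count happens to stay 3.
var e = [1, 2, 3]
e[0..<2] = [4, 5]
print(e, e.count)              // [4, 5, 3] 3

// (2) in general: replacing two elements with four grows the array from 3 to 5,
// which may require allocating new storage and moving the contents.
var g = [1, 2, 3]
g.replaceSubrange(0..<2, with: [4, 5, 6, 7])
print(g, g.count)              // [4, 5, 6, 7, 3] 5
```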


Finally, based on my understanding of the design principles of the Swift language, I think the general rules are these:

  • Use constants (let) always everywhere by default, and there won't be any major surprises.
  • Use variables (var) only if it is absolutely necessary, and be very careful in those cases, as there will be surprises [here: strange implicit copies of arrays in some but not all situations].
Jukka Suomela
4

Delphi's strings and arrays had the exact same "feature". When you looked at the implementation, it made sense.

Each variable is a pointer to dynamic memory. That memory contains a reference count followed by the data in the array. So you can easily change a value in the array without copying the whole array or changing any pointers. If you want to resize the array, you have to allocate more memory. In that case the current variable will point to the newly allocated memory. But you can't easily track down all of the other variables that pointed to the original array, so you leave them alone.

Of course, it wouldn't be hard to make a more consistent implementation. If you wanted all variables to see a resize, do this: Each variable is a pointer to a container stored in dynamic memory. The container holds exactly two things, a reference count and pointer to the actual array data. The array data is stored in a separate block of dynamic memory. Now there is only one pointer to the array data, so you can easily resize that, and all variables will see the change.
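
The container idea can be sketched in Swift with a class, since class instances are shared by reference and reference counted by the runtime. `SharedArray` is a hypothetical name for this illustration; it is not how Delphi or Swift implement their arrays.

```swift
// Hypothetical container: every variable refers to the same box,
// and the box alone holds the (resizable) element storage.
final class SharedArray<Element> {
    private(set) var elements: [Element]

    init(_ elements: [Element]) {
        self.elements = elements
    }

    var count: Int { return elements.count }

    subscript(index: Int) -> Element {
        get { return elements[index] }
        set { elements[index] = newValue }
    }

    // Resizing replaces storage inside the box; holders of the box are unaffected.
    func append(_ newElement: Element) {
        elements.append(newElement)
    }
}

let c = SharedArray([1, 2, 3])
let d = c                      // copies the reference to the box, not the data

c.append(42)                   // a resize through one variable...
c[1] = 7                       // ...and a single-element write

print(d.elements)              // [1, 7, 3, 42]: both changes are visible through `d`
```

The price is the extra indirection on every access, which is the trade-off the paragraph above describes.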

Trade-Ideas Philip
4

A lot of Swift early adopters have complained about these error-prone array semantics, and Chris Lattner has written that the array semantics have been revised to provide full value semantics (Apple Developer link for those who have an account). We will have to wait at least for the next beta to see exactly what this means.

Peter Mortensen
Gael
0

I use .copy() for this.

    var a = [1, 2, 3]
    var b = a.copy()
    a[1] = 42
Preetham
0

Did anything change in array behavior in later Swift versions? I just ran your example:

var a = [1, 2, 3]
var b = a
a[1] = 42
a
b

And my results are [1, 42, 3] and [1, 2, 3]
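
For reference, here is a sketch of all three examples from the question as they would be written under current Swift. The only syntax change assumed is that the beta-era half-open range `0..2` is now spelled `0..<2`. In each case the copy stays independent of the original.

```swift
// Example 1: single-element subscript assignment.
var a = [1, 2, 3]
var b = a
a[1] = 42
print(a, b)                    // [1, 42, 3] [1, 2, 3]

// Example 2: append.
var c = [1, 2, 3]
var d = c
c.append(42)
print(c, d)                    // [1, 2, 3, 42] [1, 2, 3]

// Example 3: ranged subscript replacement.
var e = [1, 2, 3]
var f = e
e[0..<2] = [4, 5]
print(e, f)                    // [4, 5, 3] [1, 2, 3]
```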

jreft56