This is closely related to the following post, which was answered very well by Przemyslaw Szufel:
How can I run a simple parallel array assignment operation in Julia?
Given that I have a 40-core machine, I decided to follow Przemyslaw's advice and go with @distributed, rather than Threads, to perform the array assignment operations. This sped things up quite nicely.
The only real difference between my situation and that user's is that I have nested loops. Of course, I could always flatten (vectorize) the array I'm assigning to so that a single loop covers it, but that would complicate my code. Should I simply put @sync @distributed before the outermost loop and leave it at that? Or would I need to put additional macros before the (two, in my case) inner loops to get the full benefit of parallelization?
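For concreteness, here is a minimal sketch of the structure I have in mind. It is not my actual code: the array name `A`, its dimensions, the worker count, and the value being assigned are all placeholders, and I am assuming a `SharedArray` so that every worker can write into the same array.

```julia
using Distributed
addprocs(2)                      # placeholder worker count; I would use more on the 40-core machine
@everywhere using SharedArrays

n1, n2, n3 = 4, 3, 2             # placeholder dimensions
A = SharedArray{Float64}(n1, n2, n3)

# Macros on the outermost loop only: the range 1:n1 is split across workers,
# and each worker runs the two inner loops serially for its share of i.
@sync @distributed for i in 1:n1
    for j in 1:n2
        for k in 1:n3
            A[i, j, k] = i + j + k   # stand-in for the real assignment
        end
    end
end
```

In this form only the `i` iterations are distributed, so my question is whether that is already enough, or whether the `j` and `k` loops need their own macros as well.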