I don't think there is a more efficient solution than O(N²) if you don't add any other constraint. In other words, there is no other way to decide you have found the maximum-sum subarray but to explore all the other subarrays.
Thus the least-complex solution comprises O(N²/2) which is the overall number of contiguous subarrays of an array of given length N.
Personally I would implement this with the dynamic programming approach. The idea is having a wedge of partial results, and use them to build the current sums of the subarrays (in place of computing the whole sum through). Anyhow this gives "only" a constant speedup, thus the complexity is O(N²/2)~O(N²).
The following is pseudocode - sorry for not speaking Lua
// here we place temporary results, row by row alternating in 0 or 1
int[2][N] sum_array_buffer
// stores the start of the max subarray
int[N] max_subarray_start
// stores the value
int[N] max_subarray_value
array = {7, 1, 3, 1, 4, 5, 1, 3, 6}
// we initialize the buffer with the array (ideally 1-length subarrays)
sum_array_buffer[1] = array
// the length of subarrays - we can also start from 1 if considered
for k = 1 ; k <= (N); ++k:
// the starting position fo the sub-array
for j = 0; j < (N-k+1); ++j:
sum_array_buffer[k%2][j] = sum_array_buffer[(k+1)%2][j] + array[j+k-1]
if j == 0 || sum_array_buffer[k%2][j] > max_subarray_value[k]:
max_subarray_value = sum_array_buffer[k%2][j]
max_subarray_start[k] = j
for k = 1 ; k <= (N); ++k:
print(k, max_subarray_value[k])
Graphycally:
