I wouldn't normally keep beating a dead horse, but it happens that my non-vectorized approach to solving your other question, has some merit when things get big. Because I was actually filling the coefficient matrix one item at a time, it is very easy to translate into COO sparse matrix format, which can efficiently be transformed to CSC and solved. The following does it:
import scipy.sparse
def sps_solve(n) :
# Solution vector is [N[0], N[1], ..., N[n - 2], M[1], M[2], ..., M[n - 1]]
n_pos = lambda p : p
m_pos = lambda p : p + n - 2
data = []
row = []
col = []
# p = 0
# n * N[0] + (1 - n) * M[n-1] = n
row += [n_pos(0), n_pos(0)]
col += [n_pos(0), m_pos(n - 1)]
data += [n, 1 - n]
for p in xrange(1, n - 1) :
# n * M[p] + (1 + p - n) * M[n - 1] - 2 * N[p - 1] +
# (1 - p) * M[p - 1] = n
row += [m_pos(p)] * (4 if p > 1 else 3)
col += ([m_pos(p), m_pos(n - 1), n_pos(p - 1)] +
([m_pos(p - 1)] if p > 1 else []))
data += [n, 1 + p - n , -2] + ([1 - p] if p > 1 else [])
# n * N[p] + (1 + p -n) * M[n - 1] - p * N[p - 1] = n
row += [n_pos(p)] * 3
col += [n_pos(p), m_pos(n - 1), n_pos(p - 1)]
data += [n, 1 + p - n, -p]
if n > 2 :
# p = n - 1
# n * M[n - 1] - 2 * N[n - 2] + (2 - n) * M[n - 2] = n
row += [m_pos(n-1)] * 3
col += [m_pos(n - 1), n_pos(n - 2), m_pos(n - 2)]
data += [n, -2, 2 - n]
else :
# p = 1
# n * M[1] - 2 * N[0] = n
row += [m_pos(n - 1)] * 2
col += [m_pos(n - 1), n_pos(n - 2)]
data += [n, -2]
coeff_mat = scipy.sparse.coo_matrix((data, (row, col))).tocsc()
return scipy.sparse.linalg.spsolve(coeff_mat,
np.ones(2 * (n - 1)) * n)
It is of course much more verbose than building it from vectorized blocks, as TheodorosZelleke does, but an interesting thing happens when you time both approaches:

First, and this is (very) nice, time is scaling linearly in both solutions, as one would expect from using the sparse approach. But the solution I gave in this answer is always faster, more so for larger n
s. Just for the fun of it, I also timed TheodorosZelleke's dense approach from the other question, which gives this nice graph showing the different scaling of both types of solutions, and how very early, somewhere around n = 75
, the solution here should be your choice:

I don't know enough about scipy.sparse
to really figure out why the differences between the two sparse approaches, although I suspect heavily of the use of LIL format sparse matrices. There may be some very marginal performance gain, although a lot of compactness in the code, by turning TheodorosZelleke's answer into COO format. But that is left as an exercise for the OP!