I managed to implement an in-place version of the naive divide-and-conquer algorithm for matrix multiplication, which makes 8 recursive calls at each level of the recursion, using index manipulation. However, when I tried to implement Strassen's algorithm, I couldn't find a way to do it in place. Instead, in my C implementation I have to malloc 19 submatrices for the 7 recursive calls.
How can Strassen's algorithm be implemented in place? Is that even possible?
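
For reference, here is a minimal sketch of the kind of index-based, in-place naive divide-and-conquer multiplication described above. It is illustrative rather than the exact code in question: the function name `dc_matmul`, the row-major layout, the shared row stride `ld`, and the restriction to square matrices whose size is a power of two are all assumptions. Each quadrant is addressed by a pointer offset into the original arrays, so the 8 recursive calls need no temporaries.

```c
#include <stddef.h>

/* Hypothetical helper: computes C += A * B for n x n blocks.
 * Assumptions: n is a power of two, all three matrices are row-major,
 * and they share the same row stride ld (the width of the full matrix).
 * Sub-blocks are addressed by pointer offsets, so nothing is allocated. */
static void dc_matmul(const double *A, const double *B, double *C,
                      size_t n, size_t ld)
{
    if (n == 1) {
        C[0] += A[0] * B[0];
        return;
    }

    size_t h = n / 2;

    /* Offsets of the four quadrants inside a block with row stride ld. */
    size_t q11 = 0;          /* top-left     */
    size_t q12 = h;          /* top-right    */
    size_t q21 = h * ld;     /* bottom-left  */
    size_t q22 = h * ld + h; /* bottom-right */

    /* C11 += A11*B11 + A12*B21 */
    dc_matmul(A + q11, B + q11, C + q11, h, ld);
    dc_matmul(A + q12, B + q21, C + q11, h, ld);

    /* C12 += A11*B12 + A12*B22 */
    dc_matmul(A + q11, B + q12, C + q12, h, ld);
    dc_matmul(A + q12, B + q22, C + q12, h, ld);

    /* C21 += A21*B11 + A22*B21 */
    dc_matmul(A + q21, B + q11, C + q21, h, ld);
    dc_matmul(A + q22, B + q21, C + q21, h, ld);

    /* C22 += A21*B12 + A22*B22 */
    dc_matmul(A + q21, B + q12, C + q22, h, ld);
    dc_matmul(A + q22, B + q22, C + q22, h, ld);
}
```

A top-level call would be `dc_matmul(A, B, C, n, n)` with `C` zero-initialized. This works without scratch space because every recursive call only accumulates into a quadrant of `C`. Strassen's 7 products, by contrast, take sums and differences of quadrants of `A` and `B` as operands, and those sums have to be stored somewhere, which is where the extra submatrices come from.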