How to solve the matrix equation PAP^T=I in R or Python using code

Question

x=matrix(c(rnorm(10,1,3), rnorm(10,7,2), rnorm(10,15,4)), byrow=FALSE, nrow=10)
xc=x
for (k in 1:3) {
  xc[,k]=xc[,k]-mean(xc[,k])
}
A=var(xc)

out

> A
          [,1]      [,2]      [,3]
[1,] 10.166746 -2.619763 -6.717475
[2,] -2.619763  3.291888  3.052898
[3,] -6.717475  3.052898 22.313351

I want to solve PAP^T=I, where P is 3x3, P^T is the transpose of P, and I is 3x3 identity matrix. solve(A) solves PA=I, but I want PAP^T=I. I couldn't find an answer and multiplying the matrices out will take too long. Does anyone know how to do this in R or Python? I also don't want to use the fact that P can be A^(-1/2) either. A solution via code would be nice.

library(stringr)
A=matrix(c(letters[1:4], letters[6:8], letters[10:11]), nrow=3)
mult=function(A, B) {
  AB=matrix(nrow=3, ncol=3)
  for (r in 1:3) {
    for (c in 1:3) {
      AB[r,c]=str_c("(", A[r,1], ")*(", B[1,c], ")+(", A[r,2], ")*(", B[2,c], ")+(", A[r,3], ")*(", B[3,c], ")")
    }
  }
  return(AB)
}
AB=mult(A,B)
ABAT=mult(AB, t(A))

seteq=function(A, B) {
  E=matrix(nrow=3, ncol=3)
  for (r in 1:3) {
    for (c in 1:3) {
      E[r,c]=str_c(A[r,c], "=", B[r, c])
    }
  }
  dim(E)=c(9,1)
  return(E[,1])
}
Eq=seteq(ABAT, diag(rep(1, 3)))
res=""
for (i in 1:9) {
  res=str_c(res, Eq[i])
  if (i!=9) res=str_c(res, ",")
}
res

giving me the equations

((a)*(18.9052758456522)+(d)*(0.834836569187845)+(h)*(9.76575358263375))*(a)+((a)*(0.834836569187845)+(d)*(0.555656932954595)+(h)*(1.03585433030833))*(d)+((a)*(9.76575358263375)+(d)*(1.03585433030833)+(h)*(6.84209765783974))*(h)=1,((b)*(18.9052758456522)+(f)*(0.834836569187845)+(j)*(9.76575358263375))*(a)+((b)*(0.834836569187845)+(f)*(0.555656932954595)+(j)*(1.03585433030833))*(d)+((b)*(9.76575358263375)+(f)*(1.03585433030833)+(j)*(6.84209765783974))*(h)=0,((c)*(18.9052758456522)+(g)*(0.834836569187845)+(k)*(9.76575358263375))*(a)+((c)*(0.834836569187845)+(g)*(0.555656932954595)+(k)*(1.03585433030833))*(d)+((c)*(9.76575358263375)+(g)*(1.03585433030833)+(k)*(6.84209765783974))*(h)=0,((a)*(18.9052758456522)+(d)*(0.834836569187845)+(h)*(9.76575358263375))*(b)+((a)*(0.834836569187845)+(d)*(0.555656932954595)+(h)*(1.03585433030833))*(f)+((a)*(9.76575358263375)+(d)*(1.03585433030833)+(h)*(6.84209765783974))*(j)=0,((b)*(18.9052758456522)+(f)*(0.834836569187845)+(j)*(9.76575358263375))*(b)+((b)*(0.834836569187845)+(f)*(0.555656932954595)+(j)*(1.03585433030833))*(f)+((b)*(9.76575358263375)+(f)*(1.03585433030833)+(j)*(6.84209765783974))*(j)=1,((c)*(18.9052758456522)+(g)*(0.834836569187845)+(k)*(9.76575358263375))*(b)+((c)*(0.834836569187845)+(g)*(0.555656932954595)+(k)*(1.03585433030833))*(f)+((c)*(9.76575358263375)+(g)*(1.03585433030833)+(k)*(6.84209765783974))*(j)=0,((a)*(18.9052758456522)+(d)*(0.834836569187845)+(h)*(9.76575358263375))*(c)+((a)*(0.834836569187845)+(d)*(0.555656932954595)+(h)*(1.03585433030833))*(g)+((a)*(9.76575358263375)+(d)*(1.03585433030833)+(h)*(6.84209765783974))*(k)=0,((b)*(18.9052758456522)+(f)*(0.834836569187845)+(j)*(9.76575358263375))*(c)+((b)*(0.834836569187845)+(f)*(0.555656932954595)+(j)*(1.03585433030833))*(g)+((b)*(9.76575358263375)+(f)*(1.03585433030833)+(j)*(6.84209765783974))*(k)=0,((c)*(18.9052758456522)+(g)*(0.834836569187845)+(k)*(9.76575358263375))*(c)+((c)*(0.834836569187845)+(g)*(0.555656932954595)+(k)*(1.03585433030833))*(g)+((c)*(9.76575358263375)+(g)*(1.03585433030833)+(k)*(6.84209765783974))*(k)=1

but Wolfram Alpha did not recognize this input.

`solve(P %*% A, solve(t(P)))`. While your question is good, this is a duplicate of many questions [like this one](https://stackoverflow.com/q/8145694/10782538) or [this one](https://stackoverflow.com/q/53093918/10782538) or [this one](https://stackoverflow.com/q/6789927/10782538) or maybe [this one](https://stackoverflow.com/q/36354807/10782538). "Solve linear equations {language} stackoverflow" in a google search will be your savior. — Oliver, Feb 21 '21 at 19:19
Does this answer your question? [Solving simultaneous equations with R](https://stackoverflow.com/questions/8145694/solving-simultaneous-equations-with-r) — Oliver, Feb 21 '21 at 19:21
I don't think this completely answers my question because the first one is PA=t(P)^(-1). But I want something that tries to solve it when t(P) may not be invertible. @Oliver — Vons, Feb 21 '21 at 19:37
Grothendieck's answer solves this. An alternative is to use the generalized inverse in place of `solve(t(P))`. Maybe I didn't get the question quite. I assumed your parameters were in `A` not `P`. In this case it would of course be insufficient to look at the potential duplicate answers. — Oliver, Feb 21 '21 at 20:56
This looks very much like a degenerated algebraic Riccati equation. See https://en.wikipedia.org/wiki/Algebraic_Riccati_equation Thus, it might be that some function like `care(...)` be available to solve this problem in some R or Python package... — aka.nice, Feb 22 '21 at 09:27
`P=inv(chol(A))` is indeed a trivial solution. Since A is symmetric, there are only 6 different equations, for 9 unknowns. You need to set additionnal constraints like P is symmetric or something. — aka.nice, Feb 22 '21 at 10:12

G. Grothendieck · Accepted Answer · 2021-02-22T10:59:22.007

Since there was some confusion over this, to clarify the problem is to find P given the input A in the Note at the end.

1) eigen Assuming that A is a symmetric positive definite matrix, which in the question it is:

P <- with(eigen(A), diag(1/sqrt(values)) %*% t(vectors))

# check
P %*% A %*% t(P)
## [1,]  1.000000e+00 -6.661338e-16 2.775558e-17
## [2,] -6.661338e-16  1.000000e+00 3.469447e-16
## [3,]  2.775558e-17  1.994932e-16 1.000000e+00

2) optim Another approach is to solve the 6 equations numerically. Note that the parameterization constrains P to be symmetric reducing the number of parameters to 6. A full 3x3=9 parameters would likely lead to numerical problems because, as noted in the comments, if P is a solution then HP is too for any orthogonal H.

# input is upper triangle of P incl diagonal and output is P
p2P <- function(p) {
   P <- diag(3)
   P[upper.tri(P, diag = TRUE)] <- p
   P + t(P) - diag(diag(P))
}

# sum of squares of difference between PAP' and diag(3)
f <- function(p) {
   P <- p2P(p)
   sum(c(P %*% A %*% t(P) - diag(3))^2)
}

res <- optim(1:6, f, method = "BFGS")
str(res)
## List of 5
##  $ par        : num [1:6] -0.2605 0.2573 0.5745 -0.0776 -0.027 ...
##  $ value      : num 4.69e-10
##  $ counts     : Named int [1:2] 167 49
##   ..- attr(*, "names")= chr [1:2] "function" "gradient"
##  $ convergence: int 0
##  $ message    : NULL

# check that PAP' is diag(3)
P <- p2P(res$par)
P %*% A %*% t(P)
##               [,1]          [,2]         [,3]
## [1,]  9.999948e-01 -8.654260e-08 8.333054e-06
## [2,] -8.654260e-08  9.999955e-01 3.266114e-07
## [3,]  8.333054e-06  3.266114e-07 9.999832e-01

3) nls We can alternately use nls to solve the equations. We use p2P from above but we don't need f. Note that, again, P is symmetric.

result2 <- nls(c(diag(3)) ~ c(p2P(p) %*% A %*% t(p2P(p))), 
  start = list(p = 1:6), algorithm = "port")
result2
## Nonlinear regression model
##   model: c(diag(3)) ~ c(p2P(p) %*% A %*% t(p2P(p)))
##    data: parent.frame()
##       p1       p2       p3       p4       p5       p6 
##  0.26049 -0.25733 -0.57448  0.07757  0.02702  0.22665 
##  residual sum-of-squares: 1.354e-22
##
## Algorithm "port", convergence message: absolute function convergence (6)

# check that PAP' equals diag(3)
P <- p2P(coef(result2))
P %*% A %*% t(P)
##              [,1]          [,2]          [,3]
## [1,] 1.000000e+00  7.954436e-13  5.084821e-14
## [2,] 7.954401e-13  1.000000e+00 -1.869643e-12
## [3,] 5.073719e-14 -1.869574e-12  1.000000e+00

4) nls-2 This also uses nls but uses a different parameterization of P, namely a diagonal matrix (3 parameters) times the exponential of a skew symmetric matrix (3 parameters). Note that the exponential of a skew symmetric matrix is orthogonal so the parameterization is constraining P to be a diagonal times an orthogonal.

library(expm)

p2P4 <- function(p) {
  X <- matrix(0, 3, 3)
  X[upper.tri(X)] <- p[1:3]
  diag(p[4:6]) %*% expm(X - t(X)) 
}

res4 <- nls(c(diag(3)) ~ c(p2P4(p) %*% A %*% t(p2P4(p))), start = list(p = 1:6),
  algorithm = "port")
res4
## Nonlinear regression model
##   model: c(diag(3)) ~ c(p2P4(p) %*% A %*% t(p2P4(p)))
##    data: parent.frame()
##     p1     p2     p3     p4     p5     p6 
## 3.3142 1.9743 1.6253 0.6500 0.1963 0.3663 
##  residual sum-of-squares: 2.863e-30
##
## Algorithm "port", convergence message: X-convergence (3)

# check that PAP' equals diag(3)
P <- p2P4(coef(res4))
P %*% A %*% t(P)
##               [,1]          [,2]          [,3]
## [1,]  1.000000e+00  9.367507e-16 -2.827599e-16
## [2,]  9.159340e-16  1.000000e+00 -6.661338e-16
## [3,] -2.289835e-16 -6.106227e-16  1.000000e+00

Note

We used this for A:

Lines <- "
10.166746 -2.619763 -6.717475
-2.619763  3.291888  3.052898
-6.717475  3.052898 22.313351"
A <- as.matrix(read.table(text = Lines))

score 1 · Answer 2 · answered Feb 22 '21 at 10:47

As you have noticed, if you decompose A in algebraic "square root"

A = L * L^T

Then the problem becomes:

(P*L) * (P*L)^T = Id

That means that P*L is a unitary matrix https://en.wikipedia.org/wiki/Unitary_matrix.

The solution are thus formed with any unitary matrix U, (U * U^T=Id)

P=inv(L)*U

There are infinitely many solutions.

You explicitely states that you don't want the trivial solution P=inv(L) with U=Id, but you do not tell why, and you do not provide additional constraints on P...

Note that the first solution in the good answer of @G. Grothendieck is exactly this: reduce A in diagonal form D (eigenvalues) with eigenvectors V

inv(V)*A*V = D
D = sqrt(D)*sqrt(D)
P = inv(sqrt(D))*V

How to solve the matrix equation PAP^T=I in R or Python using code

2 Answers2

Note