A Appendix
A.0.1 Proof of (2.15)
Witten and Tibshirani (2009) and Price, Geyer, and Rothman (2015) showed that the following optimization problem can be solved in closed-form. We outline their steps here.
\[\begin{equation} \Omega^{k + 1} = \arg\min_{\Omega \in \mathbb{S}_{+}^{p}}\left\{ tr\left(S\Omega\right) - \log\left|\Omega\right| + tr\left[\Lambda^{k}\left(\Omega - Z^{k}\right)\right] + \frac{\rho}{2}\left\| \Omega - Z^{k} \right\|_{F}^{2} \right\} \tag{A.1}\notag \end{equation}\]
First start by taking the gradient with respect to \(\Omega\):
\[\begin{equation} \begin{split} \nabla_{\Omega}&\left\{ tr\left(S\Omega\right) - \log\left|\Omega\right| + tr\left[\Lambda^{k}\left(\Omega - Z^{k}\right)\right] + \frac{\rho}{2}\left\| \Omega - Z^{k} \right\|_{F}^{2} \right\} \\ &= S - \Omega^{-1} + \Lambda^{k} + \rho\left( \Omega - Z^{k} \right) \end{split} \tag{A.2}\notag \end{equation}\]
Note that because all of the variables are symmetric, we can ignore the symmetric constraint when deriving the gradient. Next set the gradient equal to zero and decompose \(\Omega^{k + 1} = VDV'\) using spectral decomposition so that \(D\) is a diagonal matrix with diagonal elements equal to the eigen values of \(\Omega^{k + 1}\) and \(V\) is the matrix with corresponding eigen vectors as columns so that
\[\begin{equation} S + \Lambda^{k} - \rho Z^{k} = (\Omega^{k + 1})^{-1} - \rho \Omega^{k + 1} = VD^{-1}V' - \rho VDV' = V\left(D^{-1} - \rho D\right)V'\notag \end{equation}\]
This equivalence implies that
\[\begin{equation} \phi_{j}\left( S + \Lambda^{k} - \rho Z^{k} \right) = \frac{1}{\phi_{j}(\Omega^{k + 1})} - \rho\phi_{j}(\Omega^{k + 1})\notag \end{equation}\]
where \(\phi_{j}(\cdot)\) is the \(j\)th eigen value.
\[\begin{equation} \begin{split} &\Rightarrow \rho\phi_{j}^{2}(\Omega^{k + 1}) + \phi_{j}\left( S + \Lambda^{k} - \rho Z^{k} \right)\phi_{j}(\Omega^{k + 1}) - 1 = 0 \\ &\Rightarrow \phi_{j}(\Omega^{k + 1}) = \frac{-\phi_{j}(S + \Lambda^{k} - \rho Z^{k}) \pm \sqrt{\phi_{j}^{2}(S + \Lambda^{k} - \rho Z^{k}) + 4\rho}}{2\rho} \end{split} \notag \end{equation}\]
In summary, if we decompose \(S + \Lambda^{k} - \rho Z^{k} = VQV'\) then
\[\begin{equation} \Omega^{k + 1} = \frac{1}{2\rho}V\left[ -Q + (Q^{2} + 4\rho I_{p})^{1/2}\right] V' \tag{A.3} \end{equation}\]
A.0.2 Proof of (2.16)
Solve the optimization problem
\[\begin{equation} Z^{k + 1} = \arg\min_{Z \in \mathbb{S}^{p}}\left\{ \lambda\left[ \frac{1 - \alpha}{2}\left\| Z \right\|_{F}^{2} + \alpha\left\| Z \right\|_{1} \right] + tr\left[\Lambda^{k}\left(\Omega^{k + 1} - Z\right)\right] + \frac{\rho}{2}\left\| \Omega^{k + 1} - Z \right\|_{F}^{2} \right\} \tag{A.4}\notag \end{equation}\]
Start by taking the subgradient with respect to \(Z\):
\[\begin{equation} \begin{split} \partial&\left\{ \lambda\left[ \frac{1 - \alpha}{2}\left\| Z \right\|_{F}^{2} + \alpha\left\| Z \right\|_{1} \right] + tr\left[\Lambda^{k}\left(\Omega^{k + 1} - Z\right)\right] + \frac{\rho}{2}\left\| \Omega^{k + 1} - Z \right\|_{F}^{2} \right\} \\ &= \partial\left\{ \lambda\left[ \frac{1 - \alpha}{2}\left\| Z \right\|_{F}^{2} + \alpha\left\| Z \right\|_{1} \right] \right\} + \nabla_{\Omega}\left\{ tr\left[\Lambda^{k}\left(\Omega^{k + 1} - Z\right)\right] + \frac{\rho}{2}\left\| \Omega^{k + 1} - Z \right\|_{F}^{2} \right\} \\ &= \lambda(1 - \alpha)Z + \mbox{sign}(Z)\lambda\alpha - \Lambda^{k} - \rho\left( \Omega^{k + 1} - Z \right) \end{split} \tag{A.5}\notag \end{equation}\]
where sign is the elementwise sign operator. By setting the gradient/sub-differential equal to zero, we have that
\[\begin{equation} Z_{ij}^{k + 1} = \frac{1}{\lambda(1 - \alpha) + \rho}\left( \rho \Omega_{ij}^{k + 1} + \Lambda_{ij}^{k} - \mbox{sign}(Z_{ij}^{k + 1})\lambda\alpha \right) \tag{A.6}\notag \end{equation}\]
for all \(i = 1,..., p\) and \(j = 1,..., p\). We observe two scenarios:
- If \(Z_{ij}^{k + 1} > 0\) then
\[\begin{equation} \rho\Omega_{ij}^{k + 1} + \Lambda_{ij}^{k} > \lambda\alpha \notag \end{equation}\]
- If \(Z_{ij}^{k + 1} < 0\) then
\[\begin{equation} \rho\Omega_{ij}^{k + 1} + \Lambda_{ij}^{k} < -\lambda\alpha \notag \end{equation}\]
This implies that \(\mbox{sign}(Z_{ij}) = \mbox{sign}(\rho\Omega_{ij}^{k + 1} + \Lambda_{ij}^{k})\). Putting all the pieces together, we arrive at
\[\begin{equation} \begin{split} Z_{ij}^{k + 1} &= \frac{1}{\lambda(1 - \alpha) + \rho}\mbox{sign}\left(\rho\Omega_{ij}^{k + 1} + \Lambda_{ij}^{k}\right)\left( \left| \rho\Omega_{ij}^{k + 1} + \Lambda_{ij}^{k} \right| - \lambda\alpha \right)_{+} \\ &= \frac{1}{\lambda(1 - \alpha) + \rho}\mbox{soft}\left(\rho\Omega_{ij}^{k + 1} + \Lambda_{ij}^{k}, \lambda\alpha\right) \end{split} \tag{A.7}\notag \end{equation}\]
where soft is the soft-thresholding function.
A.0.3 Proof of the dual residual
Here we want to show that \(0 \in \rho\left(Z^{k + 1} - Z^{k}\right)\) which is a direct result of the fact that \(\Omega^{k + 1}\) is the minimizer of the augmented lagrangian and \(0 \in f\left(\Omega^{k + 1}\right) + \Lambda^{k + 1}\).
\[\begin{equation} \begin{split} 0 &\in \partial \left\{ f\left(\Omega^{k + 1}\right) + tr\left[ \Lambda^{k}\left( \Omega^{k + 1} - Z^{k} \right) \right] + \frac{\rho}{2}\left\| \Omega^{k + 1} - Z^{k} \right\|_{F}^{2} \right\} \\ &= \partial f\left(\Omega^{k + 1}\right) + \Lambda^{k} + \rho\left(\Omega^{k + 1} - Z^{k}\right) \\ &= \partial f\left(\Omega^{k + 1}\right) + \Lambda^{k} + \rho\left(\Omega^{k + 1} + Z^{k + 1} - Z^{k + 1} - Z^{k}\right) \\ &= \partial f\left(\Omega^{k + 1}\right) + \Lambda^{k} + \rho\left(\Omega^{k + 1} - Z^{k + 1}\right) + \rho\left(Z^{k + 1} - Z^{k}\right) \\ &= \partial f\left(\Omega^{k + 1}\right) + \Lambda^{k + 1} + \rho\left(Z^{k + 1} - Z^{k}\right) \\ \Rightarrow 0 &\in \rho\left( Z^{k + 1} - Z^{k} \right) \end{split} \tag{A.8}\notag \end{equation}\]
A.0.4 Proof of second dual optimality condition
Here we use the primal optimality condition \(\Omega^{k + 1} - Z^{k + 1} = 0\) to show that the second dual optimality condition \(0 \in \partial g\left(Z^{k + 1}\right) - \Lambda^{k + 1}\) is always satisfied.
\[\begin{equation} \begin{split} 0 &\in \partial \left\{ g\left(Z^{k + 1}\right) + tr\left[ \Lambda^{k}\left( \Omega^{k + 1} - Z^{k + 1} \right) \right] + \rho\left\| \Omega^{k + 1} - Z^{k + 1} \right\|_{F}^{2} \right\} \\ &= \partial g\left(Z^{k + 1}\right) - \Lambda^{k} - \rho\left(\Omega^{k + 1} - Z^{k + 1}\right) \\ &= \partial g\left(Z^{k + 1}\right) - \Lambda^{k + 1} \\ \end{split} \tag{A.9}\notag \end{equation}\]
A.0.5 Explanation of majorizing function (3.6)
To see why this particular function was used, consider the Taylor’s expansion of \(\rho\left\|A\Omega B - Z^{k} - C\right\|_{F}^{2}/2\):
\[\begin{equation} \begin{split} \frac{\rho}{2}\left\| A\Omega B - Z^{k} - C \right\|_{F}^{2} &\approx \frac{\rho}{2}\left\| A\Omega^{k} B - Z^{k} - C \right\|_{F}^{2} \\ &+ \frac{\rho}{2}vec\left( \Omega - \Omega^{k}\right)'\left(A'A \otimes BB'\right)vec\left(\Omega - \Omega^{k}\right) \\ &+ \rho vec\left(\Omega - \Omega^{k}\right)'vec\left(BB'\Omega^{k}A'A - B(Z^{k})'A - BC'A \right) \end{split} \tag{A.10}\notag \end{equation}\]
Note that the gradient and hessian, respectively, are
\[\begin{equation} \nabla_{\Omega}\left\{ \frac{\rho}{2}\left\|A\Omega B - Z - C\right\|_{F}^{2} \right\} = \rho BB'\Omega A'A - \rho BZ'A - \rho BC'A \tag{A.11}\notag \end{equation}\]
\[\begin{equation} \nabla_{\Omega}^{2}\left\{ \frac{\rho}{2}\left\|A\Omega B - Z - C \right\|_{F}^{2} \right\} = \rho\left(A'A \otimes BB' \right) \tag{A.12}\notag \end{equation}\]
This implies that
\[\begin{equation} \begin{split} \frac{\rho}{2}\left\| A\Omega B - Z^{k} - C \right\|_{F}^{2} &+ \frac{\rho}{2}vec\left(\Omega - \Omega^{k} \right)'Q\left(\Omega - \Omega^{k} \right) \\ &\approx \frac{\rho}{2}\left\| A\Omega^{k} B - Z^{k} - C \right\|_{F}^{2} + \frac{\rho}{2}vec\left(\Omega - \Omega^{k} \right)'Q\left(\Omega - \Omega^{k} \right) \\ &+ \frac{\rho}{2}vec\left( \Omega - \Omega^{k}\right)'\left(A'A \otimes BB'\right)vec\left(\Omega - \Omega^{k}\right) \\ &+ \rho vec\left(\Omega - \Omega^{k}\right)' vec\left(BB'\Omega^{k}A'A - B(Z^{k})'A - BC'A \right) \\ &= \frac{\rho}{2}\left\| A\Omega^{k} B - Z^{k} - C \right\|_{F}^{2} + \frac{\rho\tau}{2}\left\|\Omega - \Omega^{k}\right\|_{F}^{2} \\ &+ \rho tr\left[\left(\Omega - \Omega^{k}\right)\left(BB'\Omega^{k}A'A - B(Z^{k})'A - BC'A \right)\right] \end{split} \tag{A.13}\notag \end{equation}\]
Let us now plug in this equality into our optimization problem that includes the augmented lagrangian:
\[\begin{equation} \begin{split} \hat{\Omega}^{k + 1} &= \arg\min_{\Omega \in \mathbb{S}_{+}^{p}}\tilde{L}_{\rho}(\Omega, Z^{k}, \Lambda^{k}) \\ &= \arg\min_{\Omega \in \mathbb{S}_{+}^{p}}\left\{\begin{matrix} tr\left(S\Omega\right) - \log\left|\Omega\right| + tr\left[(\Lambda^{k})'(A\Omega B - Z^{k} - C) \right] + \rho\left\|A\Omega B - Z^{k} - C \right\|_{F}^{2}/2 \end{matrix}\right. \\ &+ \left.\begin{matrix} vec\left(\Omega - \Omega^{k}\right)'\rho Q\left(\Omega - \Omega^{k}\right)/2 \end{matrix}\right\} \\ &= \arg\min_{\Omega \in \mathbb{S}_{+}^{p}}\left\{\begin{matrix} tr\left(S\Omega\right) - \log\left|\Omega\right| + tr\left[(\Lambda^{k})'(A\Omega B - Z^{k} - C) \right] + \rho\left\|A\Omega^{k} B - Z^{k} - C \right\|_{F}^{2}/2 \end{matrix}\right. \\ &+ \left.\begin{matrix} \rho\tau\left\|\Omega - \Omega^{k}\right\|_{F}^{2}/2 + tr\left[\rho\left(\Omega - \Omega^{k}\right)\left(BB'\Omega^{k}A'A - B(Z^{k})'A - BC'A \right)\right] \end{matrix}\right\} \\ &= \arg\min_{\Omega \in \mathbb{S}_{+}^{p}}\left\{\begin{matrix} tr\left[\left(S + \rho A'(A\Omega^{k}B - Z^{k} - C + \Lambda^{k}/\rho)B' \right)\Omega\right] \end{matrix}\right. \\ &- \left.\begin{matrix} \log\left|\Omega\right| + \rho\tau\left\|\Omega - \Omega^{k}\right\|_{F}^{2}/2 \end{matrix}\right\} \\ &= \arg\min_{\Omega \in \mathbb{S}_{+}^{p}}\left\{ tr\left[\left(S + G^{k} \right)\Omega\right] - \log\left|\Omega\right| + \rho\tau\left\|\Omega - \Omega^{k}\right\|_{F}^{2}/2 \right\} \\ \end{split} \tag{A.14}\notag \end{equation}\]
where \(G^{k} = \rho A'(A\Omega^{k}B - Z^{k} - C + \Lambda^{k}/\rho)B'\).
A.0.6 Proof of (3.10)
We show that the following optimization problem can be solved in closed-form similar to (A.3).
\[\begin{equation} \Omega^{k + 1} = \arg\min_{\Omega \in \mathbb{S}_{+}^{p}}\left\{tr\left[\left(S + G^{k}\right)\Omega\right] - \log\left|\Omega\right| + \frac{\rho\tau}{2}\left\|\Omega - \Omega^{k}\right\|_{F}^{2} \right\} \tag{A.15}\notag \end{equation}\]
First start by taking the gradient with respect to \(\Omega\):
\[\begin{equation} \begin{split} &\nabla_{\Omega}\left\{tr\left[\left(S + G^{k}\right)\Omega\right] - \log\left|\Omega\right| + \frac{\rho\tau}{2}\left\|\Omega - \Omega^{k}\right\|_{F}^{2} \right\} \\ &= 2S - S\circ I_{p} + G^{k} + (G^{k})' - G^{k}\circ I_{p} - 2\Omega^{-1} + \Omega^{-1}\circ I_{p} \\ &+ \frac{\rho\tau}{2}\left[2\Omega - 2(\Omega^{k})' + 2\Omega' - 2\Omega^{k} - 2(\Omega - \Omega^{k})'\circ I_{p} \right] \end{split} \tag{A.16}\notag \end{equation}\]
Note that we need to honor the symmetric constraint given by \(\Omega\). By setting the gradient equal to zero and multiplying all off-diagonal elements by \(1/2\), this simplifies to
\[\begin{equation} S + \frac{1}{2}\left(G^{k} + (G^{k})'\right) - \rho\tau\Omega^{k} = (\Omega^{k + 1})^{-1} - \rho\tau\Omega^{k + 1} \notag \end{equation}\]
We can then decompose using spectral decomposition \(\Omega^{k + 1} = VDV'\) where \(D\) is a diagonal matrix with diagonal elements equal to the eigen values of \(\Omega^{k + 1}\) and \(V\) is the matrix with corresponding eigen vectors as columns.
\[\begin{equation} S + \frac{1}{2}\left(G^{k} + (G^{k})'\right) - \rho\tau\Omega^{k} = VD^{-1}V' - \rho\tau VDV' = V\left(D^{-1} - \rho\tau D\right)V' \notag \end{equation}\]
This equivalence implies that
\[\begin{equation} \phi_{j}\left( D^{k} \right) = \frac{1}{\phi_{j}(\Omega^{k + 1})} - \rho\tau\phi_{j}(\Omega^{k + 1}) \notag \end{equation}\]
where \(\phi_{j}(\cdot)\) is the \(j\)th eigen value and \(D^{k} = S + \left(G^{k} + (G^{k})'\right)/2 - \rho\tau\Omega^{k}\). Therefore
\[\begin{equation} \begin{split} &\Rightarrow \rho\tau\phi_{j}^{2}(\Omega^{k + 1}) + \phi_{j}\left( D^{k} \right)\phi_{j}(\Omega^{k + 1}) - 1 = 0 \\ &\Rightarrow \phi_{j}(\Omega^{k + 1}) = \frac{-\phi_{j}(D^{k}) \pm \sqrt{\phi_{j}^{2}(D^{k}) + 4\rho\tau}}{2\rho\tau} \end{split} \notag \end{equation}\]
In summary, if we decompose \(S + \left(G^{k} + (G^{k})'\right)/2 - \rho\tau\Omega^{k} = VQV'\) then
\[\begin{equation} \Omega^{k + 1} = \frac{1}{2\rho\tau}V\left[ -Q + (Q^{2} + 4\rho\tau I_{p})^{1/2}\right] V' \tag{A.17}\notag \end{equation}\]
A.0.7 Proof of (3.11)
Solve the optimization problem
\[\begin{equation} Z^{k + 1} = \arg\min_{Z \in \mathbb{R}^{n \times r}}\left\{ \lambda\left\| Z \right\|_{1} + tr\left[(\Lambda^{k})'\left(A\Omega^{k + 1}B - Z - C\right)\right] + \frac{\rho}{2}\left\| A\Omega^{k + 1}B - Z - C \right\|_{F}^{2} \right\} \tag{A.18}\notag \end{equation}\]
Start by taking the subgradient with respect to \(Z\):
\[\begin{equation} \begin{split} \partial&\left\{ \lambda\left\| Z \right\|_{1} + tr\left[(\Lambda^{k})'\left(A\Omega^{k + 1}B - Z - C\right)\right] + \frac{\rho}{2}\left\| A\Omega^{k + 1}B - Z - C \right\|_{F}^{2} \right\} \\ &= \partial\left\{ \lambda\left\| Z \right\|_{1} \right\} + \nabla_{\Omega}\left\{ tr\left[(\Lambda^{k})'\left(A\Omega^{k + 1}B - Z - C\right)\right] + \frac{\rho}{2}\left\| A\Omega^{k + 1}B - Z - C \right\|_{F}^{2} \right\} \\ &= \mbox{sign}(Z)\lambda - \Lambda^{k} - \rho\left( A\Omega^{k + 1}B - Z - C \right) \end{split} \tag{A.19}\notag \end{equation}\]
where \(\mbox{sign(Z)}\) is the elementwise sign operator. By setting the gradient/sub-differential equal to zero, we arrive at the following equivalence:
\[\begin{equation} Z_{ij}^{k + 1} = \frac{1}{\rho}\left( \rho\left(A\Omega_{ij}^{k + 1}B - C\right) + \Lambda_{ij}^{k} - Sign\left(Z_{ij}^{k + 1}\right)\lambda \right) \tag{A.20}\notag \end{equation}\]
for all \(i = 1,..., p\) and \(j = 1,..., p\). We observe two scenarios:
- If \(Z_{ij}^{k + 1} > 0\) then
\[\begin{equation} \rho\left(A\Omega_{ij}^{k + 1}B - C\right) + \Lambda_{ij}^{k} > \lambda\alpha \notag \end{equation}\]
- If \(Z_{ij}^{k + 1} < 0\) then
\[\begin{equation} \rho\left(A\Omega_{ij}^{k + 1}B - C\right) + \Lambda_{ij}^{k} < -\lambda\alpha \notag \end{equation}\]
This implies that \(\mbox{sign}\left(Z_{ij}^{k + 1}\right) = \mbox{sign}\left(\rho\left(A\Omega_{ij}^{k + 1}B - C\right) + \Lambda_{ij}^{k}\right)\). Putting all the pieces together, we arrive at
\[\begin{equation} \begin{split} Z_{ij}^{k + 1} &= \frac{1}{\rho}\mbox{sign}\left(\rho\left(A\Omega_{ij}^{k + 1}B - C\right) + \Lambda_{ij}^{k}\right)\left( \left| \rho\left(A\Omega_{ij}^{k + 1}B - C\right) + \Lambda_{ij}^{k} \right| - \lambda \right)_{+} \\ &= \frac{1}{\rho}\mbox{soft}\left(\rho\left(A\Omega_{ij}^{k + 1}B - C\right) + \Lambda_{ij}^{k}, \lambda\right) \end{split} \tag{A.21}\notag \end{equation}\]
where soft is the soft-thresholding function.
A.0.8 Proof of (3.13)
Here we want to show that \(0 \in \rho\left( B(Z^{k + 1} - Z^{k})'A + A'(Z^{k + 1} - Z^{k})B' \right)/2\) which is a direct result of the fact that \(0 \in \partial f\left(\Omega^{k + 1} \right) + \left(B(\Lambda^{k + 1})'A + A'\Lambda^{k + 1}B' \right)/2\).
\[\begin{equation} \begin{split} 0 &\in \partial \left\{ f\left(\Omega^{k + 1}\right) + tr\left[ \Lambda^{k}\left( A\Omega^{k + 1}B - Z^{k} - C \right) \right] + \rho\left\| A\Omega^{k + 1}B - Z^{k} - C \right\|_{F}^{2}/2 \right\} \\ &= \partial f\left(\Omega^{k + 1} \right) + \left(B(\Lambda^{k})'A + A'\Lambda^{k}B' \right)/2 + \rho\left( BB'\Omega^{k + 1}A'A + A'A\Omega^{k + 1}BB' \right)/2 \\ &- \rho\left( A'(Z^{k} + C)B' + B(Z^{k} + C)'A \right)/2 \\ &= \partial f\left(\Omega^{k + 1} \right) + \left(B(\Lambda^{k})'A + A'\Lambda^{k}B' \right)/2 \\ &+ \rho\left( B(B'\Omega^{k + 1}A' - (Z^{k})' - C')A + A'(A\Omega^{k + 1}B - Z^{k} - C)B' \right)/2 \\ &= \partial f\left(\Omega^{k + 1} \right) + \left( B(\Lambda^{k})'A + A'\Lambda^{k}B' \right)/2 + \rho\left(A'(A\Omega^{k + 1}B - Z^{k + 1} + Z^{k + 1} - Z^{k} - C)B' \right)/2 \\ &+ \rho\left(B(B'\Omega^{k + 1}A' - (Z^{k + 1})' + (Z^{k + 1})' - (Z^{k})' - C')A \right)/2 \\ &= \partial f\left(\Omega^{k + 1} \right) + \left[ B\left((\Lambda^{k})' + \rho(B'\Omega^{k + 1}A' - (Z^{k + 1})' - C') \right)A \right]/2 \\ &+ \left[ A'\left(\Lambda^{k} + \rho(A\Omega^{k + 1}B - Z^{k + 1} - c)B \right)B' \right]/2 + \rho\left(B(Z^{k + 1} - Z^{k})'A + A'(Z^{k + 1} - Z^{k})B' \right)/2 \\ &= \partial f\left(\Omega^{k + 1} \right) + \left(B(\Lambda^{k + 1})'A + A'\Lambda^{k + 1}B' \right)/2 + \rho\left(B(Z^{k + 1} - Z^{k})'A + A'(Z^{k + 1} - Z^{k})B' \right)/2 \\ \Rightarrow 0 &\in \rho\left( B(Z^{k + 1} - Z^{k})'A + A'(Z^{k + 1} - Z^{k})B' \right)/2 \end{split} \tag{A.22}\notag \end{equation}\]
A.0.9 Proof of no closed-form solution
Here we want to show that the following optimization problem cannot be solved in closed-form.
\[\begin{equation} \hat{\Omega} = \arg\min_{\Omega \in \mathbb{S}_{+}^{p}}\left\{ tr\left(S\Omega\right) - \log\left|\Omega\right| + \frac{\lambda}{2}\left\|\mathbb{X}\Omega\Sigma_{xy} - \mathbb{Y}\right\|_{F}^{2} \right\} \tag{A.23}\notag \end{equation}\]
First start by taking the gradient with respect to \(\Omega\):
\[\begin{equation} \begin{split} \nabla_{\Omega}&\left\{ tr\left(S\Omega\right) - \log\left|\Omega\right| + \frac{\lambda}{2}\left\|\mathbb{X}\Omega\Sigma_{xy} - \mathbb{Y}\right\|_{F}^{2} \right\} \\ &= \nabla_{\Omega}\left\{ tr\left(S\Omega\right) - \log\left|\Omega\right| + \frac{\lambda}{2}tr\left[\left(\mathbb{X}\Omega\Sigma_{xy} - \mathbb{Y}\right)'\left(\mathbb{X}\Omega\Sigma_{xy} - \mathbb{Y}\right) \right] \right\} \\ &= \nabla_{\Omega}\left\{ tr\left(S\Omega\right) - \log\left|\Omega\right| + \frac{\lambda}{2}tr\left(\Omega\mathbb{X}'\mathbb{X}\Omega\Sigma_{xy}\Sigma_{xy}' - 2\Omega\Sigma_{xy}\mathbb{Y}'\mathbb{X} \right) \right\} \\ &= 2S - S\circ I_{p} - 2\Omega^{-1} + \Omega^{-1}\circ I_{p} + \lambda\mathbb{X}'\mathbb{X}\Omega\Sigma_{xy}\Sigma_{xy}' + \lambda\Sigma_{xy}\Sigma_{xy}'\Omega\mathbb{X}'\mathbb{X} \\ &- \lambda\mathbb{X}'\mathbb{X}\Omega\Sigma_{xy}\Sigma_{xy}'\circ I_{p} - \Sigma_{xy}\mathbb{Y}'\mathbb{X} - \mathbb{X}'\mathbb{Y}\Sigma_{xy} + \Sigma_{xy}\mathbb{Y}'\mathbb{X}\circ I_{P} \\ \Rightarrow 0 &= S - \hat{\Omega}^{-1} + \frac{\lambda}{2}\left(\mathbb{X}'\mathbb{X}\hat{\Omega}\Sigma_{xy}\Sigma_{xy}' + \Sigma_{xy}\Sigma_{xy}'\hat{\Omega}\mathbb{X}'\mathbb{X} \right) - \lambda\Sigma_{xy}\mathbb{Y}'\mathbb{X} \end{split} \tag{A.24}\notag \end{equation}\]
Note that we cannot isolate \(\hat{\Omega}\).
References
Price, Bradley S, Charles J Geyer, and Adam J Rothman. 2015. “Ridge Fusion in Statistical Learning.” Journal of Computational and Graphical Statistics 24 (2). Taylor & Francis: 439–54.
Witten, Daniela M, and Robert Tibshirani. 2009. “Covariance-Regularized Regression and Classification for High Dimensional Problems.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 71 (3). Wiley Online Library: 615–36.