Shrinking characteristics of precision matrix estimators. Penalized precision matrix estimation using the ADMM algorithm. Consider the case where \(X_{1}, ..., X_{n}\) are iid \(N_{p}(\mu, \Sigma)\) and we are tasked with estimating the precision matrix, denoted \(\Omega \equiv \Sigma^{-1}\). This function solves the following optimization problem:
\(\hat{\Omega}_{\lambda} = \arg\min_{\Omega \in S_{+}^{p}} \left\{ Tr\left(S\Omega\right) - \log \det\left(\Omega \right) + \lambda\left\| A \Omega B - C \right\|_{1} \right\}\)
where \(\lambda > 0\) and, for a matrix \(M\), we define \(\left\| M \right\|_{1} = \sum_{i, j} \left| M_{ij} \right|\). Here \(S\) is the sample covariance matrix, and \(A\), \(B\), and \(C\) are user-specified matrices that default to \(A = I\), \(B = I\), and \(C = 0\), in which case the penalty reduces to the usual lasso penalty on \(\Omega\).
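To make the objective concrete, the following base-R sketch evaluates it for a candidate Omega (the helper name objective is illustrative and not part of the package):

# evaluate tr(S Omega) - log det(Omega) + lambda * ||A Omega B - C||_1
objective = function(Omega, S, lambda,
                     A = diag(nrow(S)), B = diag(nrow(S)),
                     C = matrix(0, nrow(A), ncol(B))) {
  logdet = as.numeric(determinant(Omega, logarithm = TRUE)$modulus)
  sum(diag(S %*% Omega)) - logdet + lambda * sum(abs(A %*% Omega %*% B - C))
}

# e.g. with identity inputs the penalty contributes lambda * p:
objective(diag(5), S = diag(5), lambda = 0.1)  # 5 - 0 + 0.1 * 5 = 5.5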
shrink(X = NULL, Y = NULL, S = NULL, A = diag(ncol(S)), B = diag(ncol(S)),
  C = matrix(0, ncol = ncol(B), nrow = ncol(A)), nlam = 10, lam.max = NULL,
  lam.min.ratio = 0.001, lam = NULL, alpha = 1, path = FALSE, rho = 2,
  mu = 10, tau.rho = 2, iter.rho = 10, crit = c("ADMM", "loglik"),
  tol.abs = 1e-04, tol.rel = 1e-04, maxit = 10000, adjmaxit = NULL, K = 5,
  crit.cv = c("MSE", "loglik", "penloglik", "AIC", "BIC"),
  start = c("warm", "cold"), cores = 1,
  trace = c("progress", "print", "none"))
Argument | Description
---|---
X | option to provide a nxp data matrix. Each row corresponds to a single observation and each column contains n observations of a single feature/variable.
Y | option to provide an nxr response matrix. Each row corresponds to a single observation and each column contains n observations of a single response variable.
S | option to provide a pxp sample covariance matrix (denominator n). If argument is NULL and X is provided instead, S will be computed automatically.
A | option to provide a user-specified matrix for the penalty term. This matrix must have p columns. Defaults to the identity matrix.
B | option to provide a user-specified matrix for the penalty term. This matrix must have p rows. Defaults to the identity matrix.
C | option to provide a user-specified matrix for the penalty term. This matrix must have nrow(A) rows and ncol(B) columns. Defaults to the zero matrix. Common choices of A, B, and C are shown in the sketch after this table.
nlam | number of lam tuning parameters to generate for the penalty term. Defaults to 10.
lam.max | option to specify the maximum lam tuning parameter. Defaults to NULL.
lam.min.ratio | smallest lam tuning parameter, given as a fraction of lam.max. Defaults to 0.001.
lam | option to provide positive tuning parameters for the penalty term. This will cause nlam and lam.min.ratio to be disregarded. Defaults to NULL.
alpha | elastic net mixing parameter contained in [0, 1]; alpha = 1 corresponds to the lasso penalty shown above and alpha = 0 to a ridge penalty. Defaults to 1.
path | option to return the regularization path. This option should be used with extreme care if the dimension is large. If set to TRUE, cores must be set to 1 and errors and optimal tuning parameters will be based on the full sample. Defaults to FALSE.
rho | initial step size for the ADMM algorithm. Defaults to 2.
mu | factor for the primal and dual residual norms in the ADMM algorithm. This will be used to adjust the step size rho after each iteration. Defaults to 10.
tau.rho | factor by which to increase/decrease the step size rho. Defaults to 2.
iter.rho | step size rho will only be updated every iter.rho iterations. Defaults to 10.
crit | criterion for convergence (ADMM or loglik). If crit = "loglik", iterations will stop when the relative change in the log-likelihood is less than tol.abs. Default is ADMM, which follows the stopping rule of Boyd et al. (2011).
tol.abs | absolute convergence tolerance. Defaults to 1e-4.
tol.rel | relative convergence tolerance. Defaults to 1e-4.
maxit | maximum number of iterations. Defaults to 1e4.
adjmaxit | adjusted maximum number of iterations. During cross validation this option allows the user to adjust the maximum number of iterations after the first lam tuning parameter has fully converged; pairing this with warm starts yields fast "one-step" estimators for the remaining grid. Defaults to NULL.
K | specify the number of folds for cross validation. Defaults to 5.
crit.cv | cross validation criterion (MSE, loglik, penloglik, AIC, or BIC). Defaults to MSE.
start | specify a warm or cold start for cross validation. Default is warm.
cores | option to run CV in parallel by specifying the number of cores. Defaults to 1.
trace | option to display progress of CV. Choose one of progress, print, or none. Defaults to progress.
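The choice of A, B, and C determines what is shrunk. A minimal sketch of two common configurations, reusing the data_gen simulator from the examples below (object names are illustrative):

# simulate predictors X and responses Y
data = data_gen(n = 100, p = 5, r = 1)

# (1) defaults A = I, B = I, C = 0: penalty reduces to lambda*||Omega||_1,
#     i.e. a lasso-penalized precision matrix (graphical lasso)
fit.glasso = shrink(X = data$X, Y = data$Y)

# (2) B = cov(X, Y): penalty becomes lambda*||Omega %*% cov(X, Y)||_1,
#     shrinking the implied regression coefficients beta = Omega %*% Sigma_xy
fit.beta = shrink(X = data$X, Y = data$Y, B = cov(data$X, data$Y))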
Returns an object of class shrink which includes:
Component | Description
---|---
Call | function call.
Iterations | number of iterations.
Tuning | optimal tuning parameters.
Lambdas | grid of lambda values for CV.
maxit | maximum number of iterations.
Omega | estimated penalized precision matrix.
Sigma | estimated covariance matrix from the penalized precision matrix (inverse of Omega).
Path | array containing the solution path. Solutions will be ordered by ascending alpha values for each lambda.
Z | final sparse update of the estimated penalized precision matrix.
Y | final dual update.
rho | final step size.
Loglik | penalized log-likelihood for Omega.
MIN.error | minimum average cross validation error (cv.crit) for optimal parameters.
AVG.error | average cross validation error (cv.crit) across all folds.
CV.error | cross validation errors (cv.crit).
For details on the implementation of 'shrink', see the vignette https://mgallow.github.io/SCPME/.
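For intuition on how rho, mu, tau.rho, and iter.rho interact, the adaptive step-size rule of Boyd et al. (2011, Section 3.4.1) can be sketched as follows, where r and s denote the current primal and dual residual norms; this illustrates the published rule, not the package's internal code:

if (iter %% iter.rho == 0) {          # rho is only revisited every iter.rho iterations
  if (r > mu * s) {
    rho = rho * tau.rho               # primal residual dominates: increase step size
  } else if (s > mu * r) {
    rho = rho / tau.rho               # dual residual dominates: decrease step size
  }                                   # otherwise rho is left unchanged
}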
Boyd, Stephen, Neal Parikh, Eric Chu, Borja Peleato, Jonathan Eckstein, and others. 2011. 'Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers.' Foundations and Trends in Machine Learning 3 (1). Now Publishers, Inc.: 1-122. https://web.stanford.edu/~boyd/papers/pdf/admm_distr_stats.pdf
Hu, Yue, Eric C. Chi, and Genevera I. Allen. 2016. 'ADMM Algorithmic Regularization Paths for Sparse Statistical Machine Learning.' In Splitting Methods in Communication, Imaging, Science, and Engineering. Springer: 433-459.
Molstad, Aaron J., and Adam J. Rothman. 2017. 'Shrinking Characteristics of Precision Matrix Estimators.' Biometrika. https://doi.org/10.1093/biomet/asy023
Rothman, Adam. 2017. 'STAT 8931 notes on an algorithm to compute the Lasso-penalized Gaussian likelihood precision matrix estimator.'
# generate some data
data = data_gen(n = 100, p = 5, r = 1)

# lasso penalized omega (glasso)
shrink(X = data$X, Y = data$Y)
#>
#> Call: shrink(X = data$X, Y = data$Y)
#>
#> Iterations: 41
#>
#> Tuning parameters:
#>       log10(lam)    lam
#> [1,]      -1.073  0.085
#>
#> Log-likelihood: -179.30164
#>
#> Omega:
#>          [,1]     [,2]     [,3]     [,4]     [,5]
#> [1,]  1.31952 -0.84100 -0.07959 -0.02455 -0.05374
#> [2,] -0.84100  2.03956 -0.77868  0.00007 -0.00758
#> [3,] -0.07959 -0.77868  2.08500 -0.91900  0.00003
#> [4,] -0.02455  0.00007 -0.91900  1.81751 -0.72308
#> [5,] -0.05374 -0.00758  0.00003 -0.72308  1.42184

# lasso penalized beta (print estimated omega)
lam.max = max(abs(t(data$X) %*% data$Y))
(shrink = shrink(X = data$X, Y = data$Y, B = cov(data$X, data$Y), lam.max = lam.max))
#>
#> Call: shrink(X = data$X, Y = data$Y, B = cov(data$X, data$Y), lam.max = lam.max)
#>
#> Iterations: 42
#>
#> Tuning parameters:
#>       log10(lam)    lam
#> [1,]       0.144  1.394
#>
#> Log-likelihood: -112.43678
#>
#> Omega:
#>          [,1]     [,2]     [,3]     [,4]     [,5]
#> [1,]  1.93642 -1.54912  0.06942 -0.05542 -0.09868
#> [2,] -1.54912  3.24092 -2.01577  0.65981 -0.08105
#> [3,]  0.06942 -2.01577  3.47859 -1.85319  0.29787
#> [4,] -0.05542  0.65981 -1.85319  2.97796 -1.40587
#> [5,] -0.09868 -0.08105  0.29787 -1.40587  2.01199

# print estimated beta
shrink$Z
#>            [,1]
#> [1,]  0.0000000
#> [2,] -0.2285735
#> [3,]  0.0000000
#> [4,]  0.0000000
#> [5,]  0.0000000
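Because the second fit uses B = cov(data$X, data$Y), the sparse update Z printed above estimates the regression coefficients beta = Omega %*% Sigma_xy. As a check (a sketch assuming the fitted object exposes the Omega component shown in the Value table), the coefficients implied by the estimated precision matrix should agree with shrink$Z up to the ADMM convergence tolerance:

# beta implied by the estimated precision matrix; compare with shrink$Z
beta.hat = shrink$Omega %*% cov(data$X, data$Y)
beta.hat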