Shrinking characteristics of precision matrix estimators. Penalized precision matrix estimation using the ADMM algorithm. Consider the case where \(X_{1}, ..., X_{n}\) are iid \(N_{p}(\mu, \Sigma)\) and we are tasked with estimating the precision matrix, denoted \(\Omega \equiv \Sigma^{-1}\). This function solves the following optimization problem:

Objective:

\(\hat{\Omega}_{\lambda} = \arg\min_{\Omega \in S_{+}^{p}} \left\{ Tr\left(S\Omega\right) - \log \det\left(\Omega \right) + \lambda\left\| A \Omega B - C \right\|_{1} \right\}\)

where \(\lambda > 0\) and we define \(\left\|A \right\|_{1} = \sum_{i, j} \left| A_{ij} \right|\).

shrink(X = NULL, Y = NULL, S = NULL, A = diag(ncol(S)),
  B = diag(ncol(S)), C = matrix(0, ncol = ncol(B), nrow = ncol(A)),
  nlam = 10, lam.max = NULL, lam.min.ratio = 0.001, lam = NULL,
  alpha = 1, path = FALSE, rho = 2, mu = 10, tau.rho = 2,
  iter.rho = 10, crit = c("ADMM", "loglik"), tol.abs = 1e-04,
  tol.rel = 1e-04, maxit = 10000, adjmaxit = NULL, K = 5,
  crit.cv = c("MSE", "loglik", "penloglik", "AIC", "BIC"), start = c("warm",
  "cold"), cores = 1, trace = c("progress", "print", "none"))

Arguments

X

option to provide a nxp data matrix. Each row corresponds to a single observation and each column contains n observations of a single feature/variable.

Y

option to provide nxr response matrix. Each row corresponds to a single response and each column contains n response of a single feature/response.

S

option to provide a pxp sample covariance matrix (denominator n). If argument is NULL and X is provided instead then S will be computed automatically.

A

option to provide user-specified matrix for penalty term. This matrix must have p columns. Defaults to identity matrix.

B

option to provide user-specified matrix for penalty term. This matrix must have p rows. Defaults to identity matrix.

C

option to provide user-specified matrix for penalty term. This matrix must have nrow(A) rows and ncol(B) columns. Defaults to zero matrix.

nlam

number of lam tuning parameters for penalty term generated from lam.min.ratio and lam.max (automatically generated). Defaults to 10.

lam.max

option to specify the maximum lam tuning parameter. Defaults to NULL.

lam.min.ratio

smallest lam value provided as a fraction of lam.max. The function will automatically generate nlam tuning parameters from lam.min.ratio*lam.max to lam.max in log10 scale. If lam.max = NULL, lam.max is calculated to be the smallest lam such that all off-diagonal entries in Omega are equal to zero. Defaults to 1e-3.

lam

option to provide positive tuning parameters for penalty term. This will cause nlam and lam.min.ratio to be disregarded. If a vector of parameters is provided, they should be in increasing order. Defaults to NULL.

alpha

elastic net mixing parameter contained in [0, 1]. 0 = ridge, 1 = lasso. Alpha must be a single value (cross validation across alpha not supported).

path

option to return the regularization path. This option should be used with extreme care if the dimension is large. If set to TRUE, cores must be set to 1 and errors and optimal tuning parameters will based on the full sample. Defaults to FALSE.

rho

initial step size for ADMM algorithm.

mu

factor for primal and residual norms in the ADMM algorithm. This will be used to adjust the step size rho after each iteration.

tau.rho

factor in which to increase/decrease step size rho

iter.rho

step size rho will be updated every iter.rho steps

crit

criterion for convergence (ADMM or loglik). If crit = loglik then iterations will stop when the relative change in log-likelihood is less than tol.abs. Default is ADMM and follows the procedure outlined in Boyd, et al.

tol.abs

absolute convergence tolerance. Defaults to 1e-4.

tol.rel

relative convergence tolerance. Defaults to 1e-4.

maxit

maximum number of iterations. Defaults to 1e4.

adjmaxit

adjusted maximum number of iterations. During cross validation this option allows the user to adjust the maximum number of iterations after the first lam tuning parameter has converged (for each alpha). This option is intended to be paired with warm starts and allows for 'one-step' estimators. Defaults to NULL.

K

specify the number of folds for cross validation.

crit.cv

cross validation criterion (MSE, loglik, penloglik, AIC, or BIC). Defaults to MSE.

start

specify warm or cold start for cross validation. Default is warm.

cores

option to run CV in parallel. Defaults to cores = 1.

trace

option to display progress of CV. Choose one of progress to print a progress bar, print to print completed tuning parameters, or none.

Value

returns class object ADMMsigma which includes:

Call

function call.

Iterations

number of iterations.

Tuning

optimal tuning parameter.

Lambdas

grid of lambda values for CV.

maxit

maximum number of iterations.

Omega

estimated penalized precision matrix.

Sigma

estimated covariance matrix from the penalized precision matrix (inverse of Omega).

Path

array containing the solution path. Solutions will be ordered in ascending alpha values for each lambda.

Z

final sparse update of estimated penalized precision matrix.

Y

final dual update.

rho

final step size.

Loglik

penalized log-likelihood for Omega

MIN.error

minimum average cross validation error (cv.crit) for optimal parameters.

AVG.error

average cross validation error (cv.crit) across all folds.

CV.error

cross validation errors (cv.crit).

Details

For details on the implementation of 'shrink', see the vignette https://mgallow.github.io/SCPME/.

References

  • Boyd, Stephen, Neal Parikh, Eric Chu, Borja Peleato, Jonathan Eckstein, and others. 2011. 'Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers.' Foundations and Trends in Machine Learning 3 (1). Now Publishers, Inc.: 1-122. https://web.stanford.edu/~boyd/papers/pdf/admm_distr_stats.pdf

  • Hu, Yue, Chi, Eric C, amd Allen, Genevera I. 2016. 'ADMM Algorithmic Regularization Paths for Sparse Statistical Machine Learning.' Splitting Methods in Communication, Imaging, Science, and Engineering. Springer: 433-459.

  • Molstad, Aaron J., and Adam J. Rothman. (2017). 'Shrinking Characteristics of Precision Matrix Estimators. Biometrika.. https://doi.org/10.1093/biomet/asy023

  • Rothman, Adam. 2017. 'STAT 8931 notes on an algorithm to compute the Lasso-penalized Gaussian likelihood precision matrix estimator.'

See also

Examples

# generate some data data = data_gen(n = 100, p = 5, r = 1) # lasso penalized omega (glasso) shrink(X = data$X, Y = data$Y)
#> #> Call: shrink(X = data$X, Y = data$Y) #> #> Iterations: 41 #> #> Tuning parameters: #> log10(lam) lam #> [1,] -1.073 0.085 #> #> Log-likelihood: -179.30164 #> #> Omega: #> [,1] [,2] [,3] [,4] [,5] #> [1,] 1.31952 -0.84100 -0.07959 -0.02455 -0.05374 #> [2,] -0.84100 2.03956 -0.77868 0.00007 -0.00758 #> [3,] -0.07959 -0.77868 2.08500 -0.91900 0.00003 #> [4,] -0.02455 0.00007 -0.91900 1.81751 -0.72308 #> [5,] -0.05374 -0.00758 0.00003 -0.72308 1.42184
# lasso penalized beta (print estimated omega) lam.max = max(abs(t(data$X) %*% data$Y)) (shrink = shrink(X = data$X, Y = data$Y, B = cov(data$X, data$Y), lam.max = lam.max))
#> #> Call: shrink(X = data$X, Y = data$Y, B = cov(data$X, data$Y), lam.max = lam.max) #> #> Iterations: 42 #> #> Tuning parameters: #> log10(lam) lam #> [1,] 0.144 1.394 #> #> Log-likelihood: -112.43678 #> #> Omega: #> [,1] [,2] [,3] [,4] [,5] #> [1,] 1.93642 -1.54912 0.06942 -0.05542 -0.09868 #> [2,] -1.54912 3.24092 -2.01577 0.65981 -0.08105 #> [3,] 0.06942 -2.01577 3.47859 -1.85319 0.29787 #> [4,] -0.05542 0.65981 -1.85319 2.97796 -1.40587 #> [5,] -0.09868 -0.08105 0.29787 -1.40587 2.01199
# print estimated beta shrink$Z
#> [,1] #> [1,] 0.0000000 #> [2,] -0.2285735 #> [3,] 0.0000000 #> [4,] 0.0000000 #> [5,] 0.0000000