Interpret a model via regularized coefficient estimates

interpret(x, sparsity = NULL, remove_zeros = TRUE, top_n)

Arguments

x

a model_list object containing a glmnet model

sparsity

If NULL (default), coefficients for the best-performing model are returned. Otherwise, a value in [0, 1] that determines the sparseness of the model for which coefficients are returned, with 0 being maximally sparse (i.e. having the fewest non-zero coefficients) and 1 being minimally sparse (i.e. having the most non-zero coefficients).

remove_zeros

Remove features with coefficients equal to 0? Default is TRUE

top_n

Integer: How many coefficients to return? The largest top_n absolute-value coefficients will be returned. If missing (default), all coefficients are returned
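
The remove_zeros and top_n behavior described above can be sketched in base R. The data frame `coefs` and its values are hypothetical, and this is not the package's internal code; it is only one way to drop zero coefficients, order the rest by absolute value, and keep the largest few.

```r
# Hypothetical coefficient table; names and values are illustrative only.
coefs <- data.frame(
  variable = c("a", "b", "c", "d"),
  coefficient = c(0.5, -1.2, 0, 0.03),
  stringsAsFactors = FALSE
)

top_n <- 2

# Drop coefficients that are exactly zero (cf. remove_zeros = TRUE).
nonzero <- coefs[coefs$coefficient != 0, ]

# Order by absolute value, largest first, and keep the first top_n rows.
ordered <- nonzero[order(abs(nonzero$coefficient), decreasing = TRUE), ]
top <- head(ordered, top_n)
print(top)
```

Note that the sign of a coefficient is kept in the result; only the ranking uses the absolute value.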

Value

A data frame of variables and their regularized regression coefficient estimates with parent class "interpret"

Details

**WARNING** Coefficients are on the scale of the predictors; they are not standardized. Unless features were scaled before training (e.g. with prep_data(..., scale = TRUE)), the magnitude of a coefficient does not necessarily reflect the importance of its feature.
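
A small simulation may make the scale warning concrete. This sketch uses ordinary least squares via lm() as a dependency-free stand-in for a glmnet model, and the variables (`x_m`, `x_mm`, `y`) are simulated for illustration. It shows that changing a predictor's units changes its coefficient by the same factor, while standardizing the predictor removes that dependence.

```r
set.seed(1)
n <- 200

# Hypothetical predictor measured in metres, and the same predictor in millimetres.
x_m  <- rnorm(n, mean = 1.7, sd = 0.1)
x_mm <- x_m * 1000
y    <- 3 * x_m + rnorm(n, sd = 0.05)

# Unscaled fits: identical relationship, very different coefficient magnitudes.
coef_m  <- unname(coef(lm(y ~ x_m))["x_m"])
coef_mm <- unname(coef(lm(y ~ x_mm))["x_mm"])
ratio   <- coef_m / coef_mm  # equals the unit conversion factor

# Standardized fits: the coefficient no longer depends on the units.
coef_m_std  <- unname(coef(lm(y ~ scale(x_m)))[2])
coef_mm_std <- unname(coef(lm(y ~ scale(x_mm)))[2])

print(c(ratio = ratio, metres_std = coef_m_std, millimetres_std = coef_mm_std))
```

The same reasoning applies to regularized models, where the penalty additionally depends on coefficient size.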

If x was trained with more than one value of alpha, the best-performing value of alpha is used; sparsity is then determined only through the selection of lambda. Using only lasso regression (i.e. alpha = 1) will produce a sparser set of coefficients; such a model can be obtained by not tuning hyperparameters.
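
The link between lambda and sparsity can be seen by calling glmnet directly. This is a sketch on simulated data (`x` and `y` are made up), assuming the glmnet package is installed; it does not show how interpret() maps its sparsity argument onto lambda, only that larger lambda values leave fewer non-zero coefficients under the lasso penalty.

```r
library(glmnet)

set.seed(42)
n <- 200
p <- 10
x <- matrix(rnorm(n * p), n, p)

# Simulated binary outcome; only the first three predictors carry signal.
y <- rbinom(n, 1, plogis(x[, 1] - x[, 2] + 0.5 * x[, 3]))

# alpha = 1 is the lasso penalty; glmnet fits a whole path of lambda values.
fit <- glmnet(x, y, family = "binomial", alpha = 1)

# fit$lambda is stored in decreasing order, and fit$df holds the number of
# non-zero coefficients at each lambda, so the first entry is the sparsest fit.
n_nonzero_largest_lambda  <- fit$df[1]
n_nonzero_smallest_lambda <- fit$df[length(fit$df)]

print(c(largest_lambda = n_nonzero_largest_lambda,
        smallest_lambda = n_nonzero_smallest_lambda))
```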

Examples

m <- machine_learn(pima_diabetes, patient_id, outcome = diabetes, models = "glm")
#> Training new data prep recipe...
#> Variable(s) ignored in prep_data won't be used to tune models: patient_id
#>
#> diabetes looks categorical, so training classification algorithms.
#>
#> After data processing, models are being trained on 12 features with 768 observations.
#> Based on n_folds = 5 and hyperparameter settings, the following number of models will be trained: 100 glm's
#> Training with cross validation: glmnet
#>
#> *** Models successfully trained. The model object contains the training data minus ignored ID columns. ***
#> *** If there was PHI in training data, normal PHI protocols apply to the model object. ***
interpret(m)
#> Reference Levels:
#> All `weight_class` estimates are relative to `obese`
#>
#> # A tibble: 9 x 2
#>   variable                                coefficient
#> * <chr>                                         <dbl>
#> 1 (Intercept)                                -5.44
#> 2 weight_class_normal (vs. obese)            -1.35
#> 3 weight_class_overweight (vs. obese)        -0.554
#> 4 pedigree                                    0.553
#> 5 weight_class_morbidly.obese (vs. obese)     0.118
#> 6 pregnancies                                 0.0866
#> 7 plasma_glucose                              0.0322
#> 8 skinfold                                    0.00660
#> 9 age                                         0.00643
interpret(m, .2)
#> Reference Levels:
#> All `weight_class` estimates are relative to `obese`
#>
#> # A tibble: 4 x 2
#>   variable                        coefficient
#> * <chr>                                 <dbl>
#> 1 (Intercept)                        -3.75
#> 2 weight_class_normal (vs. obese)    -0.208
#> 3 plasma_glucose                      0.0247
#> 4 pregnancies                         0.0197
interpret(m) %>% plot()