Interpret a model via regularized coefficient estimates

interpret(x, sparsity = NULL, remove_zeros = TRUE, top_n)

Arguments

x

a model_list object containing a glmnet model

sparsity

If NULL (default), coefficients for the best-performing model are returned. Otherwise, a value in [0, 1] that determines the sparsity of the model for which coefficients are returned, with 0 being maximally sparse (i.e. having the fewest non-zero coefficients) and 1 being minimally sparse

remove_zeros

Logical: Remove features with coefficients equal to 0? Default is TRUE

top_n

Integer: How many coefficients to return? The top_n coefficients with the largest absolute values are returned. If missing (default), all coefficients are returned

Value

A data frame of variables and their regularized regression coefficient estimates, with parent class "interpret"

Details

**WARNING** Coefficients are on the scale of the predictors; they are not standardized. Unless features were scaled before training (e.g. with prep_data(..., scale = TRUE)), the magnitude of a coefficient does not necessarily reflect its importance.
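To make coefficient magnitudes comparable across features, the data can be centered and scaled during prep before training. A minimal sketch, assuming prep_data's center and scale arguments and training the already-prepped data with tune_models (so the data is not re-prepped); see ?prep_data and ?tune_models:

```r
library(healthcareai)

# Center and scale features during prep so that coefficient magnitudes
# are comparable across features (sketch; see ?prep_data for arguments)
prepped <- prep_data(pima_diabetes, patient_id, outcome = diabetes,
                     center = TRUE, scale = TRUE)
# Train on the prepped data directly rather than via machine_learn,
# which would otherwise prep the data again with its own defaults
m_scaled <- tune_models(prepped, outcome = diabetes, models = "glm")
interpret(m_scaled)
```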

If x was trained with more than one value of alpha, the best-performing value of alpha is used; sparsity is then determined only through the selection of lambda. Using only lasso regression (i.e. alpha = 1) produces a sparser set of coefficients and can be obtained by not tuning hyperparameters.
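For example, skipping hyperparameter tuning fixes alpha at 1 (pure lasso), which yields sparser coefficients. A sketch, assuming machine_learn's tune argument (see ?machine_learn):

```r
# Train without tuning so that alpha = 1 (lasso) is used,
# producing a sparser set of coefficients
m_lasso <- machine_learn(pima_diabetes, patient_id, outcome = diabetes,
                         models = "glm", tune = FALSE)
interpret(m_lasso)
```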

Examples

m <- machine_learn(pima_diabetes, patient_id, outcome = diabetes, models = "glm")
#> Training new data prep recipe...
#> Variable(s) ignored in prep_data won't be used to tune models: patient_id
#> 
#> diabetes looks categorical, so training classification algorithms.
#> 
#> After data processing, models are being trained on 12 features with 768 observations.
#> Based on n_folds = 5 and hyperparameter settings, the following number of models will be trained: 100 glm's
#> Training with cross validation: glmnet
#> 
#> *** Models successfully trained. The model object contains the training data minus ignored ID columns. ***
#> *** If there was PHI in training data, normal PHI protocols apply to the model object. ***
interpret(m)
#> # A tibble: 12 x 2
#>    variable                coefficient
#>  * <chr>                         <dbl>
#>  1 (Intercept)               -5.89    
#>  2 weight_class_normal       -1.49    
#>  3 pedigree                   0.785   
#>  4 weight_class_other        -0.726   
#>  5 weight_class_overweight   -0.663   
#>  6 pregnancies                0.0912  
#>  7 weight_class_obese         0.0662  
#>  8 plasma_glucose             0.0301  
#>  9 skinfold                   0.0153  
#> 10 age                        0.0126  
#> 11 diastolic_bp               0.00147 
#> 12 insulin                    0.000188
interpret(m, .2)
#> # A tibble: 12 x 2
#>    variable                coefficient
#>  * <chr>                         <dbl>
#>  1 (Intercept)              -0.691    
#>  2 weight_class_normal      -0.00873  
#>  3 pedigree                  0.00677  
#>  4 weight_class_other       -0.00593  
#>  5 weight_class_obese        0.00469  
#>  6 weight_class_overweight  -0.00441  
#>  7 pregnancies               0.000847 
#>  8 skinfold                  0.000314 
#>  9 age                       0.000260 
#> 10 plasma_glucose            0.000209 
#> 11 diastolic_bp              0.000176 
#> 12 insulin                   0.0000324
interpret(m) %>% plot()
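The top_n argument can keep the output to only the largest-magnitude coefficients, which is convenient for models with many features. A sketch using the model trained above:

```r
# Return only the 5 coefficients with the largest absolute values
interpret(m, top_n = 5)
# The plot method can be limited the same way
interpret(m, top_n = 5) %>% plot()
```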