Plot model predictions vs observed outcomes

# S3 method for predicted_df
plot(x, caption = TRUE, title = NULL,
  font_size = 11, outcomes = NULL, fixed_aspect = attr(x,
  "model_info")$type == "Regression", print = TRUE, ...)

plot_regression_predictions(x, point_size = 1, point_alpha = 1, target)

plot_classification_predictions(x, fill_colors = c("firebrick", "steelblue"),
  fill_alpha = 0.7, curve_flex = 1, add_labels = TRUE, target)

plot_multiclass_predictions(x, conf_colors = c("black", "steelblue"),
  text_color = "yellow", diag_color = "red", target)

Arguments

x

Data frame as returned by `predict.model_list`

caption

Put model performance in the plot caption? TRUE (default) prints all available metrics; FALSE prints nothing. You can also provide a metric name (e.g. "RMSE"), in which case the caption will include only that metric.

title

Character: Plot title, default NULL produces no title.

font_size

Number: Relative size of all fonts in the plot, default = 11

outcomes

Vector of outcomes if not present in x

fixed_aspect

Logical: If TRUE (default for regression only), units of the x- and y-axis will have the same spacing.

print

Logical, if TRUE (default) the plot is printed on the current graphics device. The plot is always (silently) returned.

...

Parameters specific to plot_regression_predictions, plot_classification_predictions, or plot_multiclass_predictions; listed below. These must be named.

point_size

Number: Point size, relative to 1

point_alpha

Number in [0, 1] giving point opacity

target

Not meant to be set by the user. Name of the outcome column.

fill_colors

Length-2 character vector: colors to fill density curves. Default is c("firebrick", "steelblue"). If named, names must match unique(x[[target]]), in any order.

fill_alpha

Number in [0, 1] giving opacity of fill colors.

curve_flex

Numeric. Kernel adjustment for density curves. Default is 1. Less than 1 makes curves more flexible, analogous to smaller bins in a histogram; greater than 1 makes curves more rigid.

add_labels

If TRUE (default) and a predicted_group column was added to predictions by specifying risk_groups or outcome_groups in predict.model_list, labels specifying groups are added to the plot.

conf_colors

Length-2 character vector: colors used to fill the cells of the confusion matrix. Default is c("black", "steelblue").

text_color

Character: color to write percent correct. Default is "yellow".

diag_color

Character: color to highlight main diagonal. These are correct predictions. Default is "red".

Value

A ggplot object

Details

Note that a ggplot object is returned, so you can do additional customization of the plot. See the third example.
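As a side note on curve_flex: the argument is described above as a kernel adjustment for density curves. The following base-R sketch shows what such an adjustment does, under the assumption (not confirmed on this page) that curve_flex behaves like the `adjust` argument of `stats::density()`. The simulated `predicted_prob` values are made up purely for illustration.

```r
# Base-R sketch of a kernel adjustment factor.
# Assumption: curve_flex acts like `adjust` in stats::density().
set.seed(42)
predicted_prob <- rbeta(200, 2, 5)  # simulated predicted probabilities (illustrative only)

flexible <- density(predicted_prob, adjust = 0.5)  # analogous to curve_flex = 0.5
default  <- density(predicted_prob, adjust = 1)    # analogous to curve_flex = 1
rigid    <- density(predicted_prob, adjust = 2)    # analogous to curve_flex = 2

# The smoothing bandwidth scales linearly with the adjustment factor
c(flexible = flexible$bw, default = default$bw, rigid = rigid$bw)

# Overlay the three curves: the smaller the factor, the wigglier the curve
plot(flexible, lty = 3, main = "Effect of kernel adjustment",
     xlab = "Predicted probability")
lines(default, lty = 2)
lines(rigid, lty = 1)
```

A smaller factor follows local bumps in the data more closely, while a larger factor smooths them away, which matches the "more flexible" versus "more rigid" description of curve_flex.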

Examples

# Some regression examples
models <- machine_learn(pima_diabetes[1:50, ], patient_id, outcome = plasma_glucose,
                        models = "rf", tune = FALSE)
#> Training new data prep recipe...
#> Variable(s) ignored in prep_data won't be used to tune models: patient_id
#>
#> plasma_glucose looks numeric, so training regression algorithms.
#>
#> After data processing, models are being trained on 14 features with 50 observations.
#> Based on n_folds = 5 and hyperparameter settings, the following number of models will be trained: 5 rf's
#> Training at fixed values: Random Forest
#>
#> *** Models successfully trained. The model object contains the training data minus ignored ID columns. ***
#> *** If there was PHI in training data, normal PHI protocols apply to the model object. ***
predictions <- predict(models)
plot(predictions)
plot(predictions, caption = "Rsquared",
     title = "This model's predictions regress to the mean",
     point_size = 3, point_alpha = .7, font_size = 14)
p <- plot(predictions, print = FALSE)
p + theme_classic()

# A classification example with risk groups
class_models <- machine_learn(pima_diabetes, patient_id, outcome = diabetes,
                              models = "xgb", tune = FALSE)
#> Training new data prep recipe...
#> Variable(s) ignored in prep_data won't be used to tune models: patient_id
#>
#> diabetes looks categorical, so training classification algorithms.
#>
#> After data processing, models are being trained on 12 features with 768 observations.
#> Based on n_folds = 5 and hyperparameter settings, the following number of models will be trained: 5 xgb's
#> Training at fixed values: eXtreme Gradient Boosting
#>
#> *** Models successfully trained. The model object contains the training data minus ignored ID columns. ***
#> *** If there was PHI in training data, normal PHI protocols apply to the model object. ***
predict(class_models, risk_groups = c("v low", "low", "medium", "high", "very high")) %>%
  plot()
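
The fill_colors argument accepts a named vector whose names must match unique(x[[target]]) in any order. The base-R sketch below illustrates that rule on a toy data frame. The helper `names_match_outcomes` is hypothetical and not part of the package, and the column name and the "Y"/"N" values are made up for the illustration.

```r
# Toy predictions; column name and outcome values are illustrative assumptions
toy_predictions <- data.frame(diabetes = c("Y", "N", "N", "Y"))
target <- "diabetes"

# Hypothetical helper (not a package function): TRUE if fill_colors is
# unnamed, or if its names equal the set of unique outcome values
names_match_outcomes <- function(fill_colors, outcomes) {
  is.null(names(fill_colors)) ||
    setequal(names(fill_colors), unique(outcomes))
}

# Names match the outcome values; the order of the names does not matter
names_match_outcomes(c(N = "steelblue", Y = "firebrick"),
                     toy_predictions[[target]])

# Names that do not correspond to outcome values fail the check
names_match_outcomes(c(yes = "firebrick", no = "steelblue"),
                     toy_predictions[[target]])
```

Naming the colors ties each color to a specific outcome class, so the mapping does not depend on the order in which the classes happen to appear in the data.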