Identify opportunities to improve patient outcomes by exploring how predicted outcomes change when input variables change. Note that this function cannot establish causality. Omitted variable bias and other statistical phenomena may mean that the impacts predicted here are not realizable. Clinical guidance is essential in choosing new_values and acting on impact predictions. Extensive options are provided to control which impact predictions are surfaced, including variable_direction and prohibited_transitions.

pip(
  model,
  d,
  new_values,
  n = 3,
  allow_same = FALSE,
  repeated_factors = FALSE,
  smaller_better = TRUE,
  variable_direction = NULL,
  prohibited_transitions = NULL,
  id
)

Arguments

model

A model_list object, as from machine_learn or tune_models

d

A data frame on which model can make predictions

new_values

A list of alternative values for variables of interest. The names of the list must be variables in d and the entries are the alternative values to try.
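
As a minimal sketch of the shape new_values is described as taking, the base R snippet below builds such a list and checks its names against a data frame. The small data frame d here is a hypothetical stand-in that borrows column names and values from the Examples section; it is not the package's pima_diabetes data.

```r
# Hypothetical stand-in for d; names and values borrowed from the Examples.
d <- data.frame(
  weight_class = c("obese", "overweight", "normal"),
  plasma_glucose = c(148, 85, 183),
  diastolic_bp = c(72, 66, 64)
)

# One entry per variable of interest; each entry holds the alternatives to try.
new_values <- list(
  weight_class = c("underweight", "normal", "overweight"),
  plasma_glucose = c(75, 100),
  diastolic_bp = 70
)

# Every name in new_values must be a variable in d.
stopifnot(all(names(new_values) %in% names(d)))
```

Running a check like this before calling pip is just one way to catch a misspelled variable name early.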

n

Integer, default = 3. The maximum number of alternatives to return for each patient. Note that the actual number returned may be less than n, for example if length(new_values) < n or if allow_same is FALSE.

allow_same

Logical, default = FALSE. If TRUE, pip may return rows with modified_value = original_value and improvement = 0. This happens when fewer than n modifications result in improvement for a patient. If allow_same is TRUE and length(new_values) >= n, you are likely to get n results for each patient; however, constraints from variable_direction or prohibited_transitions could make recommendations for some variables impossible, resulting in fewer than n recommendations.

repeated_factors

Logical, default = FALSE. Do you want multiple modifications of the same variable for the same patient?

smaller_better

Logical, default = TRUE. Are lesser values of the outcome variable in model preferable?

variable_direction

Named numeric vector or list with entries of -1 or 1. This specifies the direction numeric variables are permitted to move to produce improvements. Names of the vector are names of variables in d; entries are 1 to indicate only increases can yield improvements or -1 to indicate only decreases can yield improvements. Numeric variables not appearing in this list may increase or decrease to surface improvements.
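
To illustrate the direction rule described above, here is a rough base R sketch. The helper allowed_moves is hypothetical and not part of the package; it only mirrors the stated rule (1 permits increases, -1 permits decreases) and makes no claim about how pip implements it internally.

```r
# Named numeric vector in the documented form: names are variables, entries -1 or 1.
variable_direction <- c(diastolic_bp = -1, plasma_glucose = 1)
stopifnot(all(variable_direction %in% c(-1, 1)))

# Hypothetical helper: keep only candidates that move in the permitted direction.
allowed_moves <- function(original, candidates, direction) {
  # sign() is -1 for a decrease and 1 for an increase relative to the original
  candidates[sign(candidates - original) == direction]
}

# A patient with plasma_glucose = 85 and candidate values 75 and 100:
allowed_moves(85, c(75, 100), variable_direction[["plasma_glucose"]])  # 100
# A patient with diastolic_bp = 72 and candidate value 70:
allowed_moves(72, 70, variable_direction[["diastolic_bp"]])            # 70
```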

prohibited_transitions

A list of data frames that contain variable modifications that won't be considered by pip. Names of the list are names of variables in d, and data frames have two columns, "from" and "to", indicating the original value and modified value, respectively, of the prohibited transition. If column names are not "from" and "to", the first column will be assumed to be the "from" column. This is intended for categorical variables, but could be used for integers as well.
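
The structure described above can be sketched in base R as follows. The weight_classes vector is an assumption made for illustration (it lists only the levels that appear in the Examples section); in practice the levels would come from the data, as the Examples do with unique().

```r
# Assumed set of levels, for illustration only.
weight_classes <- c("underweight", "normal", "overweight", "obese")

# Two columns, "from" and "to": prohibit moving to underweight from any
# class other than normal. The single "to" value is recycled across rows.
prohibit <- data.frame(
  from = setdiff(weight_classes, "normal"),
  to = "underweight"
)

# Names of the list are the variables the prohibitions apply to.
prohibited_transitions <- list(weight_class = prohibit)
```

Naming the columns "from" and "to" explicitly avoids relying on the column-order fallback described above.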

id

Optional. An unquoted variable name in d representing an identifier column; it will be included in the returned data frame. If not provided, an ID column from model's data prep will be used if available.

Value

A tibble with any id columns and "variable": the name of the variable being altered, "original_value": the patient's observed value of "variable", "modified_value": the altered value of "variable", "original_prediction": the patient's original prediction, "modified_prediction": the patient's prediction given that "variable" changes to "modified_value", "improvement": the difference between the original and modified prediction, with positive values reflecting improvement based on the value of smaller_better, and "impact_rank": the rank of the modification for that patient.
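
One way to read the sign convention of "improvement" is sketched below. The function improvement is a hypothetical illustration of the description above, not the package's code, and the prediction values are made up.

```r
# Hypothetical illustration: positive values always mean "better".
improvement <- function(original, modified, smaller_better = TRUE) {
  if (smaller_better) {
    original - modified  # a drop in the prediction counts as improvement
  } else {
    modified - original  # a rise in the prediction counts as improvement
  }
}

# Made-up predictions: the modified prediction is lower than the original.
improvement(0.9, 0.4)                          # positive when smaller is better
improvement(0.9, 0.4, smaller_better = FALSE)  # negative when larger is better
```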

Examples

# First, we need a model to make recommendations
set.seed(52760)
m <- machine_learn(pima_diabetes, patient_id, outcome = diabetes,
                   tune = FALSE, models = "xgb")
#> Training new data prep recipe...
#> Variable(s) ignored in prep_data won't be used to tune models: patient_id
#>
#> diabetes looks categorical, so training classification algorithms.
#>
#> After data processing, models are being trained on 12 features with 768 observations.
#> Based on n_folds = 5 and hyperparameter settings, the following number of models will be trained: 5 xgb's
#> Training at fixed values: eXtreme Gradient Boosting
#>
#> *** Models successfully trained. The model object contains the training data minus ignored ID columns. ***
#> *** If there was PHI in training data, normal PHI protocols apply to the model object. ***
# Let's look at changes in predicted outcomes for three patients changing their
# weight class, blood glucose, and blood pressure
modifications <- list(weight_class = c("underweight", "normal", "overweight"),
                      plasma_glucose = c(75, 100),
                      diastolic_bp = 70)
pip(model = m, d = pima_diabetes[1:3, ], new_values = modifications)
#> # A tibble: 7 x 8
#>   patient_id variable original_value modified_value original_predic…
#>        <int> <chr>    <chr>          <chr>                     <dbl>
#> 1          1 plasma_… 148            75                       0.944
#> 2          1 weight_… obese          underweight              0.944
#> 3          1 diastol… 72             70                       0.944
#> 4          2 weight_… overweight     normal                   0.0718
#> 5          2 diastol… 66             70                       0.0718
#> 6          2 plasma_… 85             75                       0.0718
#> 7          3 plasma_… 183            75                       0.892
#> # … with 3 more variables: modified_prediction <dbl>, improvement <dbl>,
#> #   impact_rank <int>
# In the above example, not every patient has three modifications with a
# positive predicted impact, so some patients get fewer than the default n = 3
# predictions. We can get n = 3 predictions for each patient by specifying
# allow_same, which fills the remaining slots with recommendations to keep the
# patient's current value.
pip(model = m, d = pima_diabetes[1:3, ], new_values = modifications,
    allow_same = TRUE)
#> # A tibble: 9 x 8
#>   patient_id variable original_value modified_value original_predic…
#>        <int> <chr>    <chr>          <chr>                     <dbl>
#> 1          1 plasma_… 148            75                       0.944
#> 2          1 weight_… obese          underweight              0.944
#> 3          1 diastol… 72             70                       0.944
#> 4          2 diastol… 66             70                       0.0718
#> 5          2 plasma_… 85             75                       0.0718
#> 6          2 weight_… overweight     overweight               0.0718
#> 7          3 plasma_… 183            75                       0.892
#> 8          3 weight_… normal         normal                   0.892
#> 9          3 diastol… 64             64                       0.892
#> # … with 3 more variables: modified_prediction <dbl>, improvement <dbl>,
#> #   impact_rank <int>
# Sometimes clinical knowledge trumps machine learning. In particular, machine
# learning models don't establish causality; they only leverage correlation.
# Patient impact predictor suggests causality, so clinicians should always be
# consulted to ensure that the causal impacts are medically sound.
#
# If there is clinical knowledge to suggest what impact a variable should have,
# that knowledge can be provided to pip. The way it is provided depends on
# whether the variable is categorical (prohibited_transitions) or numeric
# (variable_direction).

### Constraining categorical variables ###

# Suppose a clinician says that recommending a patient change their weight class
# to underweight from any value except normal is a bad idea. We can disallow
# those suggestions using prohibited_transitions. Note that patient 1's second
# recommendation changes from underweight to normal.
prohibit <- data.frame(from = setdiff(unique(pima_diabetes$weight_class), "normal"),
                       to = "underweight")
pip(model = m, d = pima_diabetes[1:3, ], new_values = modifications,
    prohibited_transitions = list(weight_class = prohibit))
#> # A tibble: 7 x 8
#>   patient_id variable original_value modified_value original_predic…
#>        <int> <chr>    <chr>          <chr>                     <dbl>
#> 1          1 plasma_… 148            75                       0.944
#> 2          1 weight_… obese          normal                   0.944
#> 3          1 diastol… 72             70                       0.944
#> 4          2 weight_… overweight     normal                   0.0718
#> 5          2 diastol… 66             70                       0.0718
#> 6          2 plasma_… 85             75                       0.0718
#> 7          3 plasma_… 183            75                       0.892
#> # … with 3 more variables: modified_prediction <dbl>, improvement <dbl>,
#> #   impact_rank <int>
### Constraining numeric variables ###

# Suppose a clinician says that increasing diastolic_bp should never be
# recommended to improve diabetes outcomes, and likewise for reducing
# plasma_glucose (which is clinically silly, but provides an illustration). The
# following code ensures that diastolic_bp is only recommended to decrease and
# plasma_glucose is only recommended to increase. Note that the plasma_glucose
# recommendations disappear, because no patient would see their outcomes
# improve by increasing their plasma_glucose.
directional_changes <- c(diastolic_bp = -1, plasma_glucose = 1)
pip(model = m, d = pima_diabetes[1:3, ], new_values = modifications,
    variable_direction = directional_changes)
#> # A tibble: 3 x 8
#>   patient_id variable original_value modified_value original_predic…
#>        <int> <chr>    <chr>          <chr>                     <dbl>
#> 1          1 weight_… obese          underweight              0.944
#> 2          1 diastol… 72             70                       0.944
#> 3          2 weight_… overweight     normal                   0.0718
#> # … with 3 more variables: modified_prediction <dbl>, improvement <dbl>,
#> #   impact_rank <int>