Patient Impact Predictor — pip • healthcareai

Identify opportunities to improve patient outcomes by exploring changes in predicted outcomes over changes to input variables. Note that causality cannot be established by this function. Omitted variable bias and other statistical phenomena may mean that the impacts predicted here are not realizable. Clinical guidance is essential in choosing new_values and acting on impact predictions. Extensive options are provided to control what impact predictions are surfaced, including variable_direction and prohibited_transitions.

pip(
  model,
  d,
  new_values,
  n = 3,
  allow_same = FALSE,
  repeated_factors = FALSE,
  smaller_better = TRUE,
  variable_direction = NULL,
  prohibited_transitions = NULL,
  id
)

Arguments

model	A model_list object, as from `machine_learn` or `tune_models`
d	A data frame on which `model` can make predictions
new_values	A list of alternative values for variables of interest. The names of the list must be variables in `d` and the entries are the alternative values to try.
n	Integer, default = 3. The maximum number of alternatives to return for each patient. Note that the actual number returned may be less than `n`, for example if `length(new_values) < n` or if `allow_same` is FALSE.
allow_same	Logical, default = FALSE. If TRUE, `pip` may return rows with `modified_value = original_value` and `improvement = 0`. This happens when there are fewer than `n` modifications for a patient that result in improvement. If `allow_same` is TRUE and `length(new_values) >= n` you are likely to get `n` results for each patient; however, constraints from `variable_direction` or `prohibited_transitions` could make recommendations for some variables impossible, resulting in fewer than `n` recommendations.
repeated_factors	Logical, default = FALSE. Do you want multiple modifications of the same variable for the same patient?
smaller_better	Logical, default = TRUE. Are lesser values of the outcome variable in `model` preferable?
variable_direction	Named numeric vector or list with entries of -1 or 1. This specifies the direction numeric variables are permitted to move to produce improvements. Names of the vector are names of variables in `d`; entries are 1 to indicate only increases can yield improvements or -1 to indicate only decreases can yield improvements. Numeric variables not appearing in this list may increase or decrease to surface improvements.
prohibited_transitions	A list of data frames that contain variable modifications that won't be considered by `pip`. Names of the list are names of variables in `d`, and data frames have two columns, "from" and "to", indicating the original value and modified value, respectively, of the prohibited transition. If column names are not "from" and "to", the first column will be assumed to be the "from" column. This is intended for categorical variables, but could be used for integers as well.
id	Optional. An unquoted variable name in `d` representing an identifier column; it will be included in the returned data frame. If not provided, an ID column from `model`'s data prep will be used if available.
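
The shapes of `new_values`, `variable_direction`, and `prohibited_transitions` can be easier to see side by side. The following is a minimal base R sketch with made-up values. The helper `is_allowed` is hypothetical and is not part of healthcareai; it only mirrors the constraint semantics described above and makes no claim about how `pip` implements them internally. Treating an unchanged value as permitted is an assumption.

```r
# Hypothetical inputs; only the argument shapes follow the descriptions above.
new_values <- list(weight_class = c("normal", "overweight"),
                   diastolic_bp = c(70, 80))

# 1 = only increases may be surfaced, -1 = only decreases may be surfaced
variable_direction <- c(diastolic_bp = -1)

# One data frame per variable, with "from" and "to" columns
prohibited_transitions <- list(
  weight_class = data.frame(from = "obese", to = "normal",
                            stringsAsFactors = FALSE)
)

# Illustrative helper (not a healthcareai function): is moving `variable`
# from `from` to `to` permitted under the two constraints?
is_allowed <- function(variable, from, to) {
  # Indexing a missing name with `[` gives NA, meaning "unconstrained"
  direction <- unname(variable_direction[variable])
  if (!is.na(direction) && is.numeric(from) && is.numeric(to)) {
    # Reject moves that go against the permitted direction
    if (sign(to - from) == -direction) return(FALSE)
  }
  # `[[` on a list returns NULL when the variable has no prohibited transitions
  banned <- prohibited_transitions[[variable]]
  if (!is.null(banned) && any(banned$from == from & banned$to == to)) {
    return(FALSE)
  }
  TRUE
}

is_allowed("diastolic_bp", 72, 70)                   # TRUE: a decrease
is_allowed("diastolic_bp", 66, 70)                   # FALSE: an increase
is_allowed("weight_class", "obese", "normal")        # FALSE: prohibited
is_allowed("weight_class", "overweight", "normal")   # TRUE
```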

Value

A tibble with any id columns and "variable": the name of the variable being altered, "original_value": the patient's observed value of "variable", "modified_value": the altered value of "variable", "original_prediction": the patient's original prediction, "modified_prediction": the patient's prediction given that "variable" changes to "modified_value", "improvement": the difference between the original and modified prediction, with positive values reflecting improvement based on the value of smaller_better, and "impact_rank": the rank of the modification for that patient.
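
To make the relationship between "improvement", `smaller_better`, and "impact_rank" concrete, here is a small base R sketch on invented predictions for a single patient. It follows the column descriptions above; that rank 1 goes to the largest improvement is an assumption, and the data frame `candidates` is hypothetical rather than output from `pip`.

```r
# Hypothetical predictions for one patient; not output from a real model.
candidates <- data.frame(
  variable = c("plasma_glucose", "weight_class", "diastolic_bp"),
  original_prediction = 0.90,
  modified_prediction = c(0.60, 0.75, 0.92),
  stringsAsFactors = FALSE
)
smaller_better <- TRUE

# Positive improvement always means "better", whichever direction is preferred
change <- candidates$original_prediction - candidates$modified_prediction
candidates$improvement <- if (smaller_better) change else -change

# Assumption: rank 1 is the modification with the largest improvement
candidates$impact_rank <- rank(-candidates$improvement, ties.method = "first")
candidates <- candidates[order(candidates$impact_rank), ]
candidates
```

In this sketch the diastolic_bp modification has a negative improvement, so with `allow_same = FALSE` it would presumably not be surfaced at all.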

Examples

# First, we need a model to make recommendations
set.seed(52760)
m <- machine_learn(pima_diabetes, patient_id, outcome = diabetes,
                   tune = FALSE, models = "xgb")
#> Training new data prep recipe...
#> Variable(s) ignored in prep_data won't be used to tune models: patient_id
#> 
#> diabetes looks categorical, so training classification algorithms.
#> 
#> After data processing, models are being trained on 12 features with 768 observations.
#> Based on n_folds = 5 and hyperparameter settings, the following number of models will be trained: 5 xgb's 
#> Training at fixed values: eXtreme Gradient Boosting
#> 
#> *** Models successfully trained. The model object contains the training data minus ignored ID columns. ***
#> *** If there was PHI in training data, normal PHI protocols apply to the model object. ***
# Let's look at changes in predicted outcomes for three patients changing their
# weight class, blood glucose, and blood pressure
modifications <- list(weight_class = c("underweight", "normal", "overweight"),
                      plasma_glucose = c(75, 100),
                      diastolic_bp = 70)
pip(model = m, d = pima_diabetes[1:3, ], new_values = modifications)
#> # A tibble: 7 x 8
#>   patient_id variable original_value modified_value original_predic…
#>        <int> <chr>    <chr>          <chr>                     <dbl>
#> 1          1 plasma_… 148            75                       0.944 
#> 2          1 weight_… obese          underweight              0.944 
#> 3          1 diastol… 72             70                       0.944 
#> 4          2 weight_… overweight     normal                   0.0718
#> 5          2 diastol… 66             70                       0.0718
#> 6          2 plasma_… 85             75                       0.0718
#> 7          3 plasma_… 183            75                       0.892 
#> # … with 3 more variables: modified_prediction <dbl>, improvement <dbl>,
#> #   impact_rank <int>

# In the above example, not every patient has three modifications with a
# positive predicted impact (patient 3 has only one), so fewer than the default
# n=3 predictions are returned for them. We can get n=3 predictions for each
# patient by specifying allow_same, which fills the remaining rows with
# recommendations to maintain the patient's current values.
pip(model = m, d = pima_diabetes[1:3, ], new_values = modifications, allow_same = TRUE)
#> # A tibble: 9 x 8
#>   patient_id variable original_value modified_value original_predic…
#>        <int> <chr>    <chr>          <chr>                     <dbl>
#> 1          1 plasma_… 148            75                       0.944 
#> 2          1 weight_… obese          underweight              0.944 
#> 3          1 diastol… 72             70                       0.944 
#> 4          2 diastol… 66             70                       0.0718
#> 5          2 plasma_… 85             75                       0.0718
#> 6          2 weight_… overweight     overweight               0.0718
#> 7          3 plasma_… 183            75                       0.892 
#> 8          3 weight_… normal         normal                   0.892 
#> 9          3 diastol… 64             64                       0.892 
#> # … with 3 more variables: modified_prediction <dbl>, improvement <dbl>,
#> #   impact_rank <int>

# Sometimes clinical knowledge trumps machine learning. In particular, machine
# learning models don't establish causality; they only leverage correlation.
# Patient impact predictor's output can be read as implying causality, so
# clinicians should always be consulted to ensure that the suggested impacts
# are medically sound.
#
# If there is clinical knowledge to suggest what impact a variable should have,
# that knowledge can be provided to pip. The way it is provided depends on
# whether the variable is categorical (prohibited_transitions) or numeric
# (variable_direction).

### Constraining categorical variables ###
# Suppose a clinician says that recommending a patient change their weight class
# to underweight from any value except normal is a bad idea. We can disallow
# those suggestions using prohibited_transitions. Note that the modified value
# in patient 1's second recommendation changes from underweight to normal.
prohibit <- data.frame(from = setdiff(unique(pima_diabetes$weight_class), "normal"),
                       to = "underweight")
pip(model = m, d = pima_diabetes[1:3, ], new_values = modifications,
    prohibited_transitions = list(weight_class = prohibit))
#> # A tibble: 7 x 8
#>   patient_id variable original_value modified_value original_predic…
#>        <int> <chr>    <chr>          <chr>                     <dbl>
#> 1          1 plasma_… 148            75                       0.944 
#> 2          1 weight_… obese          normal                   0.944 
#> 3          1 diastol… 72             70                       0.944 
#> 4          2 weight_… overweight     normal                   0.0718
#> 5          2 diastol… 66             70                       0.0718
#> 6          2 plasma_… 85             75                       0.0718
#> 7          3 plasma_… 183            75                       0.892 
#> # … with 3 more variables: modified_prediction <dbl>, improvement <dbl>,
#> #   impact_rank <int>

### Constraining numeric variables ###
# Suppose a clinician says that increasing diastolic_bp should never be
# recommended to improve diabetes outcomes, and that reducing plasma_glucose
# should never be recommended either (which is clinically silly, but provides
# an illustration). The following code ensures that diastolic_bp is only
# recommended to decrease and plasma_glucose is only recommended to increase.
# Note that the plasma_glucose recommendations disappear, because no patient
# would see their outcomes improve by increasing their plasma_glucose.
directional_changes <- c(diastolic_bp = -1, plasma_glucose = 1)
pip(model = m, d = pima_diabetes[1:3, ], new_values = modifications,
    variable_direction = directional_changes)
#> # A tibble: 3 x 8
#>   patient_id variable original_value modified_value original_predic…
#>        <int> <chr>    <chr>          <chr>                     <dbl>
#> 1          1 weight_… obese          underweight              0.944 
#> 2          1 diastol… 72             70                       0.944 
#> 3          2 weight_… overweight     normal                   0.0718
#> # … with 3 more variables: modified_prediction <dbl>, improvement <dbl>,
#> #   impact_rank <int>