Identify opportunities to improve patient outcomes by exploring
changes in predicted outcomes over changes to input variables. Note
that causality cannot be established by this function. Omitted variable
bias and other statistical phenomena may mean that the impacts predicted
here are not realizable. Clinical guidance is essential in choosing
new_values and acting on impact predictions. Extensive options are
provided to control what impact predictions are surfaced, including
variable_direction and prohibited_transitions.
pip(model, d, new_values, n = 3, allow_same = FALSE,
  repeated_factors = FALSE, smaller_better = TRUE,
  variable_direction = NULL, prohibited_transitions = NULL, id)
model | A model_list object, as from |
---|---|
d | A data frame on which |
new_values | A list of alternative values for variables of interest. The
names of the list must be variables in |
n | Integer, default = 3. The maximum number of alternatives to return
for each patient. Note that the actual number returned may be less than
|
allow_same | Logical, default = FALSE. If TRUE, |
repeated_factors | Logical, default = FALSE. Do you want multiple modifications of the same variable for the same patient? |
smaller_better | Logical, default = TRUE. Are lesser values of the
outcome variable in |
variable_direction | Named numeric vector or list with entries of -1 or
1. This specifies the direction numeric variables are permitted to move to
produce improvements. Names of the vector are names of variables in
|
prohibited_transitions | A list of data frames that contain variable
modifications that won't be considered by |
id | Optional. An unquoted variable name in |
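The effect of variable_direction on a numeric variable can be pictured with a small base R sketch. The helper permitted_values below is hypothetical and is not part of the package; it only illustrates the idea that an entry of -1 permits decreases and an entry of 1 permits increases, under the assumption that a candidate equal to the patient's current value counts as neither.

```r
# Hypothetical helper (not part of the package): keep only the candidate
# values that move a numeric variable in the permitted direction.
# direction = -1 permits decreases only; direction = 1 permits increases only.
permitted_values <- function(original, candidates, direction) {
  stopifnot(direction %in% c(-1, 1))
  # sign() is 0 for a candidate equal to the original, so it is dropped here
  candidates[sign(candidates - original) == direction]
}

# A patient with diastolic_bp = 66 and candidate values 60 and 70
permitted_values(original = 66, candidates = c(60, 70), direction = -1)
permitted_values(original = 66, candidates = c(60, 70), direction = 1)
```

With direction = -1 only the candidate below 66 survives, and with direction = 1 only the candidate above it. How pip itself applies the constraint internally is not shown here.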
A tibble with any id columns and the following columns. "variable": the
name of the variable being altered. "original_value": the patient's
observed value of "variable". "modified_value": the altered value of
"variable". "original_prediction": the patient's original prediction.
"modified_prediction": the patient's prediction given that "variable"
changes to "modified_value". "improvement": the difference between the
original and modified predictions, with positive values reflecting
improvement based on the value of smaller_better. "impact_rank": the rank
of the modification for that patient.
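As a rough illustration of how "improvement" and "impact_rank" relate to smaller_better, here is a minimal base R sketch. The predictions are made-up numbers, the improvement helper is hypothetical, and the choice that rank 1 marks the largest improvement is an assumption rather than something the documentation states.

```r
# Hypothetical predictions for one patient; these numbers are illustrative
# and are not output from any model.
original_prediction <- 0.90
modified_prediction <- c(plasma_glucose = 0.40,
                         weight_class = 0.70,
                         diastolic_bp = 0.85)

# Hypothetical helper: positive values mean the modification helps.
improvement <- function(original, modified, smaller_better = TRUE) {
  if (smaller_better) original - modified else modified - original
}

gain <- improvement(original_prediction, modified_prediction)

# Assumption: rank 1 is the modification with the largest improvement.
impact_rank <- rank(-gain, ties.method = "first")

data.frame(variable = names(gain),
           improvement = unname(gain),
           impact_rank = unname(impact_rank))
```

Setting smaller_better = FALSE flips the sign, so the same modifications would show negative improvement when larger predictions are the desirable ones.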
# First, we need a model to make recommendations
set.seed(52760)
m <- machine_learn(pima_diabetes, patient_id, outcome = diabetes,
                   tune = FALSE, models = "xgb")

# Let's look at changes in predicted outcomes for three patients changing their
# weight class, blood glucose, and blood pressure
modifications <- list(weight_class = c("underweight", "normal", "overweight"),
                      plasma_glucose = c(75, 100),
                      diastolic_bp = 70)
pip(model = m, d = pima_diabetes[1:3, ], new_values = modifications)
#> # A tibble: 7 x 8
#>   patient_id variable original_value modified_value original_predic…
#>        <int> <chr>    <chr>          <chr>                     <dbl>
#> 1          1 plasma_… 148            75                       0.944
#> 2          1 weight_… obese          underweight              0.944
#> 3          1 diastol… 72             70                       0.944
#> 4          2 weight_… overweight     normal                   0.0718
#> 5          2 diastol… 66             70                       0.0718
#> 6          2 plasma_… 85             75                       0.0718
#> 7          3 plasma_… 183            75                       0.892
#> # … with 3 more variables: modified_prediction <dbl>, improvement <dbl>,
#> #   impact_rank <int>

# In the above example, only the first patient has a positive predicted impact
# from changing their diastolic_bp, so for the other patients fewer than the
# default n=3 predictions are provided. We can get n=3 predictions for each
# patient by specifying allow_same, which will recommend the other two patients
# maintain their current diastolic_bp.
pip(model = m, d = pima_diabetes[1:3, ], new_values = modifications,
    allow_same = TRUE)
#> # A tibble: 9 x 8
#>   patient_id variable original_value modified_value original_predic…
#>        <int> <chr>    <chr>          <chr>                     <dbl>
#> 1          1 plasma_… 148            75                       0.944
#> 2          1 weight_… obese          underweight              0.944
#> 3          1 diastol… 72             70                       0.944
#> 4          2 diastol… 66             70                       0.0718
#> 5          2 plasma_… 85             75                       0.0718
#> 6          2 weight_… overweight     overweight               0.0718
#> 7          3 plasma_… 183            75                       0.892
#> 8          3 weight_… normal         normal                   0.892
#> 9          3 diastol… 64             64                       0.892
#> # … with 3 more variables: modified_prediction <dbl>, improvement <dbl>,
#> #   impact_rank <int>

# Sometimes clinical knowledge trumps machine learning. In particular, machine
# learning models don't establish causality, they only leverage correlation.
# Patient impact predictor suggests causality, so clinicians should always be
# consulted to ensure that the causal impacts are medically sound.
#
# If there is clinical knowledge to suggest what impact a variable should have,
# that knowledge can be provided to pip. The way it is provided depends on
# whether the variable is categorical (prohibited_transitions) or numeric
# (variable_direction).

### Constraining categorical variables ###

# Suppose a clinician says that recommending a patient change their weight class
# to underweight from any value except normal is a bad idea. We can disallow
# those suggestions using prohibited_transitions. Note the change in patient
# 1's second recommendation goes from underweight to normal.
prohibit <- data.frame(from = setdiff(unique(pima_diabetes$weight_class), "normal"),
                       to = "underweight")
pip(model = m, d = pima_diabetes[1:3, ], new_values = modifications,
    prohibited_transitions = list(weight_class = prohibit))
#> # A tibble: 7 x 8
#>   patient_id variable original_value modified_value original_predic…
#>        <int> <chr>    <chr>          <chr>                     <dbl>
#> 1          1 plasma_… 148            75                       0.944
#> 2          1 weight_… obese          normal                   0.944
#> 3          1 diastol… 72             70                       0.944
#> 4          2 weight_… overweight     normal                   0.0718
#> 5          2 diastol… 66             70                       0.0718
#> 6          2 plasma_… 85             75                       0.0718
#> 7          3 plasma_… 183            75                       0.892
#> # … with 3 more variables: modified_prediction <dbl>, improvement <dbl>,
#> #   impact_rank <int>

### Constraining numeric variables ###

# Suppose a clinician says that increasing diastolic_bp should never be
# recommended to improve diabetes outcomes, and likewise for reducing
# plasma_glucose (which is clinically silly, but provides an illustration). The
# following code ensures that diastolic_bp is only recommended to decrease and
# plasma_glucose is only recommended to increase. Note that the plasma_glucose
# recommendations disappear, because no patient would see their outcomes
# improve by increasing their plasma_glucose.
directional_changes <- c(diastolic_bp = -1, plasma_glucose = 1)
pip(model = m, d = pima_diabetes[1:3, ], new_values = modifications,
    variable_direction = directional_changes)
#> # A tibble: 3 x 8
#>   patient_id variable original_value modified_value original_predic…
#>        <int> <chr>    <chr>          <chr>                     <dbl>
#> 1          1 weight_… obese          underweight              0.944
#> 2          1 diastol… 72             70                       0.944
#> 3          2 weight_… overweight     normal                   0.0718
#> # … with 3 more variables: modified_prediction <dbl>, improvement <dbl>,
#> #   impact_rank <int>
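
To make the prohibited_transitions idea from the categorical example concrete, the following base R sketch filters a small table of candidate changes against a from/to data frame. The candidate rows and the key helper are hypothetical, and this is only one plausible way to express the filter, not the package's internal implementation.

```r
# Hypothetical candidate modifications for weight_class
candidates <- data.frame(
  from = c("obese", "obese", "normal"),
  to   = c("underweight", "normal", "underweight"),
  stringsAsFactors = FALSE
)

# Transitions a clinician has ruled out: anything but normal -> underweight
prohibit <- data.frame(
  from = c("obese", "overweight"),
  to   = "underweight",
  stringsAsFactors = FALSE
)

# Hypothetical helper: one string per from/to pair so pairs can be compared
key <- function(df) paste(df$from, df$to, sep = " -> ")

# Drop every candidate whose from/to pair appears in prohibit
allowed <- candidates[!key(candidates) %in% key(prohibit), ]
allowed
```

Only the obese to normal and normal to underweight candidates remain, which mirrors how patient 1's weight_class recommendation moved from underweight to normal once the prohibition was supplied.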