`hcai_impute` adds imputation steps to an existing recipe. Supported methods are mean (numeric only), new_category (categorical only), bagged trees, knn, and locf.

hcai_impute(
  recipe,
  nominal_method = "new_category",
  numeric_method = "mean",
  numeric_params = NULL,
  nominal_params = NULL
)

Arguments

recipe

A recipe object. Imputation will be added to the sequence of operations for this recipe.

nominal_method

Defaults to "new_category". Other choices are "bagimpute", "knnimpute" or "locfimpute".

numeric_method

Defaults to "mean". Other choices are "bagimpute", "knnimpute" or "locfimpute".
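To make the simpler method names concrete, here is a minimal base R sketch of what mean, new-category, and last-observation-carried-forward imputation mean conceptually. It is not the package's implementation: the toy vectors and the `locf()` helper are hypothetical, and the "missing" label is taken from the recipe printout in the examples. How the package itself treats a leading NA under "locfimpute" is not covered here.

```r
# Toy vectors with missing values (illustrative data, not from the package).
age <- c(34, NA, 58, 61, NA)
category <- c("Low", NA, "High", NA, "Normal")

# "mean": replace each NA with the mean of the observed values.
age_mean <- ifelse(is.na(age), mean(age, na.rm = TRUE), age)

# "new_category": treat missingness as its own level, labelled "missing" here.
category_new <- ifelse(is.na(category), "missing", category)

# "locfimpute": carry the last observed value forward.
# Hypothetical helper; a leading NA has nothing before it and stays NA.
locf <- function(x) {
  for (i in seq_along(x)[-1]) {
    if (is.na(x[i])) x[i] <- x[i - 1]
  }
  x
}
category_locf <- locf(category)
```

Bagged-tree and knn imputation predict each missing value from the other columns, so they do not reduce to a one-line sketch like these.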

numeric_params

A named list with parameters to use with the chosen imputation method on numeric data. Options are bag_model (bagimpute only), bag_trees (bagimpute only), bag_options (bagimpute only), knn_K (knnimpute only), impute_with (knnimpute only), or seed_val (bag or knn). See step_bagimpute or step_knnimpute for details.

nominal_params

A named list with parameters to use with the chosen imputation method on nominal data. Options are bag_model (bagimpute only), bag_trees (bagimpute only), bag_options (bagimpute only), knn_K (knnimpute only), impute_with (knnimpute only), or seed_val (bag or knn). See step_bagimpute or step_knnimpute for details.
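As a rough illustration of how the two parameter lists might be passed, the sketch below builds one list per data type and hands them to `hcai_impute`. The option names (bag_trees, seed_val, knn_K) come from the descriptions above; the values, the toy data frame, and the `requireNamespace()` guard are assumptions made only so the snippet is self-contained.

```r
# Option names are from the argument descriptions; values are arbitrary.
numeric_params <- list(bag_trees = 50, seed_val = 30)
nominal_params <- list(knn_K = 3)

# Only attempt the call when both packages are installed.
if (requireNamespace("healthcareai", quietly = TRUE) &&
    requireNamespace("recipes", quietly = TRUE)) {
  # Toy data; nominal column stored as a factor, since knn imputation
  # may not support character columns.
  d <- data.frame(
    age = c(30, NA, 55, 61, 47, NA),
    category = factor(c("Low", NA, "High", "Normal", "Low", "High")),
    disease = factor(c("Yes", "No", "Yes", "No", "No", "Yes"))
  )
  rec <- recipes::recipe(disease ~ ., data = d)
  rec <- healthcareai::hcai_impute(
    rec,
    numeric_method = "bagimpute",
    nominal_method = "knnimpute",
    numeric_params = numeric_params,
    nominal_params = nominal_params
  )
}
```

Keeping each list next to the method it belongs to makes it easier to see that an option such as knn_K only applies when the matching method is "knnimpute".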

Value

An updated version of `recipe` with the new step added to the sequence of existing steps.

Examples

library(healthcareai)
library(recipes)
#>
#> Attaching package: ‘recipes’
#> The following object is masked from ‘package:stats’:
#>
#>     step
n <- 100
set.seed(9)
d <- tibble::tibble(
  patient_id = 1:n,
  age = sample(c(30:80, NA), size = n, replace = TRUE),
  hemoglobin_count = rnorm(n, mean = 15, sd = 1),
  hemoglobin_category = sample(c("Low", "Normal", "High", NA),
                               size = n, replace = TRUE),
  disease = ifelse(hemoglobin_count < 15, "Yes", "No")
)

# Initialize
my_recipe <- recipe(disease ~ ., data = d)

# Create recipe
my_recipe <- my_recipe %>%
  hcai_impute()
my_recipe
#> Data Recipe
#>
#> Inputs:
#>
#>       role #variables
#>    outcome          1
#>  predictor          4
#>
#> Operations:
#>
#> Mean Imputation for all_numeric(), -all_outcomes()
#> Filling NA with missing for all_nominal(), -all_outcomes()
# Train recipe
trained_recipe <- prep(my_recipe, training = d)

# Apply recipe
data_modified <- bake(trained_recipe, new_data = d)
missingness(data_modified)
#> # A tibble: 5 x 2
#>   variable            percent_missing
#> * <chr>                         <dbl>
#> 1 patient_id                        0
#> 2 age                               0
#> 3 hemoglobin_count                  0
#> 4 hemoglobin_category               0
#> 5 disease                           0
# Specify methods:
my_recipe <- my_recipe %>%
  hcai_impute(numeric_method = "bagimpute",
              nominal_method = "locfimpute")
my_recipe
#> Data Recipe
#>
#> Inputs:
#>
#>       role #variables
#>    outcome          1
#>  predictor          4
#>
#> Operations:
#>
#> Mean Imputation for all_numeric(), -all_outcomes()
#> Filling NA with missing for all_nominal(), -all_outcomes()
#> Bagged tree imputation for all_numeric(), -all_outcomes()
#> LOCF Imputation for all_nominal(), -all_outcomes()
# Specify methods and params:
my_recipe <- my_recipe %>%
  hcai_impute(numeric_method = "knnimpute",
              numeric_params = list(knn_K = 4))
#> `knnimpute` depends on another library that does not support character columns yet. If `knnimpute` fails please convert all character columns to factors for knn imputation.
my_recipe
#> Data Recipe
#>
#> Inputs:
#>
#>       role #variables
#>    outcome          1
#>  predictor          4
#>
#> Operations:
#>
#> Mean Imputation for all_numeric(), -all_outcomes()
#> Filling NA with missing for all_nominal(), -all_outcomes()
#> Bagged tree imputation for all_numeric(), -all_outcomes()
#> LOCF Imputation for all_nominal(), -all_outcomes()
#> K-nearest neighbor imputation for all_numeric(), -all_outcomes()
#> Filling NA with missing for all_nominal(), -all_outcomes()
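The printouts above accumulate operations because each call reassigns `my_recipe`, so every `hcai_impute` call appends to the steps already present. When the goal is to compare methods, one possible approach is to keep an untouched base recipe and derive each variant from it. The sketch below assumes this pattern; the `variants` list, the toy data frame, and the `requireNamespace()` guard are hypothetical and exist only to keep the snippet self-contained.

```r
# Method combinations to compare; names and grouping are illustrative.
variants <- list(
  defaults = list(numeric_method = "mean", nominal_method = "new_category"),
  locf = list(numeric_method = "locfimpute", nominal_method = "locfimpute")
)

# Only build the recipes when both packages are installed.
if (requireNamespace("healthcareai", quietly = TRUE) &&
    requireNamespace("recipes", quietly = TRUE)) {
  d <- data.frame(
    age = c(30, NA, 55, 61),
    category = factor(c("Low", NA, "High", "Normal")),
    disease = factor(c("Yes", "No", "Yes", "No"))
  )
  # The base recipe is never reassigned, so it carries no imputation steps.
  base_recipe <- recipes::recipe(disease ~ ., data = d)

  # Each variant starts from the same base recipe.
  recipes_by_variant <- lapply(variants, function(v) {
    healthcareai::hcai_impute(
      base_recipe,
      numeric_method = v$numeric_method,
      nominal_method = v$nominal_method
    )
  })
}
```

Each element of `recipes_by_variant` can then be passed to `prep` and `bake` separately, without the operations from the other variants.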