`hcai_impute` adds imputation steps to an existing recipe. Supported methods are mean (numeric only), new_category (nominal only), bagged trees, and k-nearest neighbors (knn).

hcai_impute(recipe, nominal_method = "new_category",
  numeric_method = "mean", numeric_params = NULL, nominal_params = NULL)

Arguments

recipe

A recipe object. Imputation will be added to the sequence of operations for this recipe.

nominal_method

Defaults to "new_category". Other choices are "bagimpute" or "knnimpute".

numeric_method

Defaults to "mean". Other choices are "bagimpute" or "knnimpute".

numeric_params

A named list of parameters to use with the chosen imputation method on numeric data. Options are bag_model (bagimpute only), bag_options (bagimpute only), knn_K (knnimpute only), impute_with (bag or knn), and seed_val (bag or knn). See step_bagimpute or step_knnimpute for details.

nominal_params

A named list of parameters to use with the chosen imputation method on nominal data. Options are bag_model (bagimpute only), bag_options (bagimpute only), knn_K (knnimpute only), impute_with (bag or knn), and seed_val (bag or knn). See step_bagimpute or step_knnimpute for details.
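To make the two default methods concrete, here is a minimal base R sketch of what "mean" and "new_category" imputation do to a single column. It is an illustration only, not the package's implementation: the helper names `impute_mean` and `impute_new_category` and the toy vectors are hypothetical, and the `"missing"` label is borrowed from the printed recipe output in the examples.

```r
# Toy columns with missing values (hypothetical data).
age <- c(34, NA, 58, 71, NA)
hemoglobin_category <- c("Low", NA, "High", "Normal", NA)

# "mean": replace numeric NAs with the mean of the observed values.
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# "new_category": replace nominal NAs with an explicit extra category.
impute_new_category <- function(x, label = "missing") {
  x[is.na(x)] <- label
  x
}

age_filled <- impute_mean(age)
category_filled <- impute_new_category(hemoglobin_category)
```

Inside a recipe the same idea is split in two: the fill value is estimated when the recipe is trained with `prep()` and applied when data is passed through `bake()`, as the examples show.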

Value

An updated version of `recipe` with the new imputation steps added to the sequence of existing steps.

Examples

library(recipes)
#> Loading required package: broom
#>
#> Attaching package: ‘recipes’
#> The following object is masked from ‘package:stats’:
#>
#>     step
n <- 100
set.seed(9)
d <- tibble::tibble(
  patient_id = 1:n,
  age = sample(c(30:80, NA), size = n, replace = TRUE),
  hemoglobin_count = rnorm(n, mean = 15, sd = 1),
  hemoglobin_category = sample(c("Low", "Normal", "High", NA),
                               size = n, replace = TRUE),
  disease = ifelse(hemoglobin_count < 15, "Yes", "No")
)

# Initialize
my_recipe <- recipe(disease ~ ., data = d)

# Create recipe
my_recipe <- my_recipe %>%
  hcai_impute()
my_recipe
#> Data Recipe
#>
#> Inputs:
#>
#>       role #variables
#>    outcome          1
#>  predictor          4
#>
#> Operations:
#>
#> Mean Imputation for all_numeric(), -all_outcomes()
#> Filling NA with missing for all_nominal(), -all_outcomes()
# Train recipe
trained_recipe <- prep(my_recipe, training = d)

# Apply recipe
data_modified <- bake(trained_recipe, newdata = d)
missingness(data_modified)
#> # A tibble: 5 x 2
#>   variable            percent_missing
#> * <chr>                         <dbl>
#> 1 patient_id                        0
#> 2 age                               0
#> 3 hemoglobin_count                  0
#> 4 hemoglobin_category               0
#> 5 disease                           0
# Specify methods:
my_recipe <- my_recipe %>%
  hcai_impute(numeric_method = "bagimpute",
              nominal_method = "new_category")
my_recipe
#> Data Recipe
#>
#> Inputs:
#>
#>       role #variables
#>    outcome          1
#>  predictor          4
#>
#> Operations:
#>
#> Mean Imputation for all_numeric(), -all_outcomes()
#> Filling NA with missing for all_nominal(), -all_outcomes()
#> Bagged tree imputation for all_numeric(), -all_outcomes()
#> Filling NA with missing for all_nominal(), -all_outcomes()
# Specify methods and params:
my_recipe <- my_recipe %>%
  hcai_impute(numeric_method = "knnimpute",
              numeric_params = list(knn_K = 4))
my_recipe
#> Data Recipe
#>
#> Inputs:
#>
#>       role #variables
#>    outcome          1
#>  predictor          4
#>
#> Operations:
#>
#> Mean Imputation for all_numeric(), -all_outcomes()
#> Filling NA with missing for all_nominal(), -all_outcomes()
#> Bagged tree imputation for all_numeric(), -all_outcomes()
#> Filling NA with missing for all_nominal(), -all_outcomes()
#> 4-nearest neighbor imputation for all_numeric(), -all_outcomes()
#> Filling NA with missing for all_nominal(), -all_outcomes()
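The printed recipes above show that each call to `hcai_impute` appends steps to `my_recipe` rather than replacing the earlier ones. When the goal is to compare methods, it may be cleaner to start from a fresh recipe each time. The sketch below does that and uses knn for both variable types. It rests on some assumptions: `healthcareai` is taken to be the package that exports `hcai_impute`, the name `knn_recipe` and the value `knn_K = 3` are illustrative only, and `nominal_params` is used as described in the Arguments section.

```r
library(recipes)
library(healthcareai)  # assumption: the package that provides hcai_impute()

# Same simulated data as in the examples above.
n <- 100
set.seed(9)
d <- tibble::tibble(
  patient_id = 1:n,
  age = sample(c(30:80, NA), size = n, replace = TRUE),
  hemoglobin_count = rnorm(n, mean = 15, sd = 1),
  hemoglobin_category = sample(c("Low", "Normal", "High", NA),
                               size = n, replace = TRUE),
  disease = ifelse(hemoglobin_count < 15, "Yes", "No")
)

# Start from a new recipe so that only the knn steps are present.
knn_recipe <- recipe(disease ~ ., data = d) %>%
  hcai_impute(numeric_method = "knnimpute",
              nominal_method = "knnimpute",
              numeric_params = list(knn_K = 4),
              nominal_params = list(knn_K = 3))
knn_recipe
```

Because the recipe is only defined here and not yet trained, no imputation is computed until it is passed to `prep()`.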