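`hcai_impute` adds imputation steps to an existing recipe. It currently supports mean (numeric only), new_category (categorical only), bagged trees, and k-nearest neighbors (knn). The examples below pass the method names as strings such as `"bagimpute"` and `"knnimpute"`, and also use `"locfimpute"` for nominal data.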

    hcai_impute(
      recipe,
      nominal_method = "new_category",
      numeric_method = "mean",
      numeric_params = NULL,
      nominal_params = NULL
    )

Argument | Description |
---|---|
recipe | A recipe object. Imputation will be added to the sequence of operations for this recipe. |
nominal_method | Imputation method for nominal data. Defaults to `"new_category"`. |
numeric_method | Imputation method for numeric data. Defaults to `"mean"`. |
numeric_params | A named list with parameters to use with the chosen imputation method on numeric data, for example `list(knn_K = 4)` with `"knnimpute"`. |
nominal_params | A named list with parameters to use with the chosen imputation method on nominal data. |
An updated version of `recipe` with the new step added to the sequence of existing steps.
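
To make the two default methods more concrete, here is a minimal base R sketch of what mean imputation and new-category imputation amount to. It is an illustration only, not the package's implementation: `learn_means` and `apply_imputation` are hypothetical helper names, the fill value `"missing"` is taken from the printed recipe in the examples, and the sketch handles character columns only (no factors, no exclusion of outcome columns). Splitting the work into a learning step and an applying step loosely mirrors the `prep()` / `bake()` pattern used with recipes.

```r
# Hypothetical helpers for illustration; not healthcareai's implementation.

# "Train": learn one mean per numeric column from the training data.
learn_means <- function(df) {
  numeric_cols <- names(df)[vapply(df, is.numeric, logical(1))]
  vapply(df[numeric_cols], mean, numeric(1), na.rm = TRUE)
}

# "Apply": fill numeric NAs with the learned means and
# character NAs with the new category "missing".
apply_imputation <- function(df, means) {
  for (col in names(df)) {
    if (col %in% names(means)) {
      df[[col]][is.na(df[[col]])] <- means[[col]]
    } else if (is.character(df[[col]])) {
      df[[col]][is.na(df[[col]])] <- "missing"
    }
  }
  df
}

train <- data.frame(
  age = c(40, NA, 60),
  hemoglobin_category = c("Low", NA, "High"),
  stringsAsFactors = FALSE
)
new_data <- data.frame(
  age = c(NA, 70),
  hemoglobin_category = c(NA, "Normal"),
  stringsAsFactors = FALSE
)

means <- learn_means(train)                       # learned from training data only
imputed <- apply_imputation(train, means)         # fill the training data
imputed_new <- apply_imputation(new_data, means)  # reuse the same means on new data
```

Because the means are learned once and then reused, a missing `age` in `new_data` is filled with the training mean rather than a mean recomputed from the new rows.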

    library(recipes)
    library(healthcareai)  # provides hcai_impute() and missingness()

    n <- 100
    set.seed(9)
    d <- tibble::tibble(
      patient_id = 1:n,
      age = sample(c(30:80, NA), size = n, replace = TRUE),
      hemoglobin_count = rnorm(n, mean = 15, sd = 1),
      hemoglobin_category = sample(c("Low", "Normal", "High", NA),
                                   size = n, replace = TRUE),
      disease = ifelse(hemoglobin_count < 15, "Yes", "No")
    )

    # Initialize
    my_recipe <- recipe(disease ~ ., data = d)

    # Create recipe
    my_recipe <- my_recipe %>%
      hcai_impute()
    my_recipe
    #> Data Recipe
    #>
    #> Inputs:
    #>
    #>       role #variables
    #>    outcome          1
    #>  predictor          4
    #>
    #> Operations:
    #>
    #> Mean Imputation for all_numeric(), -all_outcomes()
    #> Filling NA with missing for all_nominal(), -all_outcomes()

    # Train recipe
    trained_recipe <- prep(my_recipe, training = d)

    # Apply recipe
    data_modified <- bake(trained_recipe, new_data = d)
    missingness(data_modified)
    #> # A tibble: 5 x 2
    #>   variable            percent_missing
    #> * <chr>                         <dbl>
    #> 1 patient_id                        0
    #> 2 age                               0
    #> 3 hemoglobin_count                  0
    #> 4 hemoglobin_category               0
    #> 5 disease                           0

    # Specify methods (the new steps are appended to the existing recipe):
    my_recipe <- my_recipe %>%
      hcai_impute(numeric_method = "bagimpute",
                  nominal_method = "locfimpute")
    my_recipe
    #> Data Recipe
    #>
    #> Inputs:
    #>
    #>       role #variables
    #>    outcome          1
    #>  predictor          4
    #>
    #> Operations:
    #>
    #> Mean Imputation for all_numeric(), -all_outcomes()
    #> Filling NA with missing for all_nominal(), -all_outcomes()
    #> Bagged tree imputation for all_numeric(), -all_outcomes()
    #> LOCF Imputation for all_nominal(), -all_outcomes()

    # Specify methods and params:
    my_recipe <- my_recipe %>%
      hcai_impute(numeric_method = "knnimpute",
                  numeric_params = list(knn_K = 4))
    my_recipe
    #> Data Recipe
    #>
    #> Inputs:
    #>
    #>       role #variables
    #>    outcome          1
    #>  predictor          4
    #>
    #> Operations:
    #>
    #> Mean Imputation for all_numeric(), -all_outcomes()
    #> Filling NA with missing for all_nominal(), -all_outcomes()
    #> Bagged tree imputation for all_numeric(), -all_outcomes()
    #> LOCF Imputation for all_nominal(), -all_outcomes()
    #> K-nearest neighbor imputation for all_numeric(), -all_outcomes()
    #> Filling NA with missing for all_nominal(), -all_outcomes()
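
The last example sets `knn_K = 4`, the number of neighbors used for k-nearest neighbor imputation. As a rough, package-free sketch of the idea, the code below fills each missing value with the mean of the `k` closest rows that have an observed value. Several things here are assumptions made for illustration: `knn_impute_sketch` is a hypothetical helper, closeness is measured as the absolute difference on a single complete numeric predictor, and neighbors are averaged with equal weight. The actual knn step added to the recipe may compute distances and combine neighbors differently.

```r
# Hypothetical helper for illustration; not the recipes/healthcareai implementation.
# Fills NAs in `target` with the mean of the k nearest rows whose target is
# observed, where "nearest" means smallest absolute difference in `predictor`.
knn_impute_sketch <- function(target, predictor, k = 4) {
  stopifnot(length(target) == length(predictor), !anyNA(predictor))
  known <- which(!is.na(target))  # rows that can serve as neighbors
  stopifnot(length(known) >= k)
  for (i in which(is.na(target))) {
    dists <- abs(predictor[known] - predictor[i])
    nearest <- known[order(dists)[seq_len(k)]]
    target[i] <- mean(target[nearest])
  }
  target
}

age <- c(30, 32, NA, 60, 62, 64)
hemoglobin_count <- c(14.0, 14.2, 14.1, 16.0, 16.1, 16.3)

# The row with missing age is closest to rows 1 and 2 on hemoglobin_count,
# so with k = 2 it receives the mean of their ages.
imputed_age <- knn_impute_sketch(age, hemoglobin_count, k = 2)
```

The set of candidate neighbors is fixed before the loop, so values imputed earlier are never used as neighbors for later rows. A smaller `k` follows local structure more closely, while a larger `k` smooths toward the overall mean.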