Replaces the specified missingness values with NA in a dataframe or tibble. Columns that hold numbers but were loaded as character or factor vectors because they contained missingness values are converted to numeric vectors once those values are replaced.

make_na(d, to_replace, drop_levels = TRUE)

Arguments

d

A dataframe or tibble

to_replace

A value or vector of values that will be replaced with NA

drop_levels

If TRUE (default), unused factor levels are dropped.
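To illustrate what dropping unused factor levels means, here is a minimal base R sketch. It uses `droplevels()` as a stand-in; whether `make_na` calls `droplevels()` internally is an assumption, not something stated on this page.

```r
# A factor that contains a placeholder for missing data
gender <- factor(c("male", "male", "female", "male", "missing"))

# Replacing the value with NA leaves the level itself in place
gender[gender == "missing"] <- NA
levels(gender)
# "female" "male" "missing"

# Dropping unused levels removes the now-empty "missing" level
levels(droplevels(gender))
# "female" "male"
```

Leaving the unused level in place can produce empty groups in tables and plots, which is a likely reason the default is TRUE.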

Value

A tibble in which every value listed in to_replace is replaced with NA. Columns that contain only numbers after replacement are coerced to numeric.

Examples

dat <- data.frame(gender = c("male", "male", "female", "male", "missing"),
                  name = c("Paul", "Jim", "Sarah", "missing", "Alex"),
                  weight = c(139, 0, 193, 158, 273))

# Replace "missing" in `dat`
make_na(dat, "missing")
#> # A tibble: 5 x 3
#>   gender name  weight
#>   <fct>  <fct>  <dbl>
#> 1 male   Paul     139
#> 2 male   Jim        0
#> 3 female Sarah    193
#> 4 male   <NA>     158
#> 5 <NA>   Alex     273

# If there are multiple missing values, pass them through a vector.
dat <- data.frame(gender = c("male", "??", "female", "male", "NULL"),
                  age = c(64, 52, 75, "NULL", 70),
                  weight = c(139, 0, 193, "??", 273),
                  stringsAsFactors = FALSE)
make_na(dat, c("??", "NULL"))
#> # A tibble: 5 x 3
#>   gender   age weight
#>   <chr>  <dbl>  <dbl>
#> 1 male      64    139
#> 2 <NA>      52      0
#> 3 female    75    193
#> 4 male      NA     NA
#> 5 <NA>      70    273

# Run `missingness()` to find possible missingness values in `dat`. It will
# suggest the default implementation of `make_na` to replace all found
# missingness values (the suggested default implementation for this example
# is `make_na(dat, c("??", "NULL"))`).
missingness(dat)
#> Warning: Found these strings that may represent missing values: "??" and "NULL". If they do represent missingness, replace them with NA with: `make_na(dat, c("??", "NULL"))`
#> # A tibble: 3 x 2
#>   variable percent_missing
#> * <chr>              <dbl>
#> 1 gender                 0
#> 2 age                    0
#> 3 weight                 0
make_na(dat, c("??", "NULL"))
#> # A tibble: 5 x 3
#>   gender   age weight
#>   <chr>  <dbl>  <dbl>
#> 1 male      64    139
#> 2 <NA>      52      0
#> 3 female    75    193
#> 4 male      NA     NA
#> 5 <NA>      70    273

# Note: In this last example, `age` should be loaded as a numeric vector, but
# since "NULL" is present, it is stored as a character vector. When "NULL" is
# replaced, `age` will be converted to a numeric vector.
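
The replace-then-convert behavior described in the note can be approximated in base R. The sketch below is illustrative only: `replace_and_coerce()` is a hypothetical helper, not the package's implementation, and it assumes a column should become numeric only when every remaining non-missing value parses as a number. It returns a data frame rather than a tibble and does not convert factor columns.

```r
# Hypothetical helper (not part of the package): replace the given values
# with NA, then convert character columns that now hold only numbers.
replace_and_coerce <- function(d, to_replace) {
  d[] <- lapply(d, function(col) {
    # Step 1: blank out every value listed in `to_replace`
    col[col %in% to_replace] <- NA
    # Step 2: try numeric conversion for character columns only
    if (is.character(col)) {
      parsed <- suppressWarnings(as.numeric(col))
      # Convert only if no non-missing value failed to parse
      all_numeric <- !any(is.na(parsed) & !is.na(col))
      if (all_numeric) col <- parsed
    }
    col
  })
  d
}

dat <- data.frame(gender = c("male", "??", "female", "male", "NULL"),
                  age = c(64, 52, 75, "NULL", 70),
                  weight = c(139, 0, 193, "??", 273),
                  stringsAsFactors = FALSE)

out <- replace_and_coerce(dat, c("??", "NULL"))
sapply(out, class)  # gender stays character; age and weight become numeric
```

Checking that every non-missing value parses before converting keeps a column such as `gender` as character instead of silently turning its text into NA.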