Pivot multiple rows per observation to one row with multiple columns
pivot(d, grain, spread, fill, fun = sum, missing_fill = NA, extra_cols)
d | data frame |
---|---|
grain | Column that defines rows. Unquoted. |
spread | Column that will become multiple columns. Unquoted. |
fill | Column to be used to fill the values of cells in the output,
perhaps after aggregation by |
fun | Function for aggregation, defaults to |
missing_fill | Value to fill for combinations of grain and spread that are not present. Defaults to NA, but 0 may be useful as well. |
extra_cols | Values of |
A tibble data frame with one row for each unique value of
grain
, and one column for each unique value of spread
plus
one column for the entries in grain.
Entries in the tibble are defined by the fill column. Combinations of
grain
x spread
that are not present in d
will be
filled in with missing_fill
. If there are grain
x
spread
pairs that appear more than once in d, they will be
aggregated by fun
.
pivot
is useful when you want to change the grain of your
data, for example from the procedure grain to the patient grain. In that
example, each patient might have 0, 1, or more medications. To make a
patient-level table, we need a column for each medication, which is what it
means to make a wide table. The fill
argument dictates what to put
in each of the medication columns, e.g. the dose the patient got.
fill
defaults to "1", as an indicator variable. If any patients have
multiple rows for the same medication (say they recieved a med more than
once), we need a way to deal with that, which is what the fun
argument handles. By default it uses sum
, so if fill
is left
as its default, the count of instances for each patient will be used.
meds <- tibble::tibble( patient_id = c("A", "A", "A", "B"), medication = c("zoloft", "asprin", "lipitor", "asprin"), pills_per_day = c(1, 8, 2, 4) ) meds#> # A tibble: 4 x 3 #> patient_id medication pills_per_day #> <chr> <chr> <dbl> #> 1 A zoloft 1 #> 2 A asprin 8 #> 3 A lipitor 2 #> 4 B asprin 4# Number of pills of each medication each patient gets: pivot( d = meds, grain = patient_id, spread = medication, fill = pills_per_day, missing_fill = 0 )#> # A tibble: 2 x 4 #> patient_id medication_asprin medication_lipitor medication_zoloft #> <chr> <dbl> <dbl> <dbl> #> 1 A 8 2 1 #> 2 B 4 0 0bills <- tibble::tibble( patient_id = rep(c("A", "B"), each = 4), dept_id = rep(c("ED", "ICU"), times = 4), charge = runif(8, 0, 1e4), date = as.Date("2024-12-25") - sample(0:2, 8, TRUE) ) bills#> # A tibble: 8 x 4 #> patient_id dept_id charge date #> <chr> <chr> <dbl> <date> #> 1 A ED 3823. 2024-12-25 #> 2 A ICU 8641. 2024-12-24 #> 3 A ED 7990. 2024-12-25 #> 4 A ICU 201. 2024-12-25 #> 5 B ED 5447. 2024-12-23 #> 6 B ICU 3750. 2024-12-24 #> 7 B ED 7808. 2024-12-23 #> 8 B ICU 9957. 2024-12-25# Total charges per patient x department: pivot(bills, patient_id, dept_id, charge, sum)#> # A tibble: 2 x 3 #> patient_id dept_id_ED dept_id_ICU #> <chr> <dbl> <dbl> #> 1 A 11814. 8842. #> 2 B 13255. 13707.# Count of charges per patient x day: pivot(bills, patient_id, date)#>#>#> # A tibble: 2 x 4 #> patient_id `date_2024-12-23` `date_2024-12-24` `date_2024-12-25` #> <chr> <int> <int> <int> #> 1 A NA 1 3 #> 2 B 2 1 1# Can provide a custom function to fun, which will take fill as input. # Get the difference between the greatest and smallest charge in each # department for each patient and format it as currency. pivot(d = bills, grain = patient_id, spread = dept_id, fill = charge, fun = function(x) paste0("$", round(max(x) - min(x), 2)) )#> # A tibble: 2 x 3 #> patient_id dept_id_ED dept_id_ICU #> <chr> <chr> <chr> #> 1 A $4166.72 $8440.26 #> 2 B $2361.24 $6207.09