library(bupaverse)
library(dplyr)
Enriching an event log with calculated metrics can be done using augment(). For example, consider trace_length().
traffic_fines %>%
  trace_length(level = "case")
## # A tibble: 10,000 × 2
## case_id absolute
## <chr> <int>
## 1 A10249 9
## 2 A10338 9
## 3 A10619 9
## 4 A10858 9
## 5 A12027 9
## 6 A12414 9
## 7 A13217 9
## 8 A1327 9
## 9 A13617 9
## 10 A13984 9
## # ℹ 9,990 more rows
Feeding the resulting table back to traffic_fines with augment() makes the trace length metric available as a case attribute for further analysis.
traffic_fines %>%
  trace_length(level = "case") %>%
  augment(traffic_fines) %>%
  glimpse()
## Rows: 34,724
## Columns: 19
## $ case_id <chr> "A1", "A1", "A100", "A100", "A100", "A100", "A100…
## $ activity <fct> Create Fine, Send Fine, Create Fine, Send Fine, I…
## $ lifecycle <fct> complete, complete, complete, complete, complete,…
## $ resource <fct> 561, NA, 561, NA, NA, NA, NA, 561, NA, NA, NA, NA…
## $ timestamp <dttm> 2006-07-24, 2006-12-05, 2006-08-02, 2006-12-12, …
## $ amount <chr> "35.0", NA, "35.0", NA, NA, "71.5", NA, "36.0", N…
## $ article <dbl> 157, NA, 157, NA, NA, NA, NA, 157, NA, NA, NA, NA…
## $ dismissal <chr> "NIL", NA, "NIL", NA, NA, NA, NA, "NIL", NA, NA, …
## $ expense <chr> NA, "11.0", NA, "11.0", NA, NA, NA, NA, "13.0", N…
## $ lastsent <chr> NA, NA, NA, NA, "P", NA, NA, NA, NA, "P", NA, NA,…
## $ matricola <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ notificationtype <chr> NA, NA, NA, NA, "P", NA, NA, NA, NA, "P", NA, NA,…
## $ paymentamount <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 870, …
## $ points <dbl> 0, NA, 0, NA, NA, NA, NA, 0, NA, NA, NA, NA, 0, N…
## $ totalpaymentamount <chr> "0.0", NA, "0.0", NA, NA, NA, NA, "0.0", NA, NA, …
## $ vehicleclass <chr> "A", NA, "A", NA, NA, NA, NA, "A", NA, NA, NA, NA…
## $ activity_instance_id <chr> "1", "2", "3", "4", "5", "6", "7", "8", "9", "10"…
## $ .order <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15…
## $ absolute <int> 2, 2, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6…
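Conceptually, the result above resembles a left join of the case-level metric table onto the event log: every event of a case receives that case's value. The following is a minimal sketch of that idea using only dplyr on a toy data frame. The data and the object names events, case_metric and events_augmented are hypothetical, and this is not claimed to be how augment() is implemented internally.

```r
library(dplyr)

# Toy event log (hypothetical data): one row per event.
events <- tibble(
  case_id  = c("A1", "A1", "A2", "A2", "A2"),
  activity = c("Create Fine", "Send Fine", "Create Fine", "Send Fine", "Payment")
)

# Case-level metric: number of events per case, used here as a
# stand-in for the "absolute" column returned by trace_length().
case_metric <- events %>%
  count(case_id, name = "absolute")

# Join the metric back onto every event of the matching case.
events_augmented <- events %>%
  left_join(case_metric, by = "case_id")

events_augmented
```

The number of rows stays the same as in the toy log, and one extra column is added, which mirrors the glimpse() output above.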
Using the prefix argument, you can add a descriptive prefix to the name of the new variable. In the current example, where the variable is called absolute, it might be useful to add the prefix trace_length.
traffic_fines %>%
  trace_length(level = "case") %>%
  augment(traffic_fines, prefix = "trace_length") %>%
  glimpse()
## Rows: 34,724
## Columns: 19
## $ case_id <chr> "A1", "A1", "A100", "A100", "A100", "A100", "A10…
## $ activity <fct> Create Fine, Send Fine, Create Fine, Send Fine, …
## $ lifecycle <fct> complete, complete, complete, complete, complete…
## $ resource <fct> 561, NA, 561, NA, NA, NA, NA, 561, NA, NA, NA, N…
## $ timestamp <dttm> 2006-07-24, 2006-12-05, 2006-08-02, 2006-12-12,…
## $ amount <chr> "35.0", NA, "35.0", NA, NA, "71.5", NA, "36.0", …
## $ article <dbl> 157, NA, 157, NA, NA, NA, NA, 157, NA, NA, NA, N…
## $ dismissal <chr> "NIL", NA, "NIL", NA, NA, NA, NA, "NIL", NA, NA,…
## $ expense <chr> NA, "11.0", NA, "11.0", NA, NA, NA, NA, "13.0", …
## $ lastsent <chr> NA, NA, NA, NA, "P", NA, NA, NA, NA, "P", NA, NA…
## $ matricola <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ notificationtype <chr> NA, NA, NA, NA, "P", NA, NA, NA, NA, "P", NA, NA…
## $ paymentamount <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 870,…
## $ points <dbl> 0, NA, 0, NA, NA, NA, NA, 0, NA, NA, NA, NA, 0, …
## $ totalpaymentamount <chr> "0.0", NA, "0.0", NA, NA, NA, NA, "0.0", NA, NA,…
## $ vehicleclass <chr> "A", NA, "A", NA, NA, NA, NA, "A", NA, NA, NA, N…
## $ activity_instance_id <chr> "1", "2", "3", "4", "5", "6", "7", "8", "9", "10…
## $ .order <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 1…
## $ trace_length_absolute <int> 2, 2, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, …
Some metrics return several variables with information. Say you want to add information on the processing time of each activity to the data.
patients %>%
  processing_time(level = "activity", units = "hours")
## # A tibble: 7 × 11
## handling min q1 mean median q3 max st_dev iqr total
## <fct> <drtn> <drt> <drt> <drtn> <drt> <drt> <dbl> <dbl> <drt>
## 1 Registration 0.828… 2.0… 2.7… 2.71… 3.4… 5.6… 0.954 1.33 1376…
## 2 Triage and Assessment 5.868… 11.3… 13.1… 13.34… 15.0… 18.8… 2.76 3.68 6552…
## 3 Discuss Results 1.333… 2.3… 2.7… 2.77… 3.2… 4.5… 0.628 0.906 1374…
## 4 Check-out 0.667… 1.6… 2.0… 2.07… 2.4… 3.8… 0.620 0.860 1014…
## 5 X-Ray 2.294… 3.8… 4.8… 4.79… 5.6… 8.1… 1.28 1.76 1264…
## 6 Blood test 3.089… 4.7… 5.5… 5.46… 6.2… 8.1… 1.06 1.51 1311…
## 7 MRI SCAN 2.489… 3.6… 4.1… 4.09… 4.6… 5.9… 0.735 1.09 979…
## # ℹ 1 more variable: relative_frequency <dbl>
Calling augment() without any further arguments will add all columns, from min to relative_frequency, to the data.
patients %>%
  processing_time(level = "activity", units = "hours") %>%
  augment(patients) %>%
  glimpse()
## Rows: 5,442
## Columns: 17
## $ handling <fct> Registration, Registration, Registration, Registrat…
## $ patient <chr> "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", …
## $ employee <fct> r1, r1, r1, r1, r1, r1, r1, r1, r1, r1, r1, r1, r1,…
## $ handling_id <chr> "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", …
## $ registration_type <fct> start, start, start, start, start, start, start, st…
## $ time <dttm> 2017-01-02 11:41:53, 2017-01-02 11:41:53, 2017-01-…
## $ .order <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, …
## $ min <drtn> 0.8288889 hours, 0.8288889 hours, 0.8288889 hours,…
## $ q1 <drtn> 2.070417 hours, 2.070417 hours, 2.070417 hours, 2.…
## $ mean <drtn> 2.7538 hours, 2.7538 hours, 2.7538 hours, 2.7538 h…
## $ median <drtn> 2.713611 hours, 2.713611 hours, 2.713611 hours, 2.…
## $ q3 <drtn> 3.402014 hours, 3.402014 hours, 3.402014 hours, 3.…
## $ max <drtn> 5.634722 hours, 5.634722 hours, 5.634722 hours, 5.…
## $ st_dev <dbl> 0.9539039, 0.9539039, 0.9539039, 0.9539039, 0.95390…
## $ iqr <dbl> 1.331597, 1.331597, 1.331597, 1.331597, 1.331597, 1…
## $ total <drtn> 1376.9 hours, 1376.9 hours, 1376.9 hours, 1376.9 h…
## $ relative_frequency <dbl> 0.183756, 0.183756, 0.183756, 0.183756, 0.183756, 0…
Using the columns argument, we can specify a selection of columns that we want to use for augmenting the log. For example, say we are only interested in the mean and median processing time. Let's also add a descriptive prefix to these columns.
patients %>%
  processing_time(level = "activity", units = "hours") %>%
  augment(patients, columns = c("mean", "median"), prefix = "processing_time") %>%
  glimpse()
## Rows: 5,442
## Columns: 9
## $ handling <fct> Registration, Registration, Registration, Regis…
## $ patient <chr> "1", "2", "3", "4", "5", "6", "7", "8", "9", "1…
## $ employee <fct> r1, r1, r1, r1, r1, r1, r1, r1, r1, r1, r1, r1,…
## $ handling_id <chr> "1", "2", "3", "4", "5", "6", "7", "8", "9", "1…
## $ registration_type <fct> start, start, start, start, start, start, start…
## $ time <dttm> 2017-01-02 11:41:53, 2017-01-02 11:41:53, 2017…
## $ .order <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, …
## $ processing_time_mean <drtn> 2.7538 hours, 2.7538 hours, 2.7538 hours, 2.75…
## $ processing_time_median <drtn> 2.713611 hours, 2.713611 hours, 2.713611 hours…
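The effect of columns and prefix can be pictured as three steps: keep only the requested metric columns, prepend the prefix to their names, and join the result onto the log by activity. The sketch below illustrates this with dplyr on toy data. The numbers and the object names events, activity_metric, selected and events_augmented are hypothetical, and the code only illustrates the idea rather than the actual internals of augment().

```r
library(dplyr)

# Toy event log (hypothetical data).
events <- tibble(
  handling = c("Registration", "X-Ray", "Registration"),
  patient  = c("1", "1", "2")
)

# Hypothetical activity-level metric table with several columns.
activity_metric <- tibble(
  handling = c("Registration", "X-Ray"),
  min      = c(0.8, 2.3),
  mean     = c(2.8, 4.8),
  median   = c(2.7, 4.8)
)

# Keep only the requested columns and prefix their names.
selected <- activity_metric %>%
  select(handling, mean, median) %>%
  rename_with(~ paste0("processing_time_", .x), .cols = c(mean, median))

# Join the selected metric columns onto every matching event.
events_augmented <- events %>%
  left_join(selected, by = "handling")

names(events_augmented)
```

Only the two selected metric columns end up in the result, each carrying the prefix, while the min column is left out.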
When you want to add multiple metrics, it is imperative to save intermediate updates of the data. Consider the example below.
patients %>%
  trace_length(level = "case") %>%
  augment(patients, prefix = "trace_length") %>%
  trace_coverage(level = "case") %>%
  augment(patients, prefix = "trace_frequency") %>%
  glimpse()
## Rows: 5,442
## Columns: 10
## $ handling <fct> Registration, Registration, Registration, Reg…
## $ patient <chr> "1", "2", "3", "4", "5", "6", "7", "8", "9", …
## $ employee <fct> r1, r1, r1, r1, r1, r1, r1, r1, r1, r1, r1, r…
## $ handling_id <chr> "1", "2", "3", "4", "5", "6", "7", "8", "9", …
## $ registration_type <fct> start, start, start, start, start, start, sta…
## $ time <dttm> 2017-01-02 11:41:53, 2017-01-02 11:41:53, 20…
## $ .order <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14…
## $ trace_frequency_trace <chr> "Registration,Triage and Assessment,Blood tes…
## $ trace_frequency_absolute <int> 234, 258, 234, 234, 258, 234, 234, 258, 258, …
## $ trace_frequency_relative <dbl> 0.468, 0.516, 0.468, 0.468, 0.516, 0.468, 0.4…
As you can see, only the trace_coverage() values of the second augment() call are added, while the result of the first augment() call is lost. This is because the second call still receives the original patients data set, which was not updated after the first augment() call. The proper way would be as follows.
patients %>%
  trace_length(level = "case") %>%
  augment(patients, prefix = "trace_length") -> patients

patients %>%
  trace_coverage(level = "case") %>%
  augment(patients, prefix = "trace_frequency") %>%
  glimpse()
## Rows: 5,442
## Columns: 11
## $ handling <fct> Registration, Registration, Registration, Reg…
## $ patient <chr> "1", "2", "3", "4", "5", "6", "7", "8", "9", …
## $ employee <fct> r1, r1, r1, r1, r1, r1, r1, r1, r1, r1, r1, r…
## $ handling_id <chr> "1", "2", "3", "4", "5", "6", "7", "8", "9", …
## $ registration_type <fct> start, start, start, start, start, start, sta…
## $ time <dttm> 2017-01-02 11:41:53, 2017-01-02 11:41:53, 20…
## $ .order <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14…
## $ trace_length_absolute <int> 6, 5, 6, 6, 5, 6, 6, 5, 5, 5, 5, 6, 6, 5, 6, …
## $ trace_frequency_trace <chr> "Registration,Triage and Assessment,Blood tes…
## $ trace_frequency_absolute <int> 234, 258, 234, 234, 258, 234, 234, 258, 258, …
## $ trace_frequency_relative <dbl> 0.468, 0.516, 0.468, 0.468, 0.516, 0.468, 0.4…
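The same pitfall can be reproduced with plain joins, which may help to see why saving the intermediate result matters. The sketch below uses dplyr on toy data. All object names (events, metric_a, metric_b, lost, kept) and values are hypothetical, and the joins merely stand in for augment().

```r
library(dplyr)

# Toy log and two hypothetical case-level metric tables.
events   <- tibble(case_id = c("A1", "A1", "A2"))
metric_a <- tibble(case_id = c("A1", "A2"), length = c(2L, 1L))
metric_b <- tibble(case_id = c("A1", "A2"), frequency = c(10L, 20L))

# Joining each metric onto the ORIGINAL table: the second result
# overwrites the first, so only metric_b survives.
lost <- events %>% left_join(metric_a, by = "case_id")
lost <- events %>% left_join(metric_b, by = "case_id")

# Saving the intermediate result and joining onto it keeps both metrics.
kept <- events %>% left_join(metric_a, by = "case_id")
kept <- kept %>% left_join(metric_b, by = "case_id")

names(lost)
names(kept)
```

In the first variant the length column is missing from the final table, in the second both metric columns are present.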
Copyright © 2023 bupaR - Hasselt University