library(bupaverse)
library(dplyr)
To make logs easy to manipulate, well-known dplyr verbs have been adapted. This page serves as a general introduction to the wrangling verbs. Their usage is illustrated throughout the documentation in Manipulation, Analysis and Visualization.
Using the group_by() function, event logs can be grouped
according to (a set of) variables, such that all further computations
happen for each of these different groups.
In the next example, the number of cases is computed for each value of vehicleclass.
traffic_fines %>%
  group_by(vehicleclass) %>%
  n_cases()
## # A tibble: 4 × 2
## vehicleclass n_cases
## <chr> <int>
## 1 A 9973
## 2 C 21
## 3 M 6
## 4 <NA> 10000
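Since group_by() accepts a set of variables, the same pattern can be sketched for a combination of two variables. The choice of notificationtype as the second variable is only an assumption for illustration; any other variable in the log could be used instead.

```r
library(bupaverse)
library(dplyr)

# Group on two variables at once; n_cases() is then computed
# for each combination of vehicleclass and notificationtype.
# notificationtype is an illustrative choice, not taken from the text above.
cases_per_group <- traffic_fines %>%
  group_by(vehicleclass, notificationtype) %>%
  n_cases()

cases_per_group
```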
For specific groupings, some auxiliary functions are available.
group_by_case - group by cases
group_by_activity - group by activity types
group_by_resource - group by resources
group_by_activity_resource - group by activity resource pair
group_by_activity_instance - group by activity instances.
For example, the number of cases in which a specific resource occurs can be computed as follows:
sepsis %>%
  group_by_resource() %>%
  n_cases()
## # A tibble: 26 × 2
## resource n_cases
## <fct> <int>
## 1 ? 294
## 2 A 985
## 3 B 1013
## 4 C 1050
## 5 D 46
## 6 E 782
## 7 F 200
## 8 G 147
## 9 H 50
## 10 I 118
## # ℹ 16 more rows
Note that each of the descriptive metrics discussed here can be rewritten
using these lower-level functions. The example above is equal to the
resource_involvement metric at case-level.
When you want to group on a combination of mapping variables, for
example, for each combination of case and activity,
you can use group_by_ids(). The following example counts
the number of events per case and per activity:
patients %>%
  group_by_ids(case_id, activity_id) %>%
  n_events()
## # A tibble: 2,721 × 3
## patient handling n_events
## <chr> <fct> <int>
## 1 1 Blood test 2
## 2 1 Check-out 2
## 3 1 Discuss Results 2
## 4 1 MRI SCAN 2
## 5 1 Registration 2
## 6 1 Triage and Assessment 2
## 7 10 Check-out 2
## 8 10 Discuss Results 2
## 9 10 Registration 2
## 10 10 Triage and Assessment 2
## # ℹ 2,711 more rows
Note that the arguments of group_by_ids() are not the
variable names of the case (patient) and activity
(handling) columns, but unquoted mapping id-functions. You can
thus use this function while being agnostic of the precise variable
names.
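As a rough sketch of what this agnosticism allows, the same code can be applied to logs whose columns are named differently. The helper name events_per_case_activity is hypothetical and introduced only for this illustration.

```r
library(bupaverse)
library(dplyr)

# Hypothetical helper: counts events per case and per activity for any log,
# without referring to the actual column names of that log.
events_per_case_activity <- function(log) {
  log %>%
    group_by_ids(case_id, activity_id) %>%
    n_events()
}

# patients maps case and activity to patient and handling,
# sepsis maps them to case_id and activity.
patients_counts <- events_per_case_activity(patients)
sepsis_counts <- events_per_case_activity(sepsis)
```

Both calls use the same function body; only the mapping stored in each log differs.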
When a grouping is no longer needed, it can be removed using
ungroup_eventlog().
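A minimal sketch of grouping and ungrouping, assuming the patients log from the examples above. The object names grouped and ungrouped are chosen only for illustration.

```r
library(bupaverse)
library(dplyr)

# Group the log by case, then remove the grouping again.
grouped <- patients %>% group_by_case()
ungrouped <- grouped %>% ungroup_eventlog()

# After ungrouping, metrics are computed over the log as a whole again.
ungrouped %>% n_cases()
```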
You can use mutate() to add new variables to an event
log, possibly by using existing variables. In the next example, the
total amount of lacticacid is computed for each case.
sepsis %>%
  group_by_case() %>%
  mutate(total_lacticacid = sum(lacticacid, na.rm = TRUE))
## # Groups: [case_id]
## Grouped # Log of 15214 events consisting of:
## 846 traces
## 1050 cases
## 15214 instances of 16 activities
## 26 resources
## Events occurred from 2013-11-07 08:18:29 until 2015-06-05 12:25:11
##
## # Variables were mapped as follows:
## Case identifier: case_id
## Activity identifier: activity
## Resource identifier: resource
## Activity instance identifier: activity_instance_id
## Timestamp: timestamp
## Lifecycle transition: lifecycle
##
## # A tibble: 15,214 × 35
## case_id activity lifecycle resource timestamp age crp diagnose
## <chr> <fct> <fct> <fct> <dttm> <dbl> <dbl> <chr>
## 1 A ER Regis… complete A 2014-10-22 11:15:41 85 NA A
## 2 A Leucocyt… complete B 2014-10-22 11:27:00 NA NA <NA>
## 3 A CRP complete B 2014-10-22 11:27:00 NA 210 <NA>
## 4 A LacticAc… complete B 2014-10-22 11:27:00 NA NA <NA>
## 5 A ER Triage complete C 2014-10-22 11:33:37 NA NA <NA>
## 6 A ER Sepsi… complete A 2014-10-22 11:34:00 NA NA <NA>
## 7 A IV Liquid complete A 2014-10-22 14:03:47 NA NA <NA>
## 8 A IV Antib… complete A 2014-10-22 14:03:47 NA NA <NA>
## 9 A Admissio… complete D 2014-10-22 14:13:19 NA NA <NA>
## 10 A CRP complete B 2014-10-24 09:00:00 NA 1090 <NA>
## # ℹ 15,204 more rows
## # ℹ 27 more variables: diagnosticartastrup <lgl>, diagnosticblood <lgl>,
## # diagnosticecg <lgl>, diagnosticic <lgl>, diagnosticlacticacid <lgl>,
## # diagnosticliquor <lgl>, diagnosticother <lgl>, diagnosticsputum <lgl>,
## # diagnosticurinaryculture <lgl>, diagnosticurinarysediment <lgl>,
## # diagnosticxthorax <lgl>, disfuncorg <lgl>, hypotensie <lgl>, hypoxie <lgl>,
## # infectionsuspected <lgl>, infusion <lgl>, lacticacid <dbl>, …
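Generic filtering of events can be done using filter(),
which takes an event log and any number of logical conditions. The
example below filters events where the vehicle class is "C" and the
amount is greater than 300. Note that amount is stored as a character
column in this log (see the <chr> type in the output), so the comparison
is made on strings rather than numbers, which is why values such as 36.0
pass the condition.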
traffic_fines %>%
  filter(vehicleclass == "C", amount > 300)
## # Log of 20 events consisting of:
## 1 trace
## 20 cases
## 20 instances of 1 activity
## 10 resources
## Events occurred from 2006-08-10 until 2008-02-09
##
## # Variables were mapped as follows:
## Case identifier: case_id
## Activity identifier: activity
## Resource identifier: resource
## Activity instance identifier: activity_instance_id
## Timestamp: timestamp
## Lifecycle transition: lifecycle
##
## # A tibble: 20 × 18
## case_id activity lifecycle resource timestamp amount article
## <chr> <fct> <fct> <fct> <dttm> <chr> <dbl>
## 1 A10060 Create Fine complete 541 2007-03-08 00:00:00 36.0 157
## 2 A10497 Create Fine complete 558 2007-03-30 00:00:00 36.0 157
## 3 A10818 Create Fine complete 561 2007-04-08 00:00:00 36.0 157
## 4 A11707 Create Fine complete 550 2007-04-24 00:00:00 36.0 157
## 5 A11936 Create Fine complete 557 2007-04-29 00:00:00 36.0 157
## 6 A12073 Create Fine complete 557 2007-05-03 00:00:00 36.0 157
## 7 A1408 Create Fine complete 559 2006-08-20 00:00:00 35.0 157
## 8 A14883 Create Fine complete 561 2007-06-29 00:00:00 36.0 157
## 9 A17130 Create Fine complete 541 2007-07-15 00:00:00 36.0 157
## 10 A1815 Create Fine complete 563 2006-08-10 00:00:00 35.0 157
## 11 A19109 Create Fine complete 556 2007-07-17 00:00:00 36.0 157
## 12 A23000 Create Fine complete 550 2007-12-29 00:00:00 36.0 157
## 13 A24247 Create Fine complete 561 2007-12-03 00:00:00 36.0 157
## 14 A24366 Create Fine complete 541 2008-02-09 00:00:00 36.0 157
## 15 A24634 Create Fine complete 537 2007-11-21 00:00:00 36.0 157
## 16 A24942 Create Fine complete 561 2007-12-30 00:00:00 36.0 157
## 17 A25581 Create Fine complete 559 2007-11-23 00:00:00 36.0 157
## 18 A25599 Create Fine complete 559 2007-11-24 00:00:00 36.0 157
## 19 A26099 Create Fine complete 559 2007-12-09 00:00:00 36.0 157
## 20 A26277 Create Fine complete 538 2008-01-07 00:00:00 36.0 157
## # ℹ 11 more variables: dismissal <chr>, expense <chr>, lastsent <chr>,
## # matricola <dbl>, notificationtype <chr>, paymentamount <dbl>, points <dbl>,
## # totalpaymentamount <chr>, vehicleclass <chr>, activity_instance_id <chr>,
## # .order <int>
Variables in an event log can be selected using
select(). By default, select() will always
make sure that the mapping variables are retained. Otherwise, the result
would no longer function as an eventlog object.
traffic_fines %>%
  select(vehicleclass)
## # Log of 34724 events consisting of:
## 44 traces
## 10000 cases
## 34724 instances of 11 activities
## 16 resources
## Events occurred from 2006-06-17 until 2012-03-26
##
## # Variables were mapped as follows:
## Case identifier: case_id
## Activity identifier: activity
## Resource identifier: resource
## Activity instance identifier: activity_instance_id
## Timestamp: timestamp
## Lifecycle transition: lifecycle
##
## # A tibble: 34,724 × 8
## vehicleclass case_id activity activity_instance_id timestamp
## <chr> <chr> <fct> <chr> <dttm>
## 1 A A1 Create Fine 1 2006-07-24 00:00:00
## 2 <NA> A1 Send Fine 2 2006-12-05 00:00:00
## 3 A A100 Create Fine 3 2006-08-02 00:00:00
## 4 <NA> A100 Send Fine 4 2006-12-12 00:00:00
## 5 <NA> A100 Insert Fine No… 5 2007-01-15 00:00:00
## 6 <NA> A100 Add penalty 6 2007-03-16 00:00:00
## 7 <NA> A100 Send for Credi… 7 2009-03-30 00:00:00
## 8 A A10000 Create Fine 8 2007-03-09 00:00:00
## 9 <NA> A10000 Send Fine 9 2007-07-17 00:00:00
## 10 <NA> A10000 Insert Fine No… 10 2007-08-02 00:00:00
## # ℹ 34,714 more rows
## # ℹ 3 more variables: resource <fct>, lifecycle <fct>, .order <int>
By setting the argument force_df = TRUE, the
mapping variables will not be retained, and the output will be a
data.frame, and not an eventlog object. Note that this
holds even when all mapping variables are selected.
traffic_fines %>%
  select(case_id, vehicleclass, amount, force_df = TRUE)
## # A tibble: 34,724 × 3
## case_id vehicleclass amount
## <chr> <chr> <chr>
## 1 A1 A 35.0
## 2 A1 <NA> <NA>
## 3 A100 A 35.0
## 4 A100 <NA> <NA>
## 5 A100 <NA> <NA>
## 6 A100 <NA> 71.5
## 7 A100 <NA> <NA>
## 8 A10000 A 36.0
## 9 A10000 <NA> <NA>
## 10 A10000 <NA> <NA>
## # ℹ 34,714 more rows
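The difference between the two behaviours can be checked by inspecting the class of each result. This is only a sketch; the object names kept and plain are chosen for illustration, and it assumes that event logs carry the class "eventlog", as the text above suggests.

```r
library(bupaverse)
library(dplyr)

# Default behaviour: mapping variables are retained, the result stays an eventlog.
kept <- traffic_fines %>%
  select(vehicleclass)

# With force_df = TRUE: only the requested columns are returned, as a data.frame.
plain <- traffic_fines %>%
  select(case_id, vehicleclass, amount, force_df = TRUE)

inherits(kept, "eventlog")
inherits(plain, "eventlog")
names(plain)
```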
Similar to group_by_ids(), select_ids() can
be used to select the mapping variables.
patients %>%
  select_ids(case_id, activity_id)
## # A tibble: 5,442 × 2
## patient handling
## <chr> <fct>
## 1 1 Registration
## 2 2 Registration
## 3 3 Registration
## 4 4 Registration
## 5 5 Registration
## 6 6 Registration
## 7 7 Registration
## 8 8 Registration
## 9 9 Registration
## 10 10 Registration
## # ℹ 5,432 more rows
Note again how the arguments are unquoted id-functions instead of raw
variable names. select_ids() always returns a data.frame object, as
typically not all ids in the mapping will be selected.
Event data can be sorted using arrange(). The
desc() function can be used to sort descending on an
attribute.
# sort descending on time
patients %>%
  arrange(desc(time))
## # Log of 5442 events consisting of:
## 7 traces
## 500 cases
## 2721 instances of 7 activities
## 7 resources
## Events occurred from 2017-01-02 11:41:53 until 2018-05-05 07:16:02
##
## # Variables were mapped as follows:
## Case identifier: patient
## Activity identifier: handling
## Resource identifier: employee
## Activity instance identifier: handling_id
## Timestamp: time
## Lifecycle transition: registration_type
##
## # A tibble: 5,442 × 7
## handling patient employee handling_id registration_type time
## <fct> <chr> <fct> <chr> <fct> <dttm>
## 1 Triage an… 500 r2 1000 complete 2018-05-05 07:16:02
## 2 Discuss R… 495 r6 2229 complete 2018-05-05 02:49:57
## 3 X-Ray 498 r5 1734 complete 2018-05-05 01:34:30
## 4 Triage an… 500 r2 1000 start 2018-05-04 23:53:27
## 5 Triage an… 499 r2 999 complete 2018-05-04 23:53:27
## 6 Discuss R… 495 r6 2229 start 2018-05-04 23:50:05
## 7 Discuss R… 489 r6 2223 complete 2018-05-04 23:50:05
## 8 X-Ray 498 r5 1734 start 2018-05-04 21:50:07
## 9 X-Ray 497 r5 1733 complete 2018-05-04 21:50:07
## 10 Discuss R… 489 r6 2223 start 2018-05-04 20:24:44
## # ℹ 5,432 more rows
## # ℹ 1 more variable: .order <int>
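Sorting on more than one attribute follows the usual dplyr pattern. The sketch below assumes the patients log from the example above and sorts by case first, then by descending time within each case; the object name sorted is illustrative.

```r
library(bupaverse)
library(dplyr)

# Sort by case identifier, and within each case by descending timestamp.
# patient and time are the column names shown in the mapping of the patients log.
sorted <- patients %>%
  arrange(patient, desc(time))

sorted
```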
Copyright © 2023 bupaR - Hasselt University