bupaR
knows 2 main object classes: eventlog
and activitylog.
Both are special types of a
data.frame
object. Furthermore, there is the overarching
object class log
. The object class log
is used
by functions where a distinction between the two classes is not
relevant. It is only used as a higher-level classification of the
eventlog
and activitylog
objects - it cannot
stand on its own. That is, objects which have just the class
log
cannot exist, they must have one of the subclasses as
well.
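The role of the shared log class can be illustrated with plain S3 dispatch. The sketch below is not bupaR code: describe() is a hypothetical generic, and the two toy objects only mimic the order of the class vectors (subclass first, then log). It shows how one method written for log serves both subclasses, while a subclass can still override it when the distinction matters.

```r
# Hypothetical generic, not part of bupaR.
describe <- function(x, ...) UseMethod("describe")

# One method on the shared parent class serves both subclasses.
describe.log <- function(x, ...) {
  paste("log with", nrow(x), "rows")
}

# A subclass can still specialise the behaviour and delegate to the parent.
describe.activitylog <- function(x, ...) {
  paste("activity", NextMethod())
}

# Toy stand-ins that only mimic the class vectors; they are not valid bupaR logs.
toy_event <- structure(data.frame(a = 1:3),
                       class = c("eventlog", "log", "data.frame"))
toy_act <- structure(data.frame(a = 1:2),
                     class = c("activitylog", "log", "data.frame"))

describe(toy_event)  # no eventlog method, so describe.log() is used
describe(toy_act)    # describe.activitylog(), which calls describe.log()
```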
The defining characteristics of a log
are stored in
regular variables, of which the names can be obtained with the
mapping()
function.
mapping(patients)
## Case identifier: patient
## Activity identifier: handling
## Resource identifier: employee
## Activity instance identifier: handling_id
## Timestamp: time
## Lifecycle transition: registration_type
mapping(patients_act)
## Case identifier: patient
## Activity identifier: handling
## Resource identifier: employee
## Timestamps: start, complete
Note that both eventlog and activitylog have some mapping elements in common:
Case identifier
Activity identifier
Resource identifier
While other mapping elements are slightly different:
Activity instance identifier: only defined for eventlog. For activitylog, each row is an activity instance by definition.
Lifecycle transition: only defined for eventlog.
Timestamp: for eventlog, it consists of a single column. For activitylog, it consists of multiple columns. (At least the start and complete statuses are required, although they can contain NAs.) These are stored under the timestamps mapping element.
Note that there are 2 classes for the mapping, one for eventlog and one for activitylog. (Note also that eventlog_mapping() has a dedicated print(), while activitylog has not (yet), and prints just a regular list.)
Individual mapping variables can be obtained with the dedicated id functions. They work both on the logs themselves and on the mappings.
activity_id(patients)
## [1] "handling"
activity_id(patients_act)
## [1] "handling"
mapping_event <- mapping(patients)
mapping_act <- mapping(patients_act)
activity_id(mapping_event)
## [1] "handling"
activity_id(mapping_act)
## [1] "handling"
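As a minimal sketch of why the id functions matter when programming, the helper below counts rows per activity without hard-coding the column name. count_per_activity() is a hypothetical name, not part of bupaR, and the example assumes that the patients log is available from the eventdataR package.

```r
library(bupaR)

# Hypothetical helper (not part of bupaR): count rows per activity
# without knowing the name of the activity column in advance.
count_per_activity <- function(log) {
  activity_col <- activity_id(log)  # column name as a character string
  df <- as.data.frame(log)          # work on a plain data.frame
  table(df[[activity_col]])
}

# Assumes the patients log from the eventdataR package is installed.
counts <- count_per_activity(eventdataR::patients)
counts
```

The same function would work on any log, whatever its activity column is called, because the name is looked up in the mapping at run time.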
During data manipulation, a log is sometimes (by accident or by necessity) converted to a regular data.frame for some operations. If the ultimate output of the function should once again be a log object (and not a visual or summary table), the saved mapping can be used to restore the original mapping. This can be done using re_map().
patients_df <- as.data.frame(patients)
class(patients_df)
## [1] "data.frame"
patients_log <- re_map(patients_df, mapping_event)
class(patients_log)
## [1] "eventlog" "log" "tbl_df" "tbl" "data.frame"
re_map()
recognizes the class of the mapping, and thus
works for both activitylog
and eventlog
mappings. It will always return to the original type. (I.e. if the
mapping originates from an activitylog
object, it will
result once again in an activitylog
object.) It can never
be used to convert activitylog
to eventlog
, or
vice versa.
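The round trip described above can be wrapped in a small helper. This is only a sketch: with_plain_df() is a hypothetical name, not a bupaR function, and it assumes that the supplied function keeps all mapped columns intact. The example again assumes the patients log from the eventdataR package, whose timestamp column is called time.

```r
library(bupaR)

# Hypothetical helper (not part of bupaR): run an operation on a plain
# data.frame and restore the original log type afterwards.
with_plain_df <- function(log, f) {
  m <- mapping(log)         # remember the mapping (and thereby the log type)
  df <- as.data.frame(log)  # drop to a regular data.frame
  df <- f(df)               # f must keep all mapped columns
  re_map(df, m)             # restore the log using the saved mapping
}

# Sort the rows by the timestamp column ("time" in the patients log).
patients_sorted <- with_plain_df(
  eventdataR::patients,
  function(df) df[order(df$time), ]
)
class(patients_sorted)
```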
While re_map() is exported by bupaR, it is primarily meant for internal use. It is useful to the end user only for more advanced use of bupaR.
Note that functions that are not exported can always be accessed with the ::: operator instead of the :: operator. For instance, we can use the non-exported activity_id_() function outside of bupaR as follows:
bupaR:::activity_id_(patients)
## handling
While you should typically not need these functions outside of
bupaR
, except for perhaps developing or testing some code
interactively, we will use the :::
notation in this manual
whenever we refer to internal functions.
activity_id_() is a variant of activity_id() that returns a symbol instead of a chr object. This symbol is useful when you want to use the mapping variable while programming.
For example, suppose you want to filter the patients
log, only for patient == 1. But you don’t know that the
case_id
is “patient”, so you use the function to get the
case_id.
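The following will not work, because case_id(patients) returns the character string "patient", and that string (not the column) is compared with 1.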
patients %>%
  filter(case_id(patients) == 1)
## EMPTY EVENT LOG
## # A tibble: 0 × 7
## # ℹ 7 variables: handling <fct>, patient <chr>, employee <fct>,
## # handling_id <chr>, registration_type <fct>, time <dttm>, .order <int>
just as the following will not work.
patients %>%
  filter("patient" == 1)
## EMPTY EVENT LOG
## # A tibble: 0 × 7
## # ℹ 7 variables: handling <fct>, patient <chr>, employee <fct>,
## # handling_id <chr>, registration_type <fct>, time <dttm>, .order <int>
In order to successfully do this, we could use the symbol:
patients %>%
  filter(!!bupaR:::case_id_(patients) == 1)
## # Log of 12 events consisting of:
## 1 trace
## 1 case
## 6 instances of 6 activities
## 6 resources
## Events occurred from 2017-01-02 11:41:53 until 2017-01-09 19:45:45
##
## # Variables were mapped as follows:
## Case identifier: patient
## Activity identifier: handling
## Resource identifier: employee
## Activity instance identifier: handling_id
## Timestamp: time
## Lifecycle transition: registration_type
##
## # A tibble: 12 × 7
## handling patient employee handling_id registration_type time
## <fct> <chr> <fct> <chr> <fct> <dttm>
## 1 Registrat… 1 r1 1 start 2017-01-02 11:41:53
## 2 Triage an… 1 r2 501 start 2017-01-02 12:40:20
## 3 Blood test 1 r3 1001 start 2017-01-05 08:59:04
## 4 MRI SCAN 1 r4 1238 start 2017-01-05 21:37:12
## 5 Discuss R… 1 r6 1735 start 2017-01-07 07:57:49
## 6 Check-out 1 r7 2230 start 2017-01-09 17:09:43
## 7 Registrat… 1 r1 1 complete 2017-01-02 12:40:20
## 8 Triage an… 1 r2 501 complete 2017-01-02 22:32:25
## 9 Blood test 1 r3 1001 complete 2017-01-05 14:34:27
## 10 MRI SCAN 1 r4 1238 complete 2017-01-06 01:54:23
## 11 Discuss R… 1 r6 1735 complete 2017-01-07 10:18:08
## 12 Check-out 1 r7 2230 complete 2017-01-09 19:45:45
## # ℹ 1 more variable: .order <int>
More on symbols and !!: https://adv-r.hadley.nz/quasiquotation.html
Alternatively, the following notation works as well.
patients %>%
  filter(.data[[case_id(patients)]] == 1)
## # Log of 12 events consisting of:
## 1 trace
## 1 case
## 6 instances of 6 activities
## 6 resources
## Events occurred from 2017-01-02 11:41:53 until 2017-01-09 19:45:45
##
## # Variables were mapped as follows:
## Case identifier: patient
## Activity identifier: handling
## Resource identifier: employee
## Activity instance identifier: handling_id
## Timestamp: time
## Lifecycle transition: registration_type
##
## # A tibble: 12 × 7
## handling patient employee handling_id registration_type time
## <fct> <chr> <fct> <chr> <fct> <dttm>
## 1 Registrat… 1 r1 1 start 2017-01-02 11:41:53
## 2 Triage an… 1 r2 501 start 2017-01-02 12:40:20
## 3 Blood test 1 r3 1001 start 2017-01-05 08:59:04
## 4 MRI SCAN 1 r4 1238 start 2017-01-05 21:37:12
## 5 Discuss R… 1 r6 1735 start 2017-01-07 07:57:49
## 6 Check-out 1 r7 2230 start 2017-01-09 17:09:43
## 7 Registrat… 1 r1 1 complete 2017-01-02 12:40:20
## 8 Triage an… 1 r2 501 complete 2017-01-02 22:32:25
## 9 Blood test 1 r3 1001 complete 2017-01-05 14:34:27
## 10 MRI SCAN 1 r4 1238 complete 2017-01-06 01:54:23
## 11 Discuss R… 1 r6 1735 complete 2017-01-07 10:18:08
## 12 Check-out 1 r7 2230 complete 2017-01-09 19:45:45
## # ℹ 1 more variable: .order <int>
The .data here is a special command, a pronoun, that can be
used in dplyr
functions. More information here: https://adv-r.hadley.nz/quasiquotation.html
In bupaR, the preference goes to the latter notation. It has the advantage that it can be used in scripts both inside and outside bupaR (whereas the !! notation only works outside bupaR with the bupaR::: prefix). It is also slightly easier to understand than the workings of !!.
That said, the use of case_id_()
and
symbol(case_id())
is still widespread in
bupaR
, but the goal is to phase out this usage.
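The difference between the three approaches does not depend on bupaR and can be reproduced with plain dplyr. In the sketch below, visits is a made-up toy table and id_col stands in for the character string that case_id() would return; rlang::sym() is used to build the symbol that case_id_() would provide.

```r
library(dplyr)

# Toy data; id_col plays the role of the value returned by case_id().
visits <- tibble(
  patient  = c("1", "1", "2"),
  handling = c("Registration", "Check-out", "Registration")
)
id_col <- "patient"

# Compares the string "patient" with "1": always FALSE, so no rows remain.
none <- visits %>% filter(id_col == "1")

# Symbol plus unquoting: the column name is injected into the expression.
via_symbol <- visits %>% filter(!!rlang::sym(id_col) == "1")

# .data pronoun: the column is looked up by name when the filter runs.
via_pronoun <- visits %>% filter(.data[[id_col]] == "1")

nrow(none)
nrow(via_symbol)
nrow(via_pronoun)
```

The first filter silently returns an empty result rather than an error, which is why this mistake is easy to overlook.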
The following dplyr verbs have received methods for activity logs and event logs.
filter()
group_by()
arrange()
mutate()
select()
They will all return a proper log, i.e. there is no risk of losing the defined mapping.
Special attention has to be given to the following:
Conventionally, select()
will not ensure that the log
maintains the variables it needs to be considered a log. The select
methods for logs therefore will keep the listed variables
and the variables that define the event log.
The following code returns an eventlog
object with the
attribute oligurie, as well as the 6 variables needed to define
the event log (plus the .order variable, see further).
sepsis %>%
  select(oligurie)
## # Log of 15214 events consisting of:
## 846 traces
## 1050 cases
## 15214 instances of 16 activities
## 26 resources
## Events occurred from 2013-11-07 08:18:29 until 2015-06-05 12:25:11
##
## # Variables were mapped as follows:
## Case identifier: case_id
## Activity identifier: activity
## Resource identifier: resource
## Activity instance identifier: activity_instance_id
## Timestamp: timestamp
## Lifecycle transition: lifecycle
##
## # A tibble: 15,214 × 8
## oligurie case_id activity activity_instance_id timestamp resource
## <lgl> <chr> <fct> <chr> <dttm> <fct>
## 1 FALSE A ER Regist… 1 2014-10-22 11:15:41 A
## 2 NA A Leucocytes 2 2014-10-22 11:27:00 B
## 3 NA A CRP 3 2014-10-22 11:27:00 B
## 4 NA A LacticAcid 4 2014-10-22 11:27:00 B
## 5 NA A ER Triage 5 2014-10-22 11:33:37 C
## 6 NA A ER Sepsis… 6 2014-10-22 11:34:00 A
## 7 NA A IV Liquid 7 2014-10-22 14:03:47 A
## 8 NA A IV Antibi… 8 2014-10-22 14:03:47 A
## 9 NA A Admission… 9 2014-10-22 14:13:19 D
## 10 NA A CRP 10 2014-10-24 09:00:00 B
## # ℹ 15,204 more rows
## # ℹ 2 more variables: lifecycle <fct>, .order <int>
This behavior can be turned off by setting force_df = TRUE. In that case, select() will work just like a traditional select(), and the result will be a data.frame, no longer an eventlog.
sepsis %>%
  select(oligurie, force_df = TRUE)
## # A tibble: 15,214 × 1
## oligurie
## <lgl>
## 1 FALSE
## 2 NA
## 3 NA
## 4 NA
## 5 NA
## 6 NA
## 7 NA
## 8 NA
## 9 NA
## 10 NA
## # ℹ 15,204 more rows
Because select() on a log keeps the mapping variables by default, you can select just the event log mapping by calling select() without arguments.
sepsis %>%
  select()
## # Log of 15214 events consisting of:
## 846 traces
## 1050 cases
## 15214 instances of 16 activities
## 26 resources
## Events occurred from 2013-11-07 08:18:29 until 2015-06-05 12:25:11
##
## # Variables were mapped as follows:
## Case identifier: case_id
## Activity identifier: activity
## Resource identifier: resource
## Activity instance identifier: activity_instance_id
## Timestamp: timestamp
## Lifecycle transition: lifecycle
##
## # A tibble: 15,214 × 7
## case_id activity activity_instance_id timestamp resource lifecycle
## <chr> <fct> <chr> <dttm> <fct> <fct>
## 1 A ER Regis… 1 2014-10-22 11:15:41 A complete
## 2 A Leucocyt… 2 2014-10-22 11:27:00 B complete
## 3 A CRP 3 2014-10-22 11:27:00 B complete
## 4 A LacticAc… 4 2014-10-22 11:27:00 B complete
## 5 A ER Triage 5 2014-10-22 11:33:37 C complete
## 6 A ER Sepsi… 6 2014-10-22 11:34:00 A complete
## 7 A IV Liquid 7 2014-10-22 14:03:47 A complete
## 8 A IV Antib… 8 2014-10-22 14:03:47 A complete
## 9 A Admissio… 9 2014-10-22 14:13:19 D complete
## 10 A CRP 10 2014-10-24 09:00:00 B complete
## # ℹ 15,204 more rows
## # ℹ 1 more variable: .order <int>
If you want to select only specific eventlog classifiers, you can use select_ids(). Because you would typically not select all ids (otherwise you can use select()), this will by default turn your object into a data.frame object.
sepsis %>%
  bupaR::select_ids(activity_id, case_id)
## # A tibble: 15,214 × 2
## activity case_id
## <fct> <chr>
## 1 ER Registration A
## 2 Leucocytes A
## 3 CRP A
## 4 LacticAcid A
## 5 ER Triage A
## 6 ER Sepsis Triage A
## 7 IV Liquid A
## 8 IV Antibiotics A
## 9 Admission NC A
## 10 CRP A
## # ℹ 15,204 more rows
Note how the different classifiers are specified: by the names of the _id() functions, without brackets and without quotation marks (i.e. not as character strings).
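One way such a notation can work is to pass the id functions themselves as arguments and only call them inside the helper, where the log is known. The sketch below uses base R only and is not how bupaR necessarily implements select_ids(): toy_log, toy_case_id(), toy_activity_id() and toy_select_ids() are all hypothetical stand-ins.

```r
# Toy stand-in (hypothetical): a data.frame carrying its mapping as an attribute.
toy_log <- data.frame(
  patient  = c("1", "2"),
  handling = c("Registration", "Triage"),
  employee = c("r1", "r2")
)
attr(toy_log, "mapping") <- list(case_id = "patient", activity_id = "handling")

# Toy id functions (hypothetical): look up a column name in the mapping.
toy_case_id <- function(x) attr(x, "mapping")$case_id
toy_activity_id <- function(x) attr(x, "mapping")$activity_id

# The id functions are passed as objects (no brackets, no quotes) and are
# only called inside the helper to resolve the actual column names.
toy_select_ids <- function(x, ...) {
  cols <- vapply(list(...), function(id_fun) id_fun(x), character(1))
  x[, cols, drop = FALSE]
}

selected <- toy_select_ids(toy_log, toy_activity_id, toy_case_id)
names(selected)
```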
While group_by()
is defined for logs, it should be noted
that it requires special methods for each function before that function
is “compatible” with grouped logs. Some utility functions for this do
however exist (see further).
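Why grouped input needs its own method can be shown by analogy with plain dplyr. The sketch below does not use bupaR's own grouped classes: n_rows() is a hypothetical generic and visits is a made-up toy table. Without the grouped_df method, grouped input would fall through to the data.frame method and the grouping would be silently ignored.

```r
library(dplyr)

# Hypothetical generic (not part of bupaR): number of rows, per group if grouped.
n_rows <- function(x, ...) UseMethod("n_rows")

# Default behaviour for ungrouped data.
n_rows.data.frame <- function(x, ...) {
  tibble(n = nrow(x))
}

# Dedicated method that respects the grouping of the input.
n_rows.grouped_df <- function(x, ...) {
  summarise(x, n = n(), .groups = "drop")
}

visits <- tibble(
  patient  = c("1", "1", "2"),
  handling = c("Registration", "Check-out", "Registration")
)

overall <- n_rows(visits)                       # one row for the whole table
per_case <- n_rows(group_by(visits, patient))   # one row per patient
per_case
```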
There are some shortcuts for typical groupings when programming in bupaR:
group_by_case()
group_by_activity()
group_by_activity_instance()
group_by_resource()
group_by_resource_activity()
patients %>%
  group_by_case()
## # Groups: [patient]
## Grouped # Log of 5442 events consisting of:
## 7 traces
## 500 cases
## 2721 instances of 7 activities
## 7 resources
## Events occurred from 2017-01-02 11:41:53 until 2018-05-05 07:16:02
##
## # Variables were mapped as follows:
## Case identifier: patient
## Activity identifier: handling
## Resource identifier: employee
## Activity instance identifier: handling_id
## Timestamp: time
## Lifecycle transition: registration_type
##
## # A tibble: 5,442 × 7
## handling patient employee handling_id registration_type time
## <fct> <chr> <fct> <chr> <fct> <dttm>
## 1 Registrat… 1 r1 1 start 2017-01-02 11:41:53
## 2 Registrat… 2 r1 2 start 2017-01-02 11:41:53
## 3 Registrat… 3 r1 3 start 2017-01-04 01:34:05
## 4 Registrat… 4 r1 4 start 2017-01-04 01:34:04
## 5 Registrat… 5 r1 5 start 2017-01-04 16:07:47
## 6 Registrat… 6 r1 6 start 2017-01-04 16:07:47
## 7 Registrat… 7 r1 7 start 2017-01-05 04:56:11
## 8 Registrat… 8 r1 8 start 2017-01-05 04:56:11
## 9 Registrat… 9 r1 9 start 2017-01-06 05:58:54
## 10 Registrat… 10 r1 10 start 2017-01-06 05:58:54
## # ℹ 5,432 more rows
## # ℹ 1 more variable: .order <int>
is equivalent to
patients %>%
  group_by(.data[[case_id(patients)]])
## # Groups: [patient]
## Grouped # Log of 5442 events consisting of:
## 7 traces
## 500 cases
## 2721 instances of 7 activities
## 7 resources
## Events occurred from 2017-01-02 11:41:53 until 2018-05-05 07:16:02
##
## # Variables were mapped as follows:
## Case identifier: patient
## Activity identifier: handling
## Resource identifier: employee
## Activity instance identifier: handling_id
## Timestamp: time
## Lifecycle transition: registration_type
##
## # A tibble: 5,442 × 7
## handling patient employee handling_id registration_type time
## <fct> <chr> <fct> <chr> <fct> <dttm>
## 1 Registrat… 1 r1 1 start 2017-01-02 11:41:53
## 2 Registrat… 2 r1 2 start 2017-01-02 11:41:53
## 3 Registrat… 3 r1 3 start 2017-01-04 01:34:05
## 4 Registrat… 4 r1 4 start 2017-01-04 01:34:04
## 5 Registrat… 5 r1 5 start 2017-01-04 16:07:47
## 6 Registrat… 6 r1 6 start 2017-01-04 16:07:47
## 7 Registrat… 7 r1 7 start 2017-01-05 04:56:11
## 8 Registrat… 8 r1 8 start 2017-01-05 04:56:11
## 9 Registrat… 9 r1 9 start 2017-01-06 05:58:54
## 10 Registrat… 10 r1 10 start 2017-01-06 05:58:54
## # ℹ 5,432 more rows
## # ℹ 1 more variable: .order <int>
Not all relevant combinations of groupings are provided as a shortcut (only the more common resource-activity combination is). The internal group_by_ids() allows the use of any combination of _id() functions. For example:
patients %>%
  bupaR:::group_by_ids(activity_id, case_id)
## # Groups: [handling, patient]
## Grouped # Log of 5442 events consisting of:
## 7 traces
## 500 cases
## 2721 instances of 7 activities
## 7 resources
## Events occurred from 2017-01-02 11:41:53 until 2018-05-05 07:16:02
##
## # Variables were mapped as follows:
## Case identifier: patient
## Activity identifier: handling
## Resource identifier: employee
## Activity instance identifier: handling_id
## Timestamp: time
## Lifecycle transition: registration_type
##
## # A tibble: 5,442 × 7
## handling patient employee handling_id registration_type time
## <fct> <chr> <fct> <chr> <fct> <dttm>
## 1 Registrat… 1 r1 1 start 2017-01-02 11:41:53
## 2 Registrat… 2 r1 2 start 2017-01-02 11:41:53
## 3 Registrat… 3 r1 3 start 2017-01-04 01:34:05
## 4 Registrat… 4 r1 4 start 2017-01-04 01:34:04
## 5 Registrat… 5 r1 5 start 2017-01-04 16:07:47
## 6 Registrat… 6 r1 6 start 2017-01-04 16:07:47
## 7 Registrat… 7 r1 7 start 2017-01-05 04:56:11
## 8 Registrat… 8 r1 8 start 2017-01-05 04:56:11
## 9 Registrat… 9 r1 9 start 2017-01-06 05:58:54
## 10 Registrat… 10 r1 10 start 2017-01-06 05:58:54
## # ℹ 5,432 more rows
## # ℹ 1 more variable: .order <int>
Note that the notation is analogous to select_ids()
:
specify the id functions, without quotation marks or brackets.
Copyright © 2023 bupaR - Hasselt University