Generic filtering of events can be done using the filter
function, which takes an event log and any number of logical conditions.
The example below keeps the events that have vehicleclass “C” and an amount
greater than 300. More process-specific filtering methods can be found
here.
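A minimal sketch of such a call, assuming the traffic_fines event log from the eventdataR package (the data set name is inferred from the printed output, not stated in the text). The printed output lists amount as a character column; if that is the case, R compares it with 300 as a string, which is hinted at in the comment.

```r
library(bupaR)
library(eventdataR)  # assumed source of the traffic_fines log

# Keep only events of vehicle class "C" with an amount above 300.
# If amount is stored as character, this comparison is done on strings;
# as.numeric(amount) > 300 would compare the values as numbers instead.
filtered <- traffic_fines %>%
  filter(vehicleclass == "C", amount > 300)

filtered
```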
## # Log of 20 events consisting of:
## 1 trace
## 20 cases
## 20 instances of 1 activity
## 10 resources
## Events occurred from 2006-08-10 until 2008-02-09
##
## # Variables were mapped as follows:
## Case identifier: case_id
## Activity identifier: activity
## Resource identifier: resource
## Activity instance identifier: activity_instance_id
## Timestamp: timestamp
## Lifecycle transition: lifecycle
##
## # A tibble: 20 × 18
## case_id activity lifecycle resource timestamp amount article
## <chr> <fct> <fct> <fct> <dttm> <chr> <dbl>
## 1 A10060 Create Fine complete 541 2007-03-08 00:00:00 36.0 157
## 2 A10497 Create Fine complete 558 2007-03-30 00:00:00 36.0 157
## 3 A10818 Create Fine complete 561 2007-04-08 00:00:00 36.0 157
## 4 A11707 Create Fine complete 550 2007-04-24 00:00:00 36.0 157
## 5 A11936 Create Fine complete 557 2007-04-29 00:00:00 36.0 157
## 6 A12073 Create Fine complete 557 2007-05-03 00:00:00 36.0 157
## 7 A1408 Create Fine complete 559 2006-08-20 00:00:00 35.0 157
## 8 A14883 Create Fine complete 561 2007-06-29 00:00:00 36.0 157
## 9 A17130 Create Fine complete 541 2007-07-15 00:00:00 36.0 157
## 10 A1815 Create Fine complete 563 2006-08-10 00:00:00 35.0 157
## 11 A19109 Create Fine complete 556 2007-07-17 00:00:00 36.0 157
## 12 A23000 Create Fine complete 550 2007-12-29 00:00:00 36.0 157
## 13 A24247 Create Fine complete 561 2007-12-03 00:00:00 36.0 157
## 14 A24366 Create Fine complete 541 2008-02-09 00:00:00 36.0 157
## 15 A24634 Create Fine complete 537 2007-11-21 00:00:00 36.0 157
## 16 A24942 Create Fine complete 561 2007-12-30 00:00:00 36.0 157
## 17 A25581 Create Fine complete 559 2007-11-23 00:00:00 36.0 157
## 18 A25599 Create Fine complete 559 2007-11-24 00:00:00 36.0 157
## 19 A26099 Create Fine complete 559 2007-12-09 00:00:00 36.0 157
## 20 A26277 Create Fine complete 538 2008-01-07 00:00:00 36.0 157
## # ℹ 11 more variables: dismissal <chr>, expense <chr>, lastsent <chr>,
## # matricola <dbl>, notificationtype <chr>, paymentamount <dbl>, points <dbl>,
## # totalpaymentamount <chr>, vehicleclass <chr>, activity_instance_id <chr>,
## # .order <int>
An event log can be sliced, which means returning a slice, i.e. a subset, of the event log based on row number. There are three ways to slice event logs:

- slice: take a slice of cases
- slice_activities: take a slice of activity instances
- slice_events: take a slice of events

The next piece of code returns the first 10 cases. Note that "first" here is defined by the current order of the data set, not by time.
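A minimal sketch of slicing cases, assuming the patients event log from the eventdataR package (inferred from the variable mapping in the output):

```r
library(bupaR)
library(eventdataR)  # assumed source of the patients log

# Cases 1 to 10, in the current row order of the log
first_cases <- patients %>%
  slice(1:10)

first_cases
```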
## # Log of 110 events consisting of:
## 2 traces
## 10 cases
## 55 instances of 7 activities
## 7 resources
## Events occurred from 2017-01-02 11:41:53 until 2017-01-11 11:39:30
##
## # Variables were mapped as follows:
## Case identifier: patient
## Activity identifier: handling
## Resource identifier: employee
## Activity instance identifier: handling_id
## Timestamp: time
## Lifecycle transition: registration_type
##
## # A tibble: 110 × 7
## handling patient employee handling_id registration_type time
## <fct> <chr> <fct> <chr> <fct> <dttm>
## 1 Registrat… 1 r1 1 start 2017-01-02 11:41:53
## 2 Registrat… 2 r1 2 start 2017-01-02 11:41:53
## 3 Registrat… 3 r1 3 start 2017-01-04 01:34:05
## 4 Registrat… 4 r1 4 start 2017-01-04 01:34:04
## 5 Registrat… 5 r1 5 start 2017-01-04 16:07:47
## 6 Registrat… 6 r1 6 start 2017-01-04 16:07:47
## 7 Registrat… 7 r1 7 start 2017-01-05 04:56:11
## 8 Registrat… 8 r1 8 start 2017-01-05 04:56:11
## 9 Registrat… 9 r1 9 start 2017-01-06 05:58:54
## 10 Registrat… 10 r1 10 start 2017-01-06 05:58:54
## # ℹ 100 more rows
## # ℹ 1 more variable: .order <int>
The next piece of code returns the first 10 activity instances.
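A sketch under the same assumption (the patients log from eventdataR):

```r
library(bupaR)
library(eventdataR)  # assumed source of the patients log

# Activity instances 1 to 10, in the current row order of the log.
# Each instance can span several events (e.g. start and complete).
first_instances <- patients %>%
  slice_activities(1:10)

first_instances
```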
## # Log of 20 events consisting of:
## 1 trace
## 10 cases
## 10 instances of 1 activity
## 1 resource
## Events occurred from 2017-01-02 11:41:53 until 2017-01-06 09:13:28
##
## # Variables were mapped as follows:
## Case identifier: patient
## Activity identifier: handling
## Resource identifier: employee
## Activity instance identifier: handling_id
## Timestamp: time
## Lifecycle transition: registration_type
##
## # A tibble: 20 × 7
## handling patient employee handling_id registration_type time
## <fct> <chr> <fct> <chr> <fct> <dttm>
## 1 Registrat… 1 r1 1 start 2017-01-02 11:41:53
## 2 Registrat… 2 r1 2 start 2017-01-02 11:41:53
## 3 Registrat… 3 r1 3 start 2017-01-04 01:34:05
## 4 Registrat… 4 r1 4 start 2017-01-04 01:34:04
## 5 Registrat… 5 r1 5 start 2017-01-04 16:07:47
## 6 Registrat… 6 r1 6 start 2017-01-04 16:07:47
## 7 Registrat… 7 r1 7 start 2017-01-05 04:56:11
## 8 Registrat… 8 r1 8 start 2017-01-05 04:56:11
## 9 Registrat… 9 r1 9 start 2017-01-06 05:58:54
## 10 Registrat… 10 r1 10 start 2017-01-06 05:58:54
## 11 Registrat… 1 r1 1 complete 2017-01-02 12:40:20
## 12 Registrat… 2 r1 2 complete 2017-01-02 15:16:38
## 13 Registrat… 3 r1 3 complete 2017-01-04 06:36:54
## 14 Registrat… 4 r1 4 complete 2017-01-04 04:25:06
## 15 Registrat… 5 r1 5 complete 2017-01-04 20:07:50
## 16 Registrat… 6 r1 6 complete 2017-01-04 18:12:46
## 17 Registrat… 7 r1 7 complete 2017-01-05 06:27:49
## 18 Registrat… 8 r1 8 complete 2017-01-05 07:58:17
## 19 Registrat… 9 r1 9 complete 2017-01-06 07:18:32
## 20 Registrat… 10 r1 10 complete 2017-01-06 09:13:28
## # ℹ 1 more variable: .order <int>
The next piece of code returns the first 10 events.
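A sketch under the same assumption (the patients log from eventdataR):

```r
library(bupaR)
library(eventdataR)  # assumed source of the patients log

# Events (rows) 1 to 10, in the current row order of the log
first_events <- patients %>%
  slice_events(1:10)

first_events
```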
## # Log of 10 events consisting of:
## 1 trace
## 10 cases
## 10 instances of 1 activity
## 1 resource
## Events occurred from 2017-01-02 11:41:53 until 2017-01-06 05:58:54
##
## # Variables were mapped as follows:
## Case identifier: patient
## Activity identifier: handling
## Resource identifier: employee
## Activity instance identifier: handling_id
## Timestamp: time
## Lifecycle transition: registration_type
##
## # A tibble: 10 × 7
## handling patient employee handling_id registration_type time
## <fct> <chr> <fct> <chr> <fct> <dttm>
## 1 Registrat… 1 r1 1 start 2017-01-02 11:41:53
## 2 Registrat… 2 r1 2 start 2017-01-02 11:41:53
## 3 Registrat… 3 r1 3 start 2017-01-04 01:34:05
## 4 Registrat… 4 r1 4 start 2017-01-04 01:34:04
## 5 Registrat… 5 r1 5 start 2017-01-04 16:07:47
## 6 Registrat… 6 r1 6 start 2017-01-04 16:07:47
## 7 Registrat… 7 r1 7 start 2017-01-05 04:56:11
## 8 Registrat… 8 r1 8 start 2017-01-05 04:56:11
## 9 Registrat… 9 r1 9 start 2017-01-06 05:58:54
## 10 Registrat… 10 r1 10 start 2017-01-06 05:58:54
## # ℹ 1 more variable: .order <int>
The slice functions select events, cases or activity instances based
on their current position in the event data. As such, the result can be
changed using the arrange function. More often, we want to
select the first n activity instances, or the last ones. This
is achieved with the first_n or last_n functions, which return the
first or last n activity instances of a log based on time, not on position.
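A minimal sketch, assuming the patients log from eventdataR and n = 5 (the value of n is inferred from the 5 activity instances in the output):

```r
library(bupaR)
library(eventdataR)  # assumed source of the patients log

# The 5 activity instances that occur first in time
first_five <- patients %>%
  first_n(5)

# The 5 activity instances that occur last in time
last_five <- patients %>%
  last_n(5)

first_five
```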
## # Log of 10 events consisting of:
## 2 traces
## 3 cases
## 5 instances of 2 activities
## 2 resources
## Events occurred from 2017-01-02 11:41:53 until 2017-01-04 04:25:06
##
## # Variables were mapped as follows:
## Case identifier: patient
## Activity identifier: handling
## Resource identifier: employee
## Activity instance identifier: handling_id
## Timestamp: time
## Lifecycle transition: registration_type
##
## # A tibble: 10 × 7
## handling patient employee handling_id registration_type time
## <fct> <chr> <fct> <chr> <fct> <dttm>
## 1 Registrat… 1 r1 1 start 2017-01-02 11:41:53
## 2 Registrat… 2 r1 2 start 2017-01-02 11:41:53
## 3 Triage an… 1 r2 501 start 2017-01-02 12:40:20
## 4 Registrat… 1 r1 1 complete 2017-01-02 12:40:20
## 5 Registrat… 2 r1 2 complete 2017-01-02 15:16:38
## 6 Triage an… 2 r2 502 start 2017-01-02 22:32:25
## 7 Triage an… 1 r2 501 complete 2017-01-02 22:32:25
## 8 Triage an… 2 r2 502 complete 2017-01-03 12:34:01
## 9 Registrat… 4 r1 4 start 2017-01-04 01:34:04
## 10 Registrat… 4 r1 4 complete 2017-01-04 04:25:06
## # ℹ 1 more variable: .order <int>
This result is not affected by a different ordering of the data, since first_n takes the time aspect into account.
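One way to check this is to reorder the log before selecting. The sketch below assumes the patients log from eventdataR, n = 5, and a descending sort on the timestamp column time; the choice of sort is illustrative only.

```r
library(bupaR)
library(eventdataR)  # assumed source of the patients log

# Reverse the row order, then select the first 5 activity instances in time
reordered <- patients %>%
  arrange(dplyr::desc(time)) %>%
  first_n(5)

# Reference selection on the original ordering
reference <- patients %>%
  first_n(5)

reordered
```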
## # Log of 10 events consisting of:
## 2 traces
## 3 cases
## 5 instances of 2 activities
## 2 resources
## Events occurred from 2017-01-02 11:41:53 until 2017-01-04 04:25:06
##
## # Variables were mapped as follows:
## Case identifier: patient
## Activity identifier: handling
## Resource identifier: employee
## Activity instance identifier: handling_id
## Timestamp: time
## Lifecycle transition: registration_type
##
## # A tibble: 10 × 7
## handling patient employee handling_id registration_type time
## <fct> <chr> <fct> <chr> <fct> <dttm>
## 1 Registrat… 1 r1 1 start 2017-01-02 11:41:53
## 2 Registrat… 2 r1 2 start 2017-01-02 11:41:53
## 3 Triage an… 1 r2 501 start 2017-01-02 12:40:20
## 4 Registrat… 1 r1 1 complete 2017-01-02 12:40:20
## 5 Registrat… 2 r1 2 complete 2017-01-02 15:16:38
## 6 Triage an… 2 r2 502 start 2017-01-02 22:32:25
## 7 Triage an… 1 r2 501 complete 2017-01-02 22:32:25
## 8 Triage an… 2 r2 502 complete 2017-01-03 12:34:01
## 9 Registrat… 4 r1 4 start 2017-01-04 01:34:04
## 10 Registrat… 4 r1 4 complete 2017-01-04 04:25:06
## # ℹ 1 more variable: .order <int>
In combination with group_by_case, it is very easy to
select the heads or tails of each case. Below, we explore the 95% most
common first 3 activities in the sepsis log.
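A sketch of how this could look, assuming the sepsis log from eventdataR and the trace_explorer function from processmapR for the visualisation (the plotting function is an assumption; the text does not name it):

```r
library(bupaR)
library(eventdataR)   # assumed source of the sepsis log
library(processmapR)  # assumed: provides trace_explorer()

# First 3 activity instances of every case
heads <- sepsis %>%
  group_by_case() %>%
  first_n(3)

# Show the most frequent traces that together cover 95% of the cases
heads %>%
  trace_explorer(coverage = 0.95)
```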

The sample_n function takes a sample of the
event log containing n cases. The code below returns a sample of 10
patients.
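A minimal sketch, assuming the patients log from eventdataR. The sample is random, so the selected cases will differ between runs.

```r
library(bupaR)
library(eventdataR)  # assumed source of the patients log

# Draw 10 cases at random and keep all of their events
sampled <- patients %>%
  sample_n(size = 10)

sampled
```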
## # Log of 108 events consisting of:
## 2 traces
## 10 cases
## 54 instances of 7 activities
## 7 resources
## Events occurred from 2017-03-17 05:57:32 until 2018-03-18 10:48:34
##
## # Variables were mapped as follows:
## Case identifier: patient
## Activity identifier: handling
## Resource identifier: employee
## Activity instance identifier: handling_id
## Timestamp: time
## Lifecycle transition: registration_type
##
## # A tibble: 108 × 7
## handling patient employee handling_id registration_type time
## <fct> <chr> <fct> <chr> <fct> <dttm>
## 1 Registrat… 70 r1 70 start 2017-03-17 05:57:32
## 2 Registrat… 78 r1 78 start 2017-03-25 18:07:50
## 3 Registrat… 127 r1 127 start 2017-05-05 23:40:40
## 4 Registrat… 155 r1 155 start 2017-06-03 10:05:28
## 5 Registrat… 261 r1 261 start 2017-09-16 15:08:03
## 6 Registrat… 319 r1 319 start 2017-11-04 06:23:34
## 7 Registrat… 325 r1 325 start 2017-11-09 03:47:37
## 8 Registrat… 350 r1 350 start 2017-11-24 12:22:18
## 9 Registrat… 362 r1 362 start 2017-12-05 18:53:44
## 10 Registrat… 455 r1 455 start 2018-03-14 21:04:28
## # ℹ 98 more rows
## # ℹ 1 more variable: .order <int>
Note that this function can also be used with a sample size larger than the number of cases in the event log, provided that drawn cases are allowed to be replaced.
A more extensive list of subsetting methods is provided by edeaR. Look here for more information.
Copyright © 2023 bupaR - Hasselt University