library(bupaR)

filter

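Generic filtering of events can be done using the filter function, which takes an event log and any number of logical conditions. The example below filters events which have vehicleclass “C” and amount greater than 300. Note that the output lists amount as a character column (<chr>), so the comparison is made on strings rather than numbers, which is why rows with an amount of 36.0 are returned. More process-specific filtering methods can be found here.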

traffic_fines %>%
    filter(vehicleclass == "C", amount > 300)
## # Log of 20 events consisting of:
## 1 trace 
## 20 cases 
## 20 instances of 1 activity 
## 10 resources 
## Events occurred from 2006-08-10 until 2008-02-09 
##  
## # Variables were mapped as follows:
## Case identifier:     case_id 
## Activity identifier:     activity 
## Resource identifier:     resource 
## Activity instance identifier:    activity_instance_id 
## Timestamp:           timestamp 
## Lifecycle transition:        lifecycle 
## 
## # A tibble: 20 × 18
##    case_id activity   lifec…¹ resou…² timestamp           amount article dismi…³
##    <chr>   <fct>      <fct>   <fct>   <dttm>              <chr>    <dbl> <chr>  
##  1 A10060  Create Fi… comple… 541     2007-03-08 00:00:00 36.0       157 NIL    
##  2 A10497  Create Fi… comple… 558     2007-03-30 00:00:00 36.0       157 NIL    
##  3 A10818  Create Fi… comple… 561     2007-04-08 00:00:00 36.0       157 NIL    
##  4 A11707  Create Fi… comple… 550     2007-04-24 00:00:00 36.0       157 NIL    
##  5 A11936  Create Fi… comple… 557     2007-04-29 00:00:00 36.0       157 NIL    
##  6 A12073  Create Fi… comple… 557     2007-05-03 00:00:00 36.0       157 NIL    
##  7 A1408   Create Fi… comple… 559     2006-08-20 00:00:00 35.0       157 NIL    
##  8 A14883  Create Fi… comple… 561     2007-06-29 00:00:00 36.0       157 NIL    
##  9 A17130  Create Fi… comple… 541     2007-07-15 00:00:00 36.0       157 NIL    
## 10 A1815   Create Fi… comple… 563     2006-08-10 00:00:00 35.0       157 NIL    
## 11 A19109  Create Fi… comple… 556     2007-07-17 00:00:00 36.0       157 NIL    
## 12 A23000  Create Fi… comple… 550     2007-12-29 00:00:00 36.0       157 NIL    
## 13 A24247  Create Fi… comple… 561     2007-12-03 00:00:00 36.0       157 NIL    
## 14 A24366  Create Fi… comple… 541     2008-02-09 00:00:00 36.0       157 NIL    
## 15 A24634  Create Fi… comple… 537     2007-11-21 00:00:00 36.0       157 NIL    
## 16 A24942  Create Fi… comple… 561     2007-12-30 00:00:00 36.0       157 NIL    
## 17 A25581  Create Fi… comple… 559     2007-11-23 00:00:00 36.0       157 NIL    
## 18 A25599  Create Fi… comple… 559     2007-11-24 00:00:00 36.0       157 NIL    
## 19 A26099  Create Fi… comple… 559     2007-12-09 00:00:00 36.0       157 NIL    
## 20 A26277  Create Fi… comple… 538     2008-01-07 00:00:00 36.0       157 NIL    
## # … with 10 more variables: expense <chr>, lastsent <chr>, matricola <dbl>,
## #   notificationtype <chr>, paymentamount <dbl>, points <dbl>,
## #   totalpaymentamount <chr>, vehicleclass <chr>, activity_instance_id <chr>,
## #   .order <int>, and abbreviated variable names ¹​lifecycle, ²​resource,
## #   ³​dismissal
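Since amount is printed as a character column in the output above, the following minimal sketch contrasts a string comparison with a numeric one. It assumes the eventdataR package is installed and that the values in amount can be parsed with as.numeric; the name high_fines_c is only illustrative.

```r
library(bupaR)

# Assumption: traffic_fines ships with the eventdataR package.
traffic_fines <- eventdataR::traffic_fines

# Comparing a character value with a number coerces the number to a string,
# so the comparison is lexicographic rather than numeric.
"36.0" > 300              # string comparison
as.numeric("36.0") > 300  # numeric comparison

# Convert amount inside the condition to compare numerically.
# Values that cannot be parsed become NA and are dropped by filter.
high_fines_c <- traffic_fines %>%
    filter(vehicleclass == "C", as.numeric(amount) > 300)

n_events(high_fines_c)
```

Converting inside the condition leaves the stored column untouched, which keeps the rest of the log identical to the original.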

slice

An event log can be sliced, which means returning a slice, i.e. a subset, of the event log based on row number. There are three ways to slice event logs:

  • Using slice: take a slice of cases
  • Using slice_activities: take a slice of activity instances
  • Using slice_events: take a slice of events

The next piece of code returns the first 10 cases. Note that first here is defined by the current order of the data set, not by time.

patients %>%
    slice(1:10)
## # Log of 110 events consisting of:
## 2 traces 
## 10 cases 
## 55 instances of 7 activities 
## 7 resources 
## Events occurred from 2017-01-02 11:41:53 until 2017-01-11 11:39:30 
##  
## # Variables were mapped as follows:
## Case identifier:     patient 
## Activity identifier:     handling 
## Resource identifier:     employee 
## Activity instance identifier:    handling_id 
## Timestamp:           time 
## Lifecycle transition:        registration_type 
## 
## # A tibble: 110 × 7
##    handling     patient employee handling_id regist…¹ time                .order
##    <fct>        <chr>   <fct>    <chr>       <fct>    <dttm>               <int>
##  1 Registration 1       r1       1           start    2017-01-02 11:41:53      1
##  2 Registration 2       r1       2           start    2017-01-02 11:41:53      2
##  3 Registration 3       r1       3           start    2017-01-04 01:34:05      3
##  4 Registration 4       r1       4           start    2017-01-04 01:34:04      4
##  5 Registration 5       r1       5           start    2017-01-04 16:07:47      5
##  6 Registration 6       r1       6           start    2017-01-04 16:07:47      6
##  7 Registration 7       r1       7           start    2017-01-05 04:56:11      7
##  8 Registration 8       r1       8           start    2017-01-05 04:56:11      8
##  9 Registration 9       r1       9           start    2017-01-06 05:58:54      9
## 10 Registration 10      r1       10          start    2017-01-06 05:58:54     10
## # … with 100 more rows, and abbreviated variable name ¹​registration_type

slice_activities

The next piece of code returns the first 10 activity instances.

patients %>%
    slice_activities(1:10)
## # Log of 20 events consisting of:
## 1 trace 
## 10 cases 
## 10 instances of 1 activity 
## 1 resource 
## Events occurred from 2017-01-02 11:41:53 until 2017-01-06 09:13:28 
##  
## # Variables were mapped as follows:
## Case identifier:     patient 
## Activity identifier:     handling 
## Resource identifier:     employee 
## Activity instance identifier:    handling_id 
## Timestamp:           time 
## Lifecycle transition:        registration_type 
## 
## # A tibble: 20 × 7
##    handling     patient employee handling_id regist…¹ time                .order
##    <fct>        <chr>   <fct>    <chr>       <fct>    <dttm>               <int>
##  1 Registration 1       r1       1           start    2017-01-02 11:41:53      1
##  2 Registration 2       r1       2           start    2017-01-02 11:41:53      2
##  3 Registration 3       r1       3           start    2017-01-04 01:34:05      3
##  4 Registration 4       r1       4           start    2017-01-04 01:34:04      4
##  5 Registration 5       r1       5           start    2017-01-04 16:07:47      5
##  6 Registration 6       r1       6           start    2017-01-04 16:07:47      6
##  7 Registration 7       r1       7           start    2017-01-05 04:56:11      7
##  8 Registration 8       r1       8           start    2017-01-05 04:56:11      8
##  9 Registration 9       r1       9           start    2017-01-06 05:58:54      9
## 10 Registration 10      r1       10          start    2017-01-06 05:58:54     10
## 11 Registration 1       r1       1           complete 2017-01-02 12:40:20     11
## 12 Registration 2       r1       2           complete 2017-01-02 15:16:38     12
## 13 Registration 3       r1       3           complete 2017-01-04 06:36:54     13
## 14 Registration 4       r1       4           complete 2017-01-04 04:25:06     14
## 15 Registration 5       r1       5           complete 2017-01-04 20:07:50     15
## 16 Registration 6       r1       6           complete 2017-01-04 18:12:46     16
## 17 Registration 7       r1       7           complete 2017-01-05 06:27:49     17
## 18 Registration 8       r1       8           complete 2017-01-05 07:58:17     18
## 19 Registration 9       r1       9           complete 2017-01-06 07:18:32     19
## 20 Registration 10      r1       10          complete 2017-01-06 09:13:28     20
## # … with abbreviated variable name ¹​registration_type

slice_events

The next piece of code returns the first 10 events.

patients %>% 
    slice_events(1:10)
## # Log of 10 events consisting of:
## 1 trace 
## 10 cases 
## 10 instances of 1 activity 
## 1 resource 
## Events occurred from 2017-01-02 11:41:53 until 2017-01-06 05:58:54 
##  
## # Variables were mapped as follows:
## Case identifier:     patient 
## Activity identifier:     handling 
## Resource identifier:     employee 
## Activity instance identifier:    handling_id 
## Timestamp:           time 
## Lifecycle transition:        registration_type 
## 
## # A tibble: 10 × 7
##    handling     patient employee handling_id regist…¹ time                .order
##    <fct>        <chr>   <fct>    <chr>       <fct>    <dttm>               <int>
##  1 Registration 1       r1       1           start    2017-01-02 11:41:53      1
##  2 Registration 2       r1       2           start    2017-01-02 11:41:53      2
##  3 Registration 3       r1       3           start    2017-01-04 01:34:05      3
##  4 Registration 4       r1       4           start    2017-01-04 01:34:04      4
##  5 Registration 5       r1       5           start    2017-01-04 16:07:47      5
##  6 Registration 6       r1       6           start    2017-01-04 16:07:47      6
##  7 Registration 7       r1       7           start    2017-01-05 04:56:11      7
##  8 Registration 8       r1       8           start    2017-01-05 04:56:11      8
##  9 Registration 9       r1       9           start    2017-01-06 05:58:54      9
## 10 Registration 10      r1       10          start    2017-01-06 05:58:54     10
## # … with abbreviated variable name ¹​registration_type
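To compare the three slicing functions side by side, the sketch below applies each of them to the same log and counts the unit that was sliced. It assumes the patients log is available from the eventdataR package; the variable names are illustrative only.

```r
library(bupaR)

# Assumption: patients ships with the eventdataR package.
patients <- eventdataR::patients

by_case     <- patients %>% slice(1:10)             # 10 cases
by_instance <- patients %>% slice_activities(1:10)  # 10 activity instances
by_event    <- patients %>% slice_events(1:10)      # 10 events (rows)

# Each function counts in its own unit, so the number of events differs.
n_cases(by_case)
n_activity_instances(by_instance)
n_events(by_event)

# Events retained by each variant, for comparison.
c(cases = n_events(by_case),
  instances = n_events(by_instance),
  events = n_events(by_event))
```

The same index 1:10 therefore keeps a different number of rows depending on which function is used.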

first_n, last_n

The slice functions select events, cases or activity instances based on their current position in the event data. As such, the result can be changed using the arrange function. More often, we want to select the first n activity instances, or the last ones. This is achieved with the first_n or last_n functions, which return the first or last n activity instances of a log, respectively, based on time rather than position.

patients %>% 
    first_n(n = 5)
## # Log of 10 events consisting of:
## 2 traces 
## 3 cases 
## 5 instances of 2 activities 
## 2 resources 
## Events occurred from 2017-01-02 11:41:53 until 2017-01-04 04:25:06 
##  
## # Variables were mapped as follows:
## Case identifier:     patient 
## Activity identifier:     handling 
## Resource identifier:     employee 
## Activity instance identifier:    handling_id 
## Timestamp:           time 
## Lifecycle transition:        registration_type 
## 
## # A tibble: 10 × 7
##    handling           patient emplo…¹ handl…² regis…³ time                .order
##    <fct>              <chr>   <fct>   <chr>   <fct>   <dttm>               <int>
##  1 Registration       1       r1      1       start   2017-01-02 11:41:53      1
##  2 Registration       2       r1      2       start   2017-01-02 11:41:53      2
##  3 Triage and Assess… 1       r2      501     start   2017-01-02 12:40:20      4
##  4 Registration       1       r1      1       comple… 2017-01-02 12:40:20      6
##  5 Registration       2       r1      2       comple… 2017-01-02 15:16:38      7
##  6 Triage and Assess… 2       r2      502     start   2017-01-02 22:32:25      5
##  7 Triage and Assess… 1       r2      501     comple… 2017-01-02 22:32:25      9
##  8 Triage and Assess… 2       r2      502     comple… 2017-01-03 12:34:01     10
##  9 Registration       4       r1      4       start   2017-01-04 01:34:04      3
## 10 Registration       4       r1      4       comple… 2017-01-04 04:25:06      8
## # … with abbreviated variable names ¹​employee, ²​handling_id, ³​registration_type

The result is not affected by a different ordering of the data, since first_n takes the timestamps into account.

patients %>%
    arrange(desc(time)) %>%
    first_n(n = 5)
## # Log of 10 events consisting of:
## 2 traces 
## 3 cases 
## 5 instances of 2 activities 
## 2 resources 
## Events occurred from 2017-01-02 11:41:53 until 2017-01-04 04:25:06 
##  
## # Variables were mapped as follows:
## Case identifier:     patient 
## Activity identifier:     handling 
## Resource identifier:     employee 
## Activity instance identifier:    handling_id 
## Timestamp:           time 
## Lifecycle transition:        registration_type 
## 
## # A tibble: 10 × 7
##    handling           patient emplo…¹ handl…² regis…³ time                .order
##    <fct>              <chr>   <fct>   <chr>   <fct>   <dttm>               <int>
##  1 Registration       1       r1      1       start   2017-01-02 11:41:53      1
##  2 Registration       2       r1      2       start   2017-01-02 11:41:53      2
##  3 Triage and Assess… 1       r2      501     start   2017-01-02 12:40:20      4
##  4 Registration       1       r1      1       comple… 2017-01-02 12:40:20      6
##  5 Registration       2       r1      2       comple… 2017-01-02 15:16:38      7
##  6 Triage and Assess… 2       r2      502     start   2017-01-02 22:32:25      5
##  7 Triage and Assess… 1       r2      501     comple… 2017-01-02 22:32:25      9
##  8 Triage and Assess… 2       r2      502     comple… 2017-01-03 12:34:01     10
##  9 Registration       4       r1      4       start   2017-01-04 01:34:04      3
## 10 Registration       4       r1      4       comple… 2017-01-04 04:25:06      8
## # … with abbreviated variable names ¹​employee, ²​handling_id, ³​registration_type
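The order independence can also be checked programmatically rather than by comparing printed output. The sketch below assumes the patients log from eventdataR and compares the activity instance identifiers (the handling_id column shown above) returned with and without reordering; it also shows last_n for the other end of the log.

```r
library(bupaR)

# Assumption: patients ships with the eventdataR package.
patients <- eventdataR::patients

in_order  <- patients %>% first_n(n = 5)
reordered <- patients %>%
    arrange(desc(time)) %>%
    first_n(n = 5)

# The same activity instances are selected regardless of row order.
setequal(as.data.frame(in_order)$handling_id,
         as.data.frame(reordered)$handling_id)

# last_n selects from the other end of the timeline.
latest <- patients %>% last_n(n = 5)
n_activity_instances(latest)
```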

In combination with group_by_case, it is very easy to select the heads or tails of each case. Below, we keep the first 3 activity instances of each case in the sepsis log and explore the most common resulting traces, up to a coverage of 95%.

sepsis %>%
    group_by_case() %>%
    first_n(3) %>%
    trace_explorer(coverage = 0.95)
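To see what the grouped selection keeps without plotting, the sketch below counts the activity instances that remain per case. It assumes the sepsis log from eventdataR and uses the case_id and activity_instance_id accessors to look up the mapped column names; case_heads and instances_per_case are illustrative names.

```r
library(bupaR)

# Assumption: sepsis ships with the eventdataR package.
sepsis <- eventdataR::sepsis

# Keep the first 3 activity instances of every case.
case_heads <- sepsis %>%
    group_by_case() %>%
    first_n(3)

# Count the distinct activity instances that remain in each case.
heads_df <- as.data.frame(case_heads)
instances_per_case <- tapply(
    heads_df[[activity_instance_id(sepsis)]],
    heads_df[[case_id(sepsis)]],
    function(ids) length(unique(ids))
)

# No case should keep more than 3 activity instances.
max(instances_per_case)
```

Cases with fewer than 3 activity instances simply keep all of them.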

sample_n

The sample_n function allows you to take a sample of the event log containing n cases. The code below returns a sample of 10 patients.

patients %>%
    sample_n(size = 10)
## # Log of 108 events consisting of:
## 3 traces 
## 10 cases 
## 54 instances of 7 activities 
## 7 resources 
## Events occurred from 2017-03-29 22:12:55 until 2018-05-04 21:50:07 
##  
## # Variables were mapped as follows:
## Case identifier:     patient 
## Activity identifier:     handling 
## Resource identifier:     employee 
## Activity instance identifier:    handling_id 
## Timestamp:           time 
## Lifecycle transition:        registration_type 
## 
## # A tibble: 108 × 7
##    handling     patient employee handling_id regist…¹ time                .order
##    <fct>        <chr>   <fct>    <chr>       <fct>    <dttm>               <int>
##  1 Registration 80      r1       80          start    2017-03-29 22:12:55      1
##  2 Registration 92      r1       92          start    2017-04-04 17:42:26      2
##  3 Registration 156     r1       156         start    2017-06-03 10:05:28      3
##  4 Registration 170     r1       170         start    2017-06-17 15:10:30      4
##  5 Registration 202     r1       202         start    2017-07-17 03:11:39      5
##  6 Registration 231     r1       231         start    2017-08-13 19:50:42      6
##  7 Registration 328     r1       328         start    2017-11-12 04:23:27      7
##  8 Registration 434     r1       434         start    2018-02-19 02:53:00      8
##  9 Registration 462     r1       462         start    2018-03-20 07:37:11      9
## 10 Registration 497     r1       497         start    2018-04-30 09:42:11     10
## # … with 98 more rows, and abbreviated variable name ¹​registration_type
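Because the sample is random, the cases shown above will differ from run to run. A minimal sketch of a reproducible sample follows, assuming the patients log from eventdataR and that sample_n draws from R's standard random number generator, so that set.seed fixes the result.

```r
library(bupaR)

# Assumption: patients ships with the eventdataR package.
patients <- eventdataR::patients

# Fix the seed before each draw to obtain the same cases again.
set.seed(42)
sample_a <- patients %>% sample_n(size = 10)

set.seed(42)
sample_b <- patients %>% sample_n(size = 10)

n_cases(sample_a)

# Both draws contain the same patients.
setequal(as.data.frame(sample_a)$patient,
         as.data.frame(sample_b)$patient)
```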

Note that this function can also be used with a sample size larger than the number of cases in the event log, provided you allow sampling with replacement.

A more extensive list of subsetting methods is provided by edeaR. Look here for more information.



Copyright © 2023 bupaR - Hasselt University