library(bupaR)
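The filter_activity function can be used to filter activities by name. It has three arguments: the event log, a vector of activity labels, and a reverse argument (FALSE by default) which, when set to TRUE, removes the listed activities instead of keeping them.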
patients %>%
    filter_activity(c("X-Ray", "Blood test")) %>%
    activities
## # A tibble: 2 × 3
## handling absolute_frequency relative_frequency
## <fct> <int> <dbl>
## 1 X-Ray 261 0.524
## 2 Blood test 237 0.476
As one can see, there are only 2 distinct activities left in the event log.
It is also possible to filter on activity frequency. This filter uses a percentage cut-off: it looks at the most frequent activities until the required percentage of events has been reached. Thus, a cut-off of 80% looks at the activities needed to represent 80% of the events. In the example below, reverse = TRUE inverts this selection: the most frequent activities that together cover 50% of the event log are removed, so only the less frequent activities remain.
patients %>%
    filter_activity_frequency(percentage = 0.5, reverse = TRUE) %>%
    activities
## # A tibble: 4 × 3
## handling absolute_frequency relative_frequency
## <fct> <int> <dbl>
## 1 Check-out 492 0.401
## 2 X-Ray 261 0.213
## 3 Blood test 237 0.193
## 4 MRI SCAN 236 0.192
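The percentage cut-off can be sketched in base R to make the selection rule concrete. This is a hypothetical helper (select_by_coverage is not part of bupaR), written under two assumptions: the activity whose share crosses the threshold is still included, and reverse = TRUE returns the complement of that selection. With the activity counts printed in this section, these assumptions reproduce the four activities shown above.

```r
# Hypothetical helper (not part of bupaR): select the most frequent items
# until together they cover `percentage` of all occurrences.
select_by_coverage <- function(freq, percentage, reverse = FALSE) {
  # Sort the named counts from most to least frequent
  freq <- freq[order(freq, decreasing = TRUE)]
  share <- freq / sum(freq)
  # Coverage already reached *before* each item is added
  covered_before <- cumsum(share) - share
  # Assumption: the item that crosses the threshold is still kept
  keep <- names(freq)[covered_before < percentage]
  # Assumption: reverse = TRUE keeps the complement of the selection
  if (reverse) setdiff(names(freq), keep) else keep
}

# Activity instance counts of the patients log, as printed in this section
activity_freq <- c("Registration" = 500, "Triage and Assessment" = 500,
                   "Discuss Results" = 495, "Check-out" = 492,
                   "X-Ray" = 261, "Blood test" = 237, "MRI SCAN" = 236)

# Keeps "Check-out", "X-Ray", "Blood test" and "MRI SCAN"
least_frequent <- select_by_coverage(activity_freq, percentage = 0.5, reverse = TRUE)
print(least_frequent)
```

Without reverse, the same call selects Registration, Triage and Assessment and Discuss Results, whose cumulative share first exceeds 50%.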
Instead of providing a target percentage, we can provide a target frequency interval. For example, the code below retains only the activities that occur between 300 and 500 times.
patients %>%
    filter_activity_frequency(interval = c(300, 500)) %>%
    activities
## # A tibble: 4 × 3
## handling absolute_frequency relative_frequency
## <fct> <int> <dbl>
## 1 Registration 500 0.252
## 2 Triage and Assessment 500 0.252
## 3 Discuss Results 495 0.249
## 4 Check-out 492 0.248
When we don’t know the maximum frequency (500 in this case), we can leave the upper bound open by using NA.
patients %>%
    filter_activity_frequency(interval = c(300, NA)) %>%
    activities
## # A tibble: 4 × 3
## handling absolute_frequency relative_frequency
## <fct> <int> <dbl>
## 1 Registration 500 0.252
## 2 Triage and Assessment 500 0.252
## 3 Discuss Results 495 0.249
## 4 Check-out 492 0.248
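The interval variant boils down to a range check on the frequency table. The sketch below is a hypothetical base R helper (select_by_interval is not a bupaR function); it assumes both bounds are inclusive, which is consistent with Registration (500 occurrences) being kept by c(300, 500), and treats an NA bound as open, as described above.

```r
# Hypothetical helper (not part of bupaR): keep items whose count lies in
# the closed interval [lower, upper]; an NA bound is treated as open.
select_by_interval <- function(freq, interval) {
  lower <- if (is.na(interval[1])) -Inf else interval[1]
  upper <- if (is.na(interval[2])) Inf else interval[2]
  names(freq)[freq >= lower & freq <= upper]
}

# Activity instance counts of the patients log, as printed in this section
activity_freq <- c("Registration" = 500, "Triage and Assessment" = 500,
                   "Discuss Results" = 495, "Check-out" = 492,
                   "X-Ray" = 261, "Blood test" = 237, "MRI SCAN" = 236)

closed_bounds <- select_by_interval(activity_freq, c(300, 500))
open_upper <- select_by_interval(activity_freq, c(300, NA))
print(closed_bounds)
print(open_upper)
```

Both calls return the same four activities, which is why the two outputs above are identical.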
Similar to the activity filter, the resource filter can be used to filter events by listing one or more resources.
patients %>%
    filter_resource(c("r1", "r4")) %>%
    resource_frequency("resource")
## # A tibble: 2 × 3
## employee absolute relative
## <fct> <int> <dbl>
## 1 r1 500 0.679
## 2 r4 236 0.321
Instead of filtering events by the resource that performed the activity, we can also filter events by the frequency of the resource. This works in the same way as the activity frequency filter. The filter below keeps the most common resources, which together account for at least 80% of the activity instances.
patients %>%
    filter_resource_frequency(percentage = 0.80) %>%
    resources()
## # A tibble: 5 × 3
## employee absolute_frequency relative_frequency
## <fct> <int> <dbl>
## 1 r1 500 0.222
## 2 r2 500 0.222
## 3 r6 495 0.220
## 4 r7 492 0.219
## 5 r5 261 0.116
Alternatively, using the interval argument, we can select resources who perform between 200 and 300 activity instances.
patients %>%
    filter_resource_frequency(interval = c(200, 300)) %>%
    resources()
## # A tibble: 3 × 3
## employee absolute_frequency relative_frequency
## <fct> <int> <dbl>
## 1 r5 261 0.356
## 2 r3 237 0.323
## 3 r4 236 0.322
The trim filter is a special event filter, as it also takes into account the notion of cases. In fact, it trims cases so that they start with a certain activity and end with a certain activity. It requires two lists: one of possible start activities and one of possible end activities. Each case is trimmed from the first appearance of a start activity until the last appearance of an end activity. When reversed, these slices of the event log are removed instead of preserved.
patients %>%
    filter_trim(start_activities = "Registration", end_activities = c("MRI SCAN", "X-Ray")) %>%
    process_map(type = performance())
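The trimming rule for a single case can be sketched in a few lines of base R. This is a simplified, hypothetical illustration (trim_trace is not a bupaR function): it treats a case as an ordered vector of activity labels and ignores lifecycle events. What happens to a case without a matching start or end activity is an assumption here: nothing is kept, or, when reversed, nothing is removed.

```r
# Hypothetical sketch (not part of bupaR): trim one case, given as an
# ordered vector of activity labels.
trim_trace <- function(trace, start_activities, end_activities, reverse = FALSE) {
  starts <- which(trace %in% start_activities)
  ends <- which(trace %in% end_activities)
  has_slice <- length(starts) > 0 && length(ends) > 0 && max(ends) >= min(starts)
  if (!has_slice) {
    # Assumption: no slice means nothing is kept (or nothing is removed when reversed)
    return(if (reverse) trace else character(0))
  }
  # From the first start activity until the last end activity
  slice <- min(starts):max(ends)
  if (reverse) trace[-slice] else trace[slice]
}

trace <- c("Registration", "Triage and Assessment", "X-Ray",
           "Discuss Results", "Check-out")
kept <- trim_trace(trace, "Registration", c("MRI SCAN", "X-Ray"))
removed <- trim_trace(trace, "Registration", c("MRI SCAN", "X-Ray"), reverse = TRUE)
print(kept)
print(removed)
```

Here kept runs from Registration up to X-Ray, while the reversed call keeps only what lies outside that slice (Discuss Results and Check-out).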
Instead of trimming cases to a particular start and/or end activity, we can also trim cases to a particular time window. For this we use the function filter_time_period with filter_method "trim". This filter needs a time interval, which is a vector of length 2 containing date/datetime values. These can be created easily using lubridate functions, e.g. ymd for year-month-day formats.
This example keeps only the activity instances that took place, at least partly (i.e. with at least one of their events), in December 2017.
library(lubridate)
patients %>%
    filter_time_period(interval = ymd(c(20171201, 20171231)), filter_method = "trim") %>%
    summary()
## Number of events: 290
## Number of cases: 36
## Number of traces: 13
## Number of distinct activities: 7
## Average trace length: 8.055556
##
## Start eventlog: 2017-11-30 20:29:12
## End eventlog: 2017-12-31 08:00:08
## handling patient employee handling_id
## Blood test :30 Length:290 r1:52 Length:290
## Check-out :48 Class :character r2:52 Class :character
## Discuss Results :54 Mode :character r3:30 Mode :character
## MRI SCAN :30 r4:30
## Registration :52 r5:24
## Triage and Assessment:52 r6:54
## X-Ray :24 r7:48
## registration_type time .order
## complete:145 Min. :2017-11-30 20:29:12.00 Min. : 1.00
## start :145 1st Qu.:2017-12-06 01:04:43.25 1st Qu.: 73.25
## Median :2017-12-13 13:12:47.00 Median :145.50
## Mean :2017-12-13 20:14:51.00 Mean :145.50
## 3rd Qu.:2017-12-19 18:09:13.50 3rd Qu.:217.75
## Max. :2017-12-31 08:00:08.00 Max. :290.00
##
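The activity-instance behaviour described above can be sketched in base R: every event of an instance is kept as soon as one of its events falls inside the window, which is why the summary starts on 2017-11-30. This is a hypothetical illustration, not bupaR's implementation: trim_to_period, the events data frame and its instance_id/time columns are assumptions, the window is assumed to be inclusive, and base as.POSIXct is used instead of lubridate to keep it self-contained.

```r
# Hypothetical sketch (not part of bupaR): keep every event of an activity
# instance that has at least one event inside [from, to].
trim_to_period <- function(events, from, to) {
  in_window <- events$time >= from & events$time <= to
  keep_ids <- unique(events$instance_id[in_window])
  events[events$instance_id %in% keep_ids, , drop = FALSE]
}

# Hypothetical events: "a" starts in November but completes in December,
# "b" lies fully in December, "c" lies fully in November.
events <- data.frame(
  instance_id = c("a", "a", "b", "b", "c", "c"),
  time = as.POSIXct(c("2017-11-30 20:00:00", "2017-12-01 01:00:00",
                      "2017-12-10 09:00:00", "2017-12-10 10:00:00",
                      "2017-11-20 08:00:00", "2017-11-20 09:00:00"), tz = "UTC")
)

december <- trim_to_period(events,
                           from = as.POSIXct("2017-12-01", tz = "UTC"),
                           to = as.POSIXct("2017-12-31", tz = "UTC"))
print(december)
```

Instance "a" survives with both of its events, including the November one, while instance "c" is dropped.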
Using a different filter method (start, complete, contained or intersecting), this filter can also act as a case filter (see below).
Copyright © 2023 bupaR - Hasselt University