
Filters the log based on frequency of resources

Usage

filter_resource_frequency(
  log,
  interval = NULL,
  percentage = NULL,
  reverse = FALSE,
  eventlog = deprecated()
)

# S3 method for log
filter_resource_frequency(
  log,
  interval = NULL,
  percentage = NULL,
  reverse = FALSE,
  eventlog = deprecated()
)

# S3 method for grouped_log
filter_resource_frequency(
  log,
  interval = NULL,
  percentage = NULL,
  reverse = FALSE,
  eventlog = deprecated()
)

Arguments

log

log: Object of class log or derivatives (grouped_log, eventlog, activitylog, etc.).

percentage, interval

The target coverage of activity instances. Provide either percentage or interval.
percentage (numeric): A percentage of p returns the most common resource labels of the log, which together account for at least p% of the activity instances.
interval (numeric vector of length 2): A resource frequency interval. Half-open intervals can be created using NA.
For more information, see 'Details' below.

reverse

logical (default FALSE): Indicates whether the selection should be reversed, i.e. whether the selected resource labels are removed instead of retained.

eventlog

[Deprecated]; please use log instead.

Value

When given an object of type log, it will return a filtered log. When given an object of type grouped_log, the filter will be applied in a stratified way (i.e. separately for each group). The returned log will be grouped on the same variables as the original log.
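
The difference between ungrouped and stratified filtering can be illustrated with a minimal base R sketch. It does not call the package: the data frame, the department column and the helper keep_frequent() are hypothetical stand-ins for a log, a grouping variable and an interval filter with a lower bound only.

```r
# Hypothetical toy log: one row per activity instance.
instances <- data.frame(
  department = rep(c("A", "B"), each = 6),
  resource   = c("r1", "r1", "r1", "r1", "r2", "r2",
                 "r2", "r2", "r2", "r2", "r2", "r1")
)

# Hypothetical helper: keep rows whose resource occurs at least `min_n`
# times within `df`.
keep_frequent <- function(df, min_n) {
  counts <- table(df$resource)
  df[df$resource %in% names(counts)[counts >= min_n], , drop = FALSE]
}

# Ungrouped: frequencies are counted over the whole log.
ungrouped <- keep_frequent(instances, min_n = 3)

# Stratified: frequencies are counted within each department separately,
# and the per-group results are stacked again.
stratified <- do.call(
  rbind,
  lapply(split(instances, instances$department), keep_frequent, min_n = 3)
)

nrow(ungrouped)   # 12: r1 and r2 both occur at least 3 times overall
nrow(stratified)  # 9: only r1 survives in A, only r2 survives in B
```

A resource that is frequent overall can therefore still be dropped from a group in which it is rare.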

Details

Filtering the log based on resource frequency can be done in two ways: by using an interval of allowed frequencies, or by specifying a coverage percentage:

  • percentage: When filtering using a percentage p%, the filter will return p% of the activity instances, starting from the resource labels with the highest frequency. The filter will retain additional resource labels as long as the number of activity instances does not exceed the percentage threshold.

  • interval: When filtering using an interval, resource labels will be retained when their absolute frequency falls in this interval. The interval is specified using a numeric vector of length 2. Half-open intervals can be created by using NA, e.g., c(10, NA) will select resource labels which occur 10 times or more.
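
The two selection rules above can be sketched in base R on a toy frequency table. This is an illustration under stated assumptions, not the package's implementation: the data and the helpers select_by_interval() and select_by_percentage() are hypothetical, the coverage is passed as a share between 0 and 1 (a convention of this sketch only), and the percentage rule follows the "does not exceed the threshold" wording of the bullet above.

```r
# Hypothetical toy data: one row per activity instance.
instances <- data.frame(
  activity_instance = 1:20,
  resource = rep(c("r1", "r2", "r3", "r4"), times = c(10, 6, 3, 1))
)

# Absolute frequency per resource label, most frequent first.
tab  <- table(instances$resource)
freq <- sort(setNames(as.integer(tab), names(tab)), decreasing = TRUE)

# Interval rule: keep labels whose frequency lies in [lower, upper];
# NA on either side leaves that side unbounded.
select_by_interval <- function(freq, interval) {
  lower <- if (is.na(interval[1])) -Inf else interval[1]
  upper <- if (is.na(interval[2])) Inf else interval[2]
  names(freq)[freq >= lower & freq <= upper]
}

# Percentage rule (sketch): walk down from the most frequent label and keep
# labels while the cumulative share of instances stays at or below `p`.
select_by_percentage <- function(freq, p) {
  cum_share <- cumsum(freq) / sum(freq)
  names(freq)[cum_share <= p]
}

by_interval   <- select_by_interval(freq, c(3, NA))   # "r1" "r2" "r3"
by_percentage <- select_by_percentage(freq, 0.85)     # "r1" "r2"

# Reversing the selection keeps the labels that were not selected.
reversed <- setdiff(names(freq), by_percentage)       # "r3" "r4"

# Apply a selection to the instances themselves.
filtered <- instances[instances$resource %in% by_percentage, ]
nrow(filtered)  # 16
```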

Methods (by class)

  • filter_resource_frequency(log): Filters resources for a log.

  • filter_resource_frequency(grouped_log): Filters resources for a grouped_log.

References

Swennen, M. (2018). Using Event Log Knowledge to Support Operational Excellence Techniques (Doctoral dissertation). Hasselt University.