Adjusting logs

The mapping of a log defines which variables in the data are mapped onto which characteristics of the log.

For an eventlog:

  • case identifier (case_id)
  • activity type (activity_id)
  • activity instance (activity_instance_id)
  • transaction status (lifecycle_id)
  • timestamp (timestamp)
  • resource (resource_id)

For an activitylog:

  • case identifier (case_id)
  • activity type (activity_id)
  • timestamps (timestamps)
  • resource (resource_id)

More information on these characteristics can be found elsewhere in the bupaR documentation. Each of them can be modified to approach a log from a different angle. This can be done using the eventlog() or activitylog() functions, the auxiliary set_ functions, or an existing mapping.

library(bupaR)
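
To make the idea of a mapping concrete, the following minimal sketch builds a tiny event log from scratch. The data frame and all of its column names (order_nr, step, step_nr, status, time, employee) are hypothetical and only serve as illustration; the resource column is passed through the resource_id argument of eventlog().

```r
library(bupaR)

# Hypothetical toy data: every column name below is invented for illustration.
toy <- data.frame(
  order_nr = c("o1", "o1", "o2", "o3"),
  step     = c("register", "ship", "register", "register"),
  step_nr  = c("1", "2", "3", "4"),
  status   = rep("complete", 4),
  time     = as.POSIXct(
    c("2023-01-01 09:00:00", "2023-01-02 10:00:00",
      "2023-01-01 11:00:00", "2023-01-03 12:00:00"),
    tz = "UTC"
  ),
  employee = c("ann", "bob", "ann", "bob")
)

# Map each column onto one characteristic of the event log.
toy_log <- eventlog(
  toy,
  case_id              = "order_nr",
  activity_id          = "step",
  activity_instance_id = "step_nr",
  lifecycle_id         = "status",
  timestamp            = "time",
  resource_id          = "employee"
)

# Print the resulting mapping.
mapping(toy_log)
```

Each argument simply names the column that plays the corresponding role, so changing the mapping later never alters the underlying data.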

Using eventlog()/activitylog()

The eventlog() and activitylog() functions are not only used to instantiate a log object, but can also be used to modify it, by using a log object as input and setting only the identifiers one wants to change.

For example, consider the traffic_fines data. We could change the case_id argument to the vehicleclass column as follows (a purely hypothetical example). Note that the number of cases changes after this modification.

traffic_fines %>%
    eventlog(case_id = "vehicleclass")
## # Log of 34724 events consisting of:
## 4 traces 
## 4 cases 
## 34724 instances of 11 activities 
## 16 resources 
## Events occurred from 2006-06-17 until 2012-03-26 
##  
## # Variables were mapped as follows:
## Case identifier:     vehicleclass 
## Activity identifier:     activity 
## Resource identifier:     resource 
## Activity instance identifier:    activity_instance_id 
## Timestamp:           timestamp 
## Lifecycle transition:        lifecycle 
## 
## # A tibble: 34,724 × 18
##    case_id activity   lifec…¹ resou…² timestamp           amount article dismi…³
##    <chr>   <fct>      <fct>   <fct>   <dttm>              <chr>    <dbl> <chr>  
##  1 A1      Create Fi… comple… 561     2006-07-24 00:00:00 35.0       157 NIL    
##  2 A1      Send Fine  comple… <NA>    2006-12-05 00:00:00 <NA>        NA <NA>   
##  3 A100    Create Fi… comple… 561     2006-08-02 00:00:00 35.0       157 NIL    
##  4 A100    Send Fine  comple… <NA>    2006-12-12 00:00:00 <NA>        NA <NA>   
##  5 A100    Insert Fi… comple… <NA>    2007-01-15 00:00:00 <NA>        NA <NA>   
##  6 A100    Add penal… comple… <NA>    2007-03-16 00:00:00 71.5        NA <NA>   
##  7 A100    Send for … comple… <NA>    2009-03-30 00:00:00 <NA>        NA <NA>   
##  8 A10000  Create Fi… comple… 561     2007-03-09 00:00:00 36.0       157 NIL    
##  9 A10000  Send Fine  comple… <NA>    2007-07-17 00:00:00 <NA>        NA <NA>   
## 10 A10000  Insert Fi… comple… <NA>    2007-08-02 00:00:00 <NA>        NA <NA>   
## # … with 34,714 more rows, 10 more variables: expense <chr>, lastsent <chr>,
## #   matricola <dbl>, notificationtype <chr>, paymentamount <dbl>, points <dbl>,
## #   totalpaymentamount <chr>, vehicleclass <chr>, activity_instance_id <chr>,
## #   .order <int>, and abbreviated variable names ¹​lifecycle, ²​resource,
## #   ³​dismissal

Using set_-functions

If we only want to change one of the elements, as in the example above, the set_ functions, such as set_case_id(), provide a very convenient way to do so. The same change as before can be made as follows:

traffic_fines %>%
    set_case_id("vehicleclass")
## # Log of 34724 events consisting of:
## 4 traces 
## 4 cases 
## 34724 instances of 11 activities 
## 16 resources 
## Events occurred from 2006-06-17 until 2012-03-26 
##  
## # Variables were mapped as follows:
## Case identifier:     vehicleclass 
## Activity identifier:     activity 
## Resource identifier:     resource 
## Activity instance identifier:    activity_instance_id 
## Timestamp:           timestamp 
## Lifecycle transition:        lifecycle 
## 
## # A tibble: 34,724 × 18
##    case_id activity   lifec…¹ resou…² timestamp           amount article dismi…³
##    <chr>   <fct>      <fct>   <fct>   <dttm>              <chr>    <dbl> <chr>  
##  1 A1      Create Fi… comple… 561     2006-07-24 00:00:00 35.0       157 NIL    
##  2 A1      Send Fine  comple… <NA>    2006-12-05 00:00:00 <NA>        NA <NA>   
##  3 A100    Create Fi… comple… 561     2006-08-02 00:00:00 35.0       157 NIL    
##  4 A100    Send Fine  comple… <NA>    2006-12-12 00:00:00 <NA>        NA <NA>   
##  5 A100    Insert Fi… comple… <NA>    2007-01-15 00:00:00 <NA>        NA <NA>   
##  6 A100    Add penal… comple… <NA>    2007-03-16 00:00:00 71.5        NA <NA>   
##  7 A100    Send for … comple… <NA>    2009-03-30 00:00:00 <NA>        NA <NA>   
##  8 A10000  Create Fi… comple… 561     2007-03-09 00:00:00 36.0       157 NIL    
##  9 A10000  Send Fine  comple… <NA>    2007-07-17 00:00:00 <NA>        NA <NA>   
## 10 A10000  Insert Fi… comple… <NA>    2007-08-02 00:00:00 <NA>        NA <NA>   
## # … with 34,714 more rows, 10 more variables: expense <chr>, lastsent <chr>,
## #   matricola <dbl>, notificationtype <chr>, paymentamount <dbl>, points <dbl>,
## #   totalpaymentamount <chr>, vehicleclass <chr>, activity_instance_id <chr>,
## #   .order <int>, and abbreviated variable names ¹​lifecycle, ²​resource,
## #   ³​dismissal
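
As a quick check that both routes lead to the same mapping, the sketch below applies each of them and compares the resulting case identifier. It assumes that traffic_fines is provided by the eventdataR package; the object names via_eventlog and via_setter are hypothetical.

```r
library(bupaR)
library(eventdataR)  # assumed source of the traffic_fines example log

# Route 1: re-specify the log, changing only the case identifier.
via_eventlog <- traffic_fines %>%
    eventlog(case_id = "vehicleclass")

# Route 2: use the dedicated setter.
via_setter <- traffic_fines %>%
    set_case_id("vehicleclass")

# Both logs should now report the same case identifier.
case_id(via_eventlog)
case_id(via_setter)
identical(case_id(via_eventlog), case_id(via_setter))
```

Analogous setters exist for the other elements, for example set_activity_id(), set_resource_id() and set_timestamp().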

Using existing mapping

It is also possible to extract the mapping of a log at a given point in time using mapping().

mapping_fines <- mapping(traffic_fines)
mapping_fines
## Case identifier:     case_id 
## Activity identifier:     activity 
## Resource identifier:     resource 
## Activity instance identifier:    activity_instance_id 
## Timestamp:           timestamp 
## Lifecycle transition:        lifecycle
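
Besides printing the complete mapping, individual elements can be queried with the accessor functions that share their names with the identifiers. The sketch below assumes that traffic_fines comes from the eventdataR package; each call returns the name of the column currently mapped to that role.

```r
library(bupaR)
library(eventdataR)  # assumed source of the traffic_fines example log

# Each accessor returns the column name mapped to that characteristic.
case_id(traffic_fines)
activity_id(traffic_fines)
activity_instance_id(traffic_fines)
lifecycle_id(traffic_fines)
timestamp(traffic_fines)
resource_id(traffic_fines)
```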

We can adjust the mapping incrementally using the approaches described above.

traffic_fines <- traffic_fines %>%
    set_case_id("vehicleclass") %>%
    set_activity_id("notificationtype")

Later, we can always undo these changes and “reset” the original mapping using re_map().

traffic_fines %>%
    re_map(mapping_fines)
## # Log of 34724 events consisting of:
## 44 traces 
## 10000 cases 
## 34724 instances of 11 activities 
## 16 resources 
## Events occurred from 2006-06-17 until 2012-03-26 
##  
## # Variables were mapped as follows:
## Case identifier:     case_id 
## Activity identifier:     activity 
## Resource identifier:     resource 
## Activity instance identifier:    activity_instance_id 
## Timestamp:           timestamp 
## Lifecycle transition:        lifecycle 
## 
## # A tibble: 34,724 × 18
##    case_id activity   lifec…¹ resou…² timestamp           amount article dismi…³
##    <chr>   <fct>      <fct>   <fct>   <dttm>              <chr>    <dbl> <chr>  
##  1 A1      Create Fi… comple… 561     2006-07-24 00:00:00 35.0       157 NIL    
##  2 A1      Send Fine  comple… <NA>    2006-12-05 00:00:00 <NA>        NA <NA>   
##  3 A100    Create Fi… comple… 561     2006-08-02 00:00:00 35.0       157 NIL    
##  4 A100    Send Fine  comple… <NA>    2006-12-12 00:00:00 <NA>        NA <NA>   
##  5 A100    Insert Fi… comple… <NA>    2007-01-15 00:00:00 <NA>        NA <NA>   
##  6 A100    Add penal… comple… <NA>    2007-03-16 00:00:00 71.5        NA <NA>   
##  7 A100    Send for … comple… <NA>    2009-03-30 00:00:00 <NA>        NA <NA>   
##  8 A10000  Create Fi… comple… 561     2007-03-09 00:00:00 36.0       157 NIL    
##  9 A10000  Send Fine  comple… <NA>    2007-07-17 00:00:00 <NA>        NA <NA>   
## 10 A10000  Insert Fi… comple… <NA>    2007-08-02 00:00:00 <NA>        NA <NA>   
## # … with 34,714 more rows, 10 more variables: expense <chr>, lastsent <chr>,
## #   matricola <dbl>, notificationtype <chr>, paymentamount <dbl>, points <dbl>,
## #   totalpaymentamount <chr>, vehicleclass <chr>, activity_instance_id <chr>,
## #   .order <int>, and abbreviated variable names ¹​lifecycle, ²​resource,
## #   ³​dismissal
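
The round trip can also be checked programmatically. The following sketch works on a fresh copy of the log (assumed to ship with the eventdataR package) so that it does not depend on earlier modifications; the object names fines, changed and restored are hypothetical.

```r
library(bupaR)

# Start from the untouched example log (assumed to ship with eventdataR).
fines <- eventdataR::traffic_fines
original_mapping <- mapping(fines)

# Change the case identifier, then restore the stored mapping.
changed  <- fines %>% set_case_id("vehicleclass")
restored <- changed %>% re_map(original_mapping)

# The case identifier differs after the change and matches again after re_map().
case_id(changed)
case_id(restored)
n_cases(restored) == n_cases(fines)
```

Storing the mapping before experimenting makes it cheap to try alternative views of the same data and return to the original one afterwards.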

Copyright © 2023 bupaR - Hasselt University