read_xes() can be used to read XES files and turn the data into an eventlog object in R. The function needs only one argument, xesfile, which can be a local path to a file with a .xes extension or a URL. An example XES file can be found at the following link: https://bupar.net/eventdata/exercise1.xes.
When opening this file in a browser, you will see that it is an XML file. More information on the notation can be found here.
Importing an XES file is easily done as follows:
library(xesreadR)
data <- read_xes("https://bupar.net/eventdata/exercise1.xes")
## Warning: `data_frame()` was deprecated in tibble 1.1.0.
## ℹ Please use `tibble()` instead.
## ℹ The deprecated feature was likely used in the xesreadR package.
## Please report the issue to the authors.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: `as_data_frame()` was deprecated in tibble 2.0.0.
## ℹ Please use `as_tibble()` (with slightly different semantics) to convert to a
## tibble, or `as.data.frame()` to convert to a data frame.
## ℹ The deprecated feature was likely used in the xesreadR package.
## Please report the issue to the authors.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning in read_xes("https://bupar.net/eventdata/exercise1.xes"): No activity
## instance identifier specified in xes-file. By default considered each event as
## a different activity instance. Please check!
data
## # Log of 11 events consisting of:
## 3 traces
## 3 cases
## 11 instances of 5 activities
## 1 resource
## Events occurred from 2008-12-09 07:20:01 until 2008-12-09 07:23:01
##
## # Variables were mapped as follows:
## Case identifier: CASE_concept_name
## Activity identifier: activity_id
## Resource identifier: resource_id
## Activity instance identifier: activity_instance_id
## Timestamp: timestamp
## Lifecycle transition: lifecycle_id
##
## # A tibble: 11 × 7
## CASE_concept_name activity_id lifecycle_id resource_id timestamp
## <chr> <chr> <chr> <chr> <dttm>
## 1 Case3.0 A complete UNDEFINED 2008-12-09 07:20:01
## 2 Case3.0 E complete UNDEFINED 2008-12-09 07:21:01
## 3 Case3.0 D complete UNDEFINED 2008-12-09 07:22:01
## 4 Case2.0 A complete UNDEFINED 2008-12-09 07:20:01
## 5 Case2.0 C complete UNDEFINED 2008-12-09 07:21:01
## 6 Case2.0 B complete UNDEFINED 2008-12-09 07:22:01
## 7 Case2.0 D complete UNDEFINED 2008-12-09 07:23:01
## 8 Case1.0 A complete UNDEFINED 2008-12-09 07:20:01
## 9 Case1.0 B complete UNDEFINED 2008-12-09 07:21:01
## 10 Case1.0 C complete UNDEFINED 2008-12-09 07:22:01
## 11 Case1.0 D complete UNDEFINED 2008-12-09 07:23:01
## # ℹ 2 more variables: activity_instance_id <int>, .order <int>
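Because xesfile also accepts a local path, the same log can be downloaded first and then read from disk. The following is a minimal sketch that assumes network access; the temporary file location and the object names local_file and data_local are illustrative and not part of the original example.

```r
library(xesreadR)

# Download the example log to a temporary local file (illustrative location)
local_file <- tempfile(fileext = ".xes")
download.file("https://bupar.net/eventdata/exercise1.xes",
              destfile = local_file, mode = "wb")

# Read the local copy; the warning about the missing activity instance
# identifier is expected here as well, since the file content is the same
data_local <- read_xes(local_file)
data_local
```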
Note that in the example above, read_xes() emits a warning that no activity instance identifier was found. Recall that an eventlog object in R needs certain data fields to be present. If some of these fields are not available, read_xes() will throw a warning or an error. Ideally, the XES file should contain at least the following elements:
<trace>
    <string key="concept:name" value="Case3.0"/>
    <event>
        <string key="concept:name" value="A"/>
        <int key="concept:instance" value="1"/>
        <string key="org:resource" value="UNDEFINED"/>
        <date key="time:timestamp" value="2008-12-09T08:20:01.527+01:00"/>
        <string key="lifecycle:transition" value="complete"/>
    </event>
    ...
</trace>
These elements are translated according to the terminology used in
bupaR
as follows:
XES element | XES key | bupaR |
---|---|---|
trace | concept:name | case_id |
event | concept:name | activity_id |
event | concept:instance | activity_instance_id |
event | org:resource | resource_id |
event | time:timestamp | timestamp |
event | lifecycle:transition | lifecycle_id |
When there is no case identifier, an artificial case identifier CASE_ID will be created based on the hierarchy of the XES file. In case of other missing elements, either an error or a warning will be thrown.
An error will be thrown if an XES file does not contain an activity identifier or a timestamp. These are therefore the minimum requirements to create an eventlog object from an XES file.
In case the lifecycle transition identifier or the resource identifier is missing, an empty placeholder variable will be created and a warning will be emitted.
In case the activity instance identifier is missing, a default activity instance identifier column will be added. This column will regard every event in the log as a distinct activity instance. A warning will be emitted noting that you should check whether this is a justified assumption.
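One rough way to judge whether this default is reasonable is to look at the lifecycle values in the log: if every event has the same lifecycle transition (for instance only "complete"), no activity instance can span several events. The sketch below is a heuristic under that assumption, not something performed by xesreadR itself. It uses a plain data.frame named events as a hand-built stand-in for the first trace of the imported log shown earlier.

```r
# Stand-in for part of the imported log (a plain data.frame, not an eventlog);
# the values mirror the first trace of the exercise1 example
events <- data.frame(
  CASE_concept_name = rep("Case3.0", 3),
  activity_id       = c("A", "E", "D"),
  lifecycle_id      = rep("complete", 3),
  stringsAsFactors  = FALSE
)

# Which lifecycle transitions occur in the log?
lifecycle_values <- unique(events$lifecycle_id)

# Heuristic: with a single lifecycle transition, treating each event
# as its own activity instance is a plausible default
one_event_per_instance <- length(lifecycle_values) == 1
one_event_per_instance
```

If several lifecycle transitions such as start and complete occur, events belonging to the same activity instance would need to be grouped manually.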
If available, missing information can be added manually to the eventlog object by overwriting the variables, e.g. with mutate().
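As a minimal sketch of such an overwrite, the placeholder resource in the exercise1 log can be replaced with a known value. This assumes network access and that the bupaR and dplyr packages are installed; the resource name "clerk_1" and the object name data_fixed are made up for illustration.

```r
library(bupaR)
library(xesreadR)
library(dplyr)

data <- read_xes("https://bupar.net/eventdata/exercise1.xes")

# Overwrite the resource identifier for every event.
# "clerk_1" is a hypothetical resource name, not present in the original file.
data_fixed <- data %>%
  mutate(resource_id = "clerk_1")

data_fixed
```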
Note that both traces and events can have additional elements in the XES files. These will be added as extra variables in the resulting log. Attributes at the level of traces will get the prefix CASE_ in their name. 1
In certain circumstances, it might be useful to have a separate list of cases with case attributes. This can be obtained using read_xes_cases(), which takes the same xesfile argument. The result is a data.frame with one row for each case and one column for each attribute. Attributes that do not exist for a specific case are filled in with NA.
Below, this function is illustrated using the repairExample event log, which has one case attribute called description. For the sake of illustration, the entire event log is also imported.
read_xes_cases("https://bupar.net/eventdata/repairExample.xes")
## # A tibble: 1,104 × 2
## CASE_concept_name CASE_description
## <chr> <chr>
## 1 1 Simulated process instance
## 2 10 Simulated process instance
## 3 100 Simulated process instance
## 4 1000 Simulated process instance
## 5 1001 Simulated process instance
## 6 1002 Simulated process instance
## 7 1003 Simulated process instance
## 8 1004 Simulated process instance
## 9 1005 Simulated process instance
## 10 1006 Simulated process instance
## # ℹ 1,094 more rows
read_xes("https://bupar.net/eventdata/repairExample.xes")
## Warning in read_xes("https://bupar.net/eventdata/repairExample.xes"): No
## activity instance identifier specified in xes-file. By default considered each
## event as a different activity instance. Please check!
## # Log of 11855 events consisting of:
## 77 traces
## 1104 cases
## 11855 instances of 8 activities
## 13 resources
## Events occurred from 1970-01-01 05:36:00 until 1970-01-24 08:16:00
##
## # Variables were mapped as follows:
## Case identifier: CASE_concept_name
## Activity identifier: activity_id
## Resource identifier: resource_id
## Activity instance identifier: activity_instance_id
## Timestamp: timestamp
## Lifecycle transition: lifecycle_id
##
## # A tibble: 11,855 × 12
## CASE_concept_name CASE_description activity_id defectFixed defectType
## <chr> <chr> <chr> <chr> <chr>
## 1 1 Simulated process insta… Register <NA> <NA>
## 2 1 Simulated process insta… Analyze De… <NA> <NA>
## 3 1 Simulated process insta… Analyze De… <NA> 6
## 4 1 Simulated process insta… Repair (Co… <NA> <NA>
## 5 1 Simulated process insta… Repair (Co… <NA> <NA>
## 6 1 Simulated process insta… Test Repair <NA> <NA>
## 7 1 Simulated process insta… Test Repair true <NA>
## 8 1 Simulated process insta… Inform User <NA> <NA>
## 9 1 Simulated process insta… Archive Re… true <NA>
## 10 10 Simulated process insta… Register <NA> <NA>
## # ℹ 11,845 more rows
## # ℹ 7 more variables: lifecycle_id <chr>, numberRepairs <chr>,
## # resource_id <chr>, phoneType <chr>, timestamp <dttm>,
## # activity_instance_id <int>, .order <int>
Use write_xes() to write an XES file.
args(write_xes)
## function (eventlog, xesfile = file.choose(), case_attributes = NULL)
## NULL
It minimally requires 2 arguments: an eventlog object and the xesfile destination.
Additionally, one can specify which of the variables in the event log should be regarded as case attributes by supplying a character vector of variable names to the case_attributes argument. If this argument is not specified, all the variables starting with the prefix CASE_ will be considered as case attributes.
# Use the patients event log that ships with the eventdataR package
patients <- eventdataR::patients
write_xes(patients, "patients.xes")
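To check what was written, the file can be read back in with read_xes(). The sketch below writes to a temporary file instead of the working directory; the names out_file and patients_from_xes are illustrative, and the exact warnings or column names produced when reading the file back are not shown here because they may differ between package versions.

```r
library(xesreadR)

# Write the patients log to a temporary .xes file (illustrative location)
out_file <- tempfile(fileext = ".xes")
write_xes(eventdataR::patients, out_file)

# Read the file back in to inspect the result of the export
patients_from_xes <- read_xes(out_file)
patients_from_xes
```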
On terminology: what in XES is called a trace (i.e. everything between <trace> and </trace>) is called a case in bupaR. In bupaR, the concept trace is reserved for an activity sequence, and is not related to a specific process instance. Many process instances can share the same trace of activities. The terminology used in bupaR is in correspondence with current literature. See creating logs for more information about the data model used.
Copyright © 2023 bupaR - Hasselt University