Data Quality

Despite the extensive opportunities that process mining techniques provide, the “garbage in, garbage out” principle still applies. Data quality issues are widespread in real-life data and can produce misleading analysis results. daqapo (Data Quality Assessment for Process-Oriented data) provides a set of assessment functions to identify a wide array of quality issues.

Getting started

In the examples below, we use the dataset hospital_actlog, which is an artificial event log with data quality issues provided by daqapo.

library(daqapo)
library(dplyr)
data("hospital_actlog")
data("hospital_events")
hospital_actlog <- activitylog(hospital_actlog)

Attribute Dependencies

Detect violations of dependencies between attributes, i.e. conditions that should hold whenever one or more other conditions hold.

Example: when the activity is “Registration”, the originator should start with “Clerk”.

hospital_actlog %>% 
  detect_attribute_dependencies(antecedent = activity == "Registration",
                                consequent = startsWith(originator,"Clerk"))
## *** OUTPUT ***
## The following statement was checked: if condition(s) ~activity == "Registration" hold(s), then ~startsWith(originator, "Clerk") should also hold.
## This statement holds for 12 (85.71%) of the rows in the activity log for which the first condition(s) hold and does not hold for 2 (14.29%) of these rows.
## For the following rows, the first condition(s) hold(s), but the second condition does not:
## # Log of 10 events consisting of:
## 2 traces 
## 4 cases 
## 5 instances of 1 activity 
## 5 resources 
## Events occurred from 2017-11-21 18:10:17 until 2017-11-22 18:37:00 
##  
## # Variables were mapped as follows:
## Case identifier:     patient_visit_nr 
## Activity identifier:     activity 
## Resource identifier:     originator 
## Timestamps:      start, complete 
## 
## # A tibble: 5 × 8
##   patient_visit_nr activity   originator start               complete           
##              <dbl> <chr>      <chr>      <dttm>              <dttm>             
## 1              528 Registrat… Nurse 6    2017-11-21 18:10:17 2017-11-21 18:15:04
## 2              535 Registrat… Clerk 3    2017-11-22 10:04:57 2017-11-22 10:06:46
## 3              536 Registrat… Clerk 9    2017-11-22 10:26:41 2017-11-22 10:32:56
## 4              535 Registrat… Clerk 6    2017-11-22 11:05:42 2017-11-22 11:11:11
## 5              534 Registrat… <NA>       2017-11-22 18:35:00 2017-11-22 18:37:00
## # ℹ 3 more variables: triagecode <dbl>, specialization <chr>, .order <int>
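
The logic of such a dependency check can be sketched in base R. This is a minimal illustration on invented toy data, not daqapo's implementation; the names `toy`, `antecedent`, `consequent` and `violations` are hypothetical, and treating a missing originator as a violation is an assumption of this sketch.

```r
# Toy data; the values are invented for illustration only.
toy <- data.frame(
  activity   = c("Registration", "Registration", "Triage", "Registration"),
  originator = c("Clerk 3", "Nurse 6", "Nurse 5", NA),
  stringsAsFactors = FALSE
)

# Antecedent: rows to which the rule applies.
antecedent <- toy$activity == "Registration"
# Consequent: what should hold on those rows (NA when originator is missing).
consequent <- startsWith(toy$originator, "Clerk")

# A row violates the rule when the antecedent holds and the consequent
# is not TRUE (assumption: a missing originator counts as a violation).
violations <- toy[antecedent & !(consequent %in% TRUE), ]
print(violations)
```

Whether missing values should count as violations depends on the analysis; dropping `%in% TRUE` in favour of an explicit `is.na()` check makes that choice visible.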

Case ID Sequence Gaps

Detect gaps in the sequence of case identifiers.

hospital_actlog %>%
  detect_case_id_sequence_gaps()
## *** OUTPUT ***
## It was checked whether there are gaps in the sequence of case IDs
## From the 27 expected cases in the activity log, ranging from 510 to 536, 5 (18.52%) are missing.
## These missing case numbers are:
## # A tibble: 2 × 3
##    from    to n_missing
##   <dbl> <dbl>     <dbl>
## 1   511   511         1
## 2   513   516         4
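
The idea behind the gap check can be reproduced in a few lines of base R. The sketch below uses invented case identifiers and hypothetical variable names; it only illustrates the principle and is not how daqapo computes its result.

```r
# Invented case identifiers with some numbers missing.
case_ids <- c(510, 512, 517, 518, 520)

# Every identifier expected between the smallest and largest observed one.
expected    <- seq(min(case_ids), max(case_ids))
missing_ids <- setdiff(expected, case_ids)

# Group consecutive missing identifiers into runs (from, to, n_missing).
run_id <- cumsum(c(1, diff(missing_ids) != 1))
gaps <- data.frame(
  from      = as.numeric(tapply(missing_ids, run_id, min)),
  to        = as.numeric(tapply(missing_ids, run_id, max)),
  n_missing = as.numeric(tapply(missing_ids, run_id, length))
)
print(gaps)
```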

Conditional Activity Presence

Check whether certain activities are present when a specific condition is satisfied.

For example, if specialization is “TRAU”, then the activity “Clinical exam” must take place.

hospital_actlog %>%
  detect_conditional_activity_presence(condition = specialization == "TRAU",
                                       activities = "Clinical exam")
## *** OUTPUT ***
## The following statement was checked: if condition(s) ~specialization == "TRAU" hold(s), then activity/activities Clinical exam should be recorded
## The condition(s) hold(s) for 2 cases. From these cases:
## - the specified activity/activities is/are recorded for 2 case(s) (100%)
## - the specified activity/activities is/are not recorded for 0 case(s) (0%)
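
At case level, this check compares two sets of cases: those that satisfy the condition and those that contain the activity. A minimal base-R sketch on invented data (all names and values are hypothetical, and this is not daqapo's implementation):

```r
# Toy activity log; values are invented for illustration.
toy <- data.frame(
  case           = c(1, 1, 2, 2, 3),
  specialization = c("TRAU", "TRAU", "TRAU", "TRAU", "PED"),
  activity       = c("Triage", "Clinical exam", "Triage", "Treatment", "Triage"),
  stringsAsFactors = FALSE
)

# Cases for which the condition holds.
cases_cond <- unique(toy$case[toy$specialization == "TRAU"])
# Cases in which the required activity is recorded.
cases_with <- unique(toy$case[toy$activity == "Clinical exam"])

# Cases that satisfy the condition but lack the activity.
violating <- setdiff(cases_cond, cases_with)
print(violating)
```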

Duration Outliers

Detect duration outliers for particular activities.

For example, the duration of “Treatment” should be within 1 standard deviation of its mean duration.

hospital_actlog %>%
  detect_duration_outliers(Treatment = duration_within(bound_sd = 1))
## *** OUTPUT ***
## Outliers are detected for following activities
## Treatment     Lower bound: 5.06   Upper bound: 22.2
## A total of 1 is detected (1.89% of the activity executions)
## For the following activity instances, outliers are detected:
## # Log of 2 events consisting of:
## 1 trace 
## 1 case 
## 1 instance of 1 activity 
## 1 resource 
## Events occurred from 2017-11-21 18:26:04 until 2017-11-21 18:55:00 
##  
## # Variables were mapped as follows:
## Case identifier:     patient_visit_nr 
## Activity identifier:     activity 
## Resource identifier:     originator 
## Timestamps:      start, complete 
## 
## # A tibble: 1 × 14
##   patient_visit_nr activity  originator start               complete           
##              <dbl> <chr>     <chr>      <dttm>              <dttm>             
## 1              523 Treatment Nurse 17   2017-11-21 18:26:04 2017-11-21 18:55:00
## # ℹ 9 more variables: triagecode <dbl>, specialization <chr>, .order <int>,
## #   duration <dbl>, mean <dbl>, sd <dbl>, bound_sd <dbl>, lower_bound <dbl>,
## #   upper_bound <dbl>

Alternatively, the duration of “Treatment” should lie between 0 and 15 minutes.

hospital_actlog %>%
  detect_duration_outliers(Treatment = duration_within(lower_bound = 0, upper_bound = 15))
## *** OUTPUT ***
## Outliers are detected for following activities
## Treatment     Lower bound: 0      Upper bound: 15
## A total of 1 is detected (1.89% of the activity executions)
## For the following activity instances, outliers are detected:
## # Log of 2 events consisting of:
## 1 trace 
## 1 case 
## 1 instance of 1 activity 
## 1 resource 
## Events occurred from 2017-11-21 18:26:04 until 2017-11-21 18:55:00 
##  
## # Variables were mapped as follows:
## Case identifier:     patient_visit_nr 
## Activity identifier:     activity 
## Resource identifier:     originator 
## Timestamps:      start, complete 
## 
## # A tibble: 1 × 14
##   patient_visit_nr activity  originator start               complete           
##              <dbl> <chr>     <chr>      <dttm>              <dttm>             
## 1              523 Treatment Nurse 17   2017-11-21 18:26:04 2017-11-21 18:55:00
## # ℹ 9 more variables: triagecode <dbl>, specialization <chr>, .order <int>,
## #   duration <dbl>, mean <dbl>, sd <dbl>, bound_sd <dbl>, lower_bound <dbl>,
## #   upper_bound <dbl>
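
The standard-deviation rule used in the first example can be written out by hand. The following base-R sketch uses invented durations in minutes and hypothetical variable names; it illustrates the bound calculation only and is not taken from daqapo's source.

```r
# Invented activity durations in minutes.
durations <- c(10, 12, 14, 11, 13, 29)
bound_sd  <- 1

# Bounds at bound_sd standard deviations around the mean.
lower <- mean(durations) - bound_sd * sd(durations)
upper <- mean(durations) + bound_sd * sd(durations)

# Durations outside the bounds are flagged as outliers.
outliers <- durations[durations < lower | durations > upper]
print(c(lower = lower, upper = upper))
print(outliers)
```

Because the mean and standard deviation are themselves affected by extreme values, fixed bounds such as those in the second example can be the more robust choice when domain knowledge is available.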

Inactive Periods

Detect periods of time in which no activity executions are recorded, using a threshold specified in minutes.

For example, detect whether there are periods of more than 30 minutes without any activity executions.

hospital_actlog %>%
  detect_inactive_periods(threshold = 30)
## Selected timestamp parameter value: both
## Selected inactivity type:arrivals
## *** OUTPUT ***
## Specified threshold of 30 minutes is violated 9 times.
## Threshold is violated in the following periods:
##          period_start          period_end   time_gap
## 1 2017-11-20 10:20:06 2017-11-21 11:35:16 1515.16667
## 2 2017-11-21 11:22:16 2017-11-21 11:59:41   37.41667
## 3 2017-11-21 12:05:52 2017-11-21 13:43:16   97.40000
## 4 2017-11-21 14:06:09 2017-11-21 15:12:17   66.13333
## 5 2017-11-21 15:18:19 2017-11-21 16:42:08   83.81667
## 6 2017-11-21 17:06:10 2017-11-21 18:02:10   56.00000
## 7 2017-11-21 18:15:04 2017-11-22 10:04:57  949.88333
## 8 2017-11-22 10:32:56 2017-11-22 16:30:00  357.06667
## 9 2017-11-22 17:00:00 2017-11-22 18:00:00   60.00000
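
In essence, an inactive period is a gap between consecutive timestamps that exceeds the threshold. A simplified base-R sketch, assuming only start timestamps are considered (the data and names are invented, and daqapo's own options for timestamp and inactivity type are not modelled here):

```r
# Invented start timestamps.
starts <- as.POSIXct(c("2017-11-21 11:00:00", "2017-11-21 11:10:00",
                       "2017-11-21 12:00:00", "2017-11-21 12:20:00"),
                     tz = "UTC")
starts <- sort(starts)

# Time between each timestamp and the next one, in minutes.
n        <- length(starts)
gap_mins <- as.numeric(difftime(starts[-1], starts[-n], units = "mins"))

# Keep the gaps that exceed the threshold.
threshold  <- 30
violations <- data.frame(
  period_start = starts[-n],
  period_end   = starts[-1],
  time_gap     = gap_mins
)[gap_mins > threshold, ]
print(violations)
```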

Incomplete Cases

Check whether there are cases that miss a specific activity.

For example, are there cases in which one or more of the five listed activities are missing?

hospital_actlog %>%
  detect_incomplete_cases(activities = c("Registration","Triage","Clinical exam","Treatment","Treatment evaluation"))
## *** OUTPUT ***
## It was checked whether the activities Clinical exam, Registration, Treatment, Treatment evaluation, Triage are present for cases.
## These activities are present for 4 (39.62%) of the cases and are not present for 18 (60.38%) of the cases.
## Note: this function only checks the presence of activities for a particular case, not the completeness of these entries in the activity log or the order of activities.
## For cases for which the aforementioned activities are not all present, the following activities are recorded (ordered by decreasing frequeny of occurrence):
## # A tibble: 9 × 3
##   activity                 n case_ids                                           
##   <chr>                <int> <chr>                                              
## 1 Triage                  11 510 - 512 - 517 - 521 - 524 - 525 - 526 - 527 - 52…
## 2 Registration             9 512 - 518 - 518 - 518 - 521 - 522 - 527 - 528 - 534
## 3 Clinical exam            5 512 - 510 - 527 - 528 - 512                        
## 4 Treatment evaluation     2 529 - 532                                          
## 5 0                        1 533                                                
## 6 Trage                    1 520                                                
## 7 Treatment                1 532                                                
## 8 Triaga                   1 522                                                
## 9 registration             1 510
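
A per-case completeness check boils down to asking, for each case, whether all required activities occur at least once. A minimal base-R sketch on invented data (hypothetical names; not daqapo's implementation):

```r
# Toy activity log; values are invented for illustration.
toy <- data.frame(
  case     = c(1, 1, 1, 2, 2, 3),
  activity = c("Registration", "Triage", "Treatment",
               "Registration", "Triage", "Triage"),
  stringsAsFactors = FALSE
)
required <- c("Registration", "Triage", "Treatment")

# For each case: are all required activities present?
complete <- tapply(toy$activity, toy$case,
                   function(a) all(required %in% a))

# Case identifiers (as character) for which something is missing.
incomplete_cases <- names(complete)[!complete]
print(incomplete_cases)
```

As in the note printed by the function above, this only tests presence; it says nothing about the order of activities or the completeness of the individual rows.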

Incorrect Activity Names

Given a set of allowed activities, check whether the log contains any activity labels outside that set.

hospital_actlog %>%
  detect_incorrect_activity_names(allowed_activities = c("Registration","Triage","Clinical exam","Treatment","Treatment evaluation"))
## *** OUTPUT ***
## 4 out of 9 (44.44% ) activity labels are identified to be incorrect.
## These activity labels are:
## registration - Trage - Triaga - 0
## Given this information, 4 of 53 (7.55%) rows in the activity log are incorrect. These are the following:
## # Log of 8 events consisting of:
## 4 traces 
## 4 cases 
## 4 instances of 4 activities 
## 4 resources 
## Events occurred from 2017-11-20 10:18:17 until 2017-11-22 18:37:00 
##  
## # Variables were mapped as follows:
## Case identifier:     patient_visit_nr 
## Activity identifier:     activity 
## Resource identifier:     originator 
## Timestamps:      start, complete 
## 
## # A tibble: 4 × 8
##   patient_visit_nr activity   originator start               complete           
##              <dbl> <chr>      <chr>      <dttm>              <dttm>             
## 1              510 registrat… Clerk 9    2017-11-20 10:18:17 2017-11-20 10:20:06
## 2              520 Trage      Nurse 17   2017-11-21 13:43:16 2017-11-21 13:39:00
## 3              522 Triaga     Nurse 5    2017-11-21 15:15:25 2017-11-21 15:18:04
## 4              533 0          <NA>       2017-11-22 18:35:00 2017-11-22 18:37:00
## # ℹ 3 more variables: triagecode <dbl>, specialization <chr>, .order <int>
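
Checking labels against an allowed set is a plain set comparison. The base-R sketch below uses an invented label vector and hypothetical names to show the two numbers reported above: the incorrect labels and the share of affected rows.

```r
# Invented activity labels, one per row of a toy log.
labels  <- c("Registration", "registration", "Triage", "Trage",
             "Triaga", "0", "Triage")
allowed <- c("Registration", "Triage", "Clinical exam",
             "Treatment", "Treatment evaluation")

# Distinct labels that are not in the allowed set.
incorrect <- setdiff(unique(labels), allowed)

# Share of rows carrying an incorrect label.
share_rows <- mean(!(labels %in% allowed))
print(incorrect)
print(share_rows)
```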

Missing Values

Analyse the missing values of the log. This can be done in general, or at the level of activities or specific columns.

hospital_actlog %>%
  detect_missing_values()
## Selected level of aggregation:overview
## *** OUTPUT ***
## Absolute number of missing values per column:
##                   
## patient_visit_nr 0
## activity         0
## originator       2
## start            1
## complete         0
## triagecode       1
## specialization   0
## .order           0
## Relative number of missing values per column (expressed as percentage):
##                          
## patient_visit_nr 0.000000
## activity         0.000000
## originator       3.773585
## start            1.886792
## complete         0.000000
## triagecode       1.886792
## specialization   0.000000
## .order           0.000000
## Overview of activity log rows which are incomplete:
## # Log of 7 events consisting of:
## 3 traces 
## 4 cases 
## 4 instances of 3 activities 
## 2 resources 
## Events occurred from NA until NA 
##  
## # Variables were mapped as follows:
## Case identifier:     patient_visit_nr 
## Activity identifier:     activity 
## Resource identifier:     originator 
## Timestamps:      start, complete 
## 
## # A tibble: 4 × 8
##   patient_visit_nr activity   originator start               complete           
##              <dbl> <chr>      <chr>      <dttm>              <dttm>             
## 1              510 Clinical … Doctor 7   2017-11-20 11:35:01 2017-11-20 11:36:09
## 2              533 0          <NA>       2017-11-22 18:35:00 2017-11-22 18:37:00
## 3              534 Registrat… <NA>       2017-11-22 18:35:00 2017-11-22 18:37:00
## 4              512 Clinical … Doctor 7   NA                  2017-11-20 11:33:57
## # ℹ 3 more variables: triagecode <dbl>, specialization <chr>, .order <int>

At the level of activities:

hospital_actlog %>% 
  detect_missing_values(level_of_aggregation = "activity")
## Selected level of aggregation:activity
## *** OUTPUT ***
## Absolute number of missing values per column (per activity):
## # A tibble: 9 × 8
##   activity  patient_visit_nr originator start complete triagecode specialization
##   <chr>                <int>      <int> <int>    <int>      <int>          <int>
## 1 0                        0          1     0        0          0              0
## 2 Clinical…                0          0     1        0          1              0
## 3 Registra…                0          1     0        0          0              0
## 4 Trage                    0          0     0        0          0              0
## 5 Treatment                0          0     0        0          0              0
## 6 Treatmen…                0          0     0        0          0              0
## 7 Triaga                   0          0     0        0          0              0
## 8 Triage                   0          0     0        0          0              0
## 9 registra…                0          0     0        0          0              0
## # ℹ 1 more variable: .order <int>
## Relative number of missing values per column (per activity, expressed as percentage):
## # A tibble: 9 × 8
##   activity  patient_visit_nr originator start complete triagecode specialization
##   <chr>                <dbl>      <dbl> <dbl>    <dbl>      <dbl>          <dbl>
## 1 0                        0     1      0            0      0                  0
## 2 Clinical…                0     0      0.111        0      0.111              0
## 3 Registra…                0     0.0714 0            0      0                  0
## 4 Trage                    0     0      0            0      0                  0
## 5 Treatment                0     0      0            0      0                  0
## 6 Treatmen…                0     0      0            0      0                  0
## 7 Triaga                   0     0      0            0      0                  0
## 8 Triage                   0     0      0            0      0                  0
## 9 registra…                0     0      0            0      0                  0
## # ℹ 1 more variable: .order <dbl>
## Overview of activity log rows which are incomplete:
## # Log of 7 events consisting of:
## 3 traces 
## 4 cases 
## 4 instances of 3 activities 
## 2 resources 
## Events occurred from NA until NA 
##  
## # Variables were mapped as follows:
## Case identifier:     patient_visit_nr 
## Activity identifier:     activity 
## Resource identifier:     originator 
## Timestamps:      start, complete 
## 
## # A tibble: 4 × 8
##   patient_visit_nr activity   originator start               complete           
##              <dbl> <chr>      <chr>      <dttm>              <dttm>             
## 1              510 Clinical … Doctor 7   2017-11-20 11:35:01 2017-11-20 11:36:09
## 2              533 0          <NA>       2017-11-22 18:35:00 2017-11-22 18:37:00
## 3              534 Registrat… <NA>       2017-11-22 18:35:00 2017-11-22 18:37:00
## 4              512 Clinical … Doctor 7   NA                  2017-11-20 11:33:57
## # ℹ 3 more variables: triagecode <dbl>, specialization <chr>, .order <int>

For a specific column:

hospital_actlog %>% 
  detect_missing_values(
  level_of_aggregation = "column",
  column = "triagecode")
## Selected level of aggregation:column
## *** OUTPUT ***
## Absolute number of missing values in columntriagecode:1
## Relative number of missing values in columntriagecode(expressed as percentage):1.88679245283019
## 
## Overview of activity log rows in whichtriagecodeis missing:
## # Log of 2 events consisting of:
## 1 trace 
## 1 case 
## 1 instance of 1 activity 
## 1 resource 
## Events occurred from 2017-11-20 11:35:01 until 2017-11-20 11:36:09 
##  
## # Variables were mapped as follows:
## Case identifier:     patient_visit_nr 
## Activity identifier:     activity 
## Resource identifier:     originator 
## Timestamps:      start, complete 
## 
## # A tibble: 1 × 8
##   patient_visit_nr activity   originator start               complete           
##              <dbl> <chr>      <chr>      <dttm>              <dttm>             
## 1              510 Clinical … Doctor 7   2017-11-20 11:35:01 2017-11-20 11:36:09
## # ℹ 3 more variables: triagecode <dbl>, specialization <chr>, .order <int>
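
The overview-level counts correspond to simple column summaries of `is.na()`. A minimal base-R sketch on an invented data frame (hypothetical names; not daqapo's implementation):

```r
# Toy data with some missing values; invented for illustration.
toy <- data.frame(
  originator = c("Clerk 1", NA, "Nurse 2", NA),
  triagecode = c(3, 2, NA, 1),
  stringsAsFactors = FALSE
)

# Absolute and relative (percentage) number of missing values per column.
n_missing   <- colSums(is.na(toy))
pct_missing <- 100 * colMeans(is.na(toy))

# Rows with at least one missing value.
incomplete_rows <- toy[!complete.cases(toy), ]
print(n_missing)
print(pct_missing)
```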

Multiregistration

Detect whether there are multiple activity executions registered by the same resource (or for the same case), in a short period of time. This period of time can be specified with a threshold in seconds.

hospital_actlog %>%
  detect_multiregistration(threshold_in_seconds = 10)
## Selected level of aggregation: resource
## Selected timestamp parameter value: complete
## *** OUTPUT ***
## Multi-registration is detected for 4 of the 12 resources (33.33%). These resources are:
## Doctor 7 - Nurse 27 - Nurse 5 - NA
## For the following rows in the activity log, multi-registration is detected:
## # Log of 17 events consisting of:
## 5 traces 
## 7 cases 
## 9 instances of 5 activities 
## 4 resources 
## Events occurred from NA until NA 
##  
## # Variables were mapped as follows:
## Case identifier:     patient_visit_nr 
## Activity identifier:     activity 
## Resource identifier:     originator 
## Timestamps:      start, complete 
## 
## # A tibble: 9 × 8
##   originator patient_visit_nr activity   start               complete           
##   <chr>                 <dbl> <chr>      <dttm>              <dttm>             
## 1 Doctor 7                512 Clinical … 2017-11-20 11:27:12 2017-11-20 11:33:57
## 2 Doctor 7                512 Clinical … NA                  2017-11-20 11:33:57
## 3 Nurse 27                536 Triage     2017-11-22 15:15:39 2017-11-22 15:25:01
## 4 Nurse 27                536 Treatment  2017-11-22 15:15:41 2017-11-22 15:25:03
## 5 Nurse 5                 524 Triage     2017-11-21 17:04:03 2017-11-21 17:06:05
## 6 Nurse 5                 525 Triage     2017-11-21 17:04:13 2017-11-21 17:06:08
## 7 Nurse 5                 526 Triage     2017-11-21 17:04:15 2017-11-21 17:06:10
## 8 <NA>                    533 0          2017-11-22 18:35:00 2017-11-22 18:37:00
## 9 <NA>                    534 Registrat… 2017-11-22 18:35:00 2017-11-22 18:37:00
## # ℹ 3 more variables: triagecode <dbl>, specialization <chr>, .order <int>
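
Conceptually, multi-registration means that two rows of the same resource have timestamps closer together than the threshold. The sketch below is a simplified base-R version using only the complete timestamp, with invented data and hypothetical names; it is not daqapo's implementation.

```r
# Toy log; values are invented for illustration.
toy <- data.frame(
  originator = c("Nurse 5", "Nurse 5", "Nurse 5", "Doctor 7", "Doctor 7"),
  complete   = as.POSIXct(c("2017-11-21 17:06:05", "2017-11-21 17:06:08",
                            "2017-11-21 17:30:00", "2017-11-21 18:00:00",
                            "2017-11-21 18:20:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)
toy <- toy[order(toy$originator, toy$complete), ]

# Seconds to the previous and next registration of the same resource.
secs     <- as.numeric(toy$complete)
gap_prev <- ave(secs, toy$originator, FUN = function(x) c(Inf, diff(x)))
gap_next <- ave(secs, toy$originator, FUN = function(x) c(diff(x), Inf))

# Flag rows that lie within the threshold of a neighbouring row.
threshold_in_seconds <- 10
flagged <- toy[gap_prev <= threshold_in_seconds |
               gap_next <= threshold_in_seconds, ]
print(flagged)
```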

Overlaps

Check if a resource has performed two or more activities in parallel.

hospital_actlog %>%
  detect_overlaps()
## # A tibble: 7 × 4
##   activity_a    activity_b        n avg_overlap_mins
##   <chr>         <chr>         <int>            <dbl>
## 1 Clinical exam Treatment         2            8.17 
## 2 Registration  Clinical exam     1            1.9  
## 3 Registration  Triaga            1            2.65 
## 4 Registration  Triage            1            1.93 
## 5 Triage        Clinical exam     2            5.63 
## 6 Triage        Registration      1            0.817
## 7 Triage        Treatment         1            9.33
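
Two activity instances overlap when each starts before the other completes; the overlap length is the earliest completion minus the latest start. A rough base-R sketch of that pairwise test, on invented data with hypothetical names (not daqapo's implementation, and quadratic in the number of rows):

```r
# Toy log for a single resource; values are invented for illustration.
toy <- data.frame(
  originator = "Nurse 1",
  activity   = c("Triage", "Treatment", "Triage"),
  start      = as.POSIXct(c("2017-11-21 10:00:00", "2017-11-21 10:05:00",
                            "2017-11-21 11:00:00"), tz = "UTC"),
  complete   = as.POSIXct(c("2017-11-21 10:10:00", "2017-11-21 10:20:00",
                            "2017-11-21 11:10:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# All unordered pairs of rows.
pairs <- t(combn(nrow(toy), 2))
a <- toy[pairs[, 1], ]
b <- toy[pairs[, 2], ]

# Overlap in minutes: earliest completion minus latest start.
overlap_mins <- (pmin(as.numeric(a$complete), as.numeric(b$complete)) -
                 pmax(as.numeric(a$start), as.numeric(b$start))) / 60

# Keep pairs of the same resource with a positive overlap.
keep <- a$originator == b$originator & overlap_mins > 0
overlaps <- data.frame(
  activity_a   = a$activity[keep],
  activity_b   = b$activity[keep],
  overlap_mins = overlap_mins[keep]
)
print(overlaps)
```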

Similar Labels

Check for similar labels in a specific column. Both the column and the maximum edit distance at which two labels are still considered similar can be configured.

hospital_actlog %>%
  detect_similar_labels(column_labels = "activity", max_edit_distance = 3)
## Warning in detect_similar_labels.activitylog(., column_labels = "activity", :
## Not all provided columns are of type character or factor and will be ignored:
## patient_visit_nr,start,complete,.order
## # A tibble: 16 × 3
##    column_labels labels       similar_to                   
##    <chr>         <chr>        <chr>                        
##  1 activity      registration Registration                 
##  2 activity      Registration registration                 
##  3 activity      Triage       Trage - Triaga               
##  4 activity      Trage        Triage - Triaga              
##  5 activity      Triaga       Triage - Trage               
##  6 originator    Clerk 9      Clerk 12 - Clerk 6 - Clerk 3 
##  7 originator    Clerk 12     Clerk 9 - Clerk 6 - Clerk 3  
##  8 originator    Nurse 27     Nurse 17 - Nurse 5 - Nurse 6 
##  9 originator    Doctor 7     Doctor 4 - Doctor 1          
## 10 originator    Nurse 17     Nurse 27 - Nurse 5 - Nurse 6 
## 11 originator    Clerk 6      Clerk 9 - Clerk 12 - Clerk 3 
## 12 originator    Doctor 4     Doctor 7 - Doctor 1          
## 13 originator    Clerk 3      Clerk 9 - Clerk 12 - Clerk 6 
## 14 originator    Nurse 5      Nurse 27 - Nurse 17 - Nurse 6
## 15 originator    Nurse 6      Nurse 27 - Nurse 17 - Nurse 5
## 16 originator    Doctor 1     Doctor 7 - Doctor 4
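
Edit distance between labels can be computed with `adist()` from the utils package that ships with R. The sketch below applies it to an invented label vector to list, for each label, the other labels within the maximum distance. Whether daqapo uses the same distance measure internally is an assumption not confirmed here.

```r
# Invented activity labels.
labels <- c("Registration", "registration", "Triage", "Trage",
            "Triaga", "Treatment")

# Pairwise (case-sensitive) edit distances between all labels.
d <- adist(labels)
dimnames(d) <- list(labels, labels)

# For each label, the other labels within the maximum edit distance.
max_edit_distance <- 3
similar <- lapply(labels, function(l) {
  labels[d[l, ] > 0 & d[l, ] <= max_edit_distance]
})
names(similar) <- labels
print(similar)
```

A larger `max_edit_distance` catches more typos but also groups short labels that differ legitimately, as the originator rows in the output above suggest.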

Time Anomalies

Detect activity executions with negative or zero duration.

hospital_actlog %>%
  detect_time_anomalies()
## Selected anomaly type: both
## *** OUTPUT ***
## For 5 rows in the activity log (9.43%), an anomaly is detected.
## The anomalies are spread over the activities as follows:
## # A tibble: 3 × 3
##   activity      type                  n
##   <chr>         <chr>             <int>
## 1 Registration  negative duration     3
## 2 Clinical exam zero duration         1
## 3 Trage         negative duration     1
## Anomalies are found in the following rows:
## # Log of 10 events consisting of:
## 3 traces 
## 3 cases 
## 5 instances of 3 activities 
## 5 resources 
## Events occurred from 2017-11-21 11:22:16 until 2017-11-21 19:00:00 
##  
## # Variables were mapped as follows:
## Case identifier:     patient_visit_nr 
## Activity identifier:     activity 
## Resource identifier:     originator 
## Timestamps:      start, complete 
## 
## # A tibble: 5 × 10
##   patient_visit_nr activity   originator start               complete           
##              <dbl> <chr>      <chr>      <dttm>              <dttm>             
## 1              518 Registrat… Clerk 12   2017-11-21 11:45:16 2017-11-21 11:22:16
## 2              518 Registrat… Clerk 6    2017-11-21 11:45:16 2017-11-21 11:22:16
## 3              518 Registrat… Clerk 9    2017-11-21 11:45:16 2017-11-21 11:22:16
## 4              520 Trage      Nurse 17   2017-11-21 13:43:16 2017-11-21 13:39:00
## 5              528 Clinical … Doctor 1   2017-11-21 19:00:00 2017-11-21 19:00:00
## # ℹ 5 more variables: triagecode <dbl>, specialization <chr>, .order <int>,
## #   duration <dbl>, type <chr>
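
The classification into negative and zero durations follows directly from the difference between the complete and start timestamps. A minimal base-R sketch on invented data (hypothetical names; not daqapo's implementation):

```r
# Toy log; values are invented for illustration.
toy <- data.frame(
  start    = as.POSIXct(c("2017-11-21 11:45:16", "2017-11-21 19:00:00",
                          "2017-11-21 10:00:00"), tz = "UTC"),
  complete = as.POSIXct(c("2017-11-21 11:22:16", "2017-11-21 19:00:00",
                          "2017-11-21 10:05:00"), tz = "UTC")
)

# Duration in minutes; negative when complete precedes start.
toy$duration <- as.numeric(difftime(toy$complete, toy$start, units = "mins"))

# Label the anomaly type; regular rows get NA.
toy$type <- ifelse(toy$duration < 0, "negative duration",
                   ifelse(toy$duration == 0, "zero duration", NA))

anomalies <- toy[!is.na(toy$type), ]
print(anomalies)
```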

Unique Values

List all unique combinations of the specified columns.

hospital_actlog %>%
  detect_unique_values(column_labels = "activity")
## *** OUTPUT ***
## Distinct entries are computed for the following columns: 
## activity
## # Log of 105 events consisting of:
## 14 traces 
## 22 cases 
## 53 instances of 9 activities 
## 12 resources 
## Events occurred from NA until NA 
##  
## # Variables were mapped as follows:
## Case identifier:     patient_visit_nr 
## Activity identifier:     activity 
## Resource identifier:     originator 
## Timestamps:      start, complete 
## 
## # A tibble: 53 × 6
##    activity  patient_visit_nr originator start               complete           
##    <chr>                <dbl> <chr>      <dttm>              <dttm>             
##  1 registra…              510 Clerk 9    2017-11-20 10:18:17 2017-11-20 10:20:06
##  2 Registra…              512 Clerk 12   2017-11-20 10:33:14 2017-11-20 10:37:00
##  3 Triage                 510 Nurse 27   2017-11-20 10:34:08 2017-11-20 10:41:48
##  4 Triage                 512 Nurse 27   2017-11-20 10:44:12 2017-11-20 10:50:17
##  5 Clinical…              512 Doctor 7   2017-11-20 11:27:12 2017-11-20 11:33:57
##  6 Clinical…              510 Doctor 7   2017-11-20 11:35:01 2017-11-20 11:36:09
##  7 Triage                 517 Nurse 17   2017-11-21 11:35:16 2017-11-21 11:39:00
##  8 Registra…              518 Clerk 12   2017-11-21 11:45:16 2017-11-21 11:22:16
##  9 Registra…              518 Clerk 6    2017-11-21 11:45:16 2017-11-21 11:22:16
## 10 Registra…              518 Clerk 9    2017-11-21 11:45:16 2017-11-21 11:22:16
## # ℹ 43 more rows
## # ℹ 1 more variable: .order <int>

The same check for the combination of activity and originator:

hospital_actlog %>%
  detect_unique_values(column_labels = c("activity", "originator"))
## *** OUTPUT ***
## Distinct entries are computed for the following columns: 
## activity - originator
## # Log of 105 events consisting of:
## 14 traces 
## 22 cases 
## 53 instances of 9 activities 
## 12 resources 
## Events occurred from NA until NA 
##  
## # Variables were mapped as follows:
## Case identifier:     patient_visit_nr 
## Activity identifier:     activity 
## Resource identifier:     originator 
## Timestamps:      start, complete 
## 
## # A tibble: 53 × 6
##    activity  originator patient_visit_nr start               complete           
##    <chr>     <chr>                 <dbl> <dttm>              <dttm>             
##  1 registra… Clerk 9                 510 2017-11-20 10:18:17 2017-11-20 10:20:06
##  2 Registra… Clerk 12                512 2017-11-20 10:33:14 2017-11-20 10:37:00
##  3 Triage    Nurse 27                510 2017-11-20 10:34:08 2017-11-20 10:41:48
##  4 Triage    Nurse 27                512 2017-11-20 10:44:12 2017-11-20 10:50:17
##  5 Clinical… Doctor 7                512 2017-11-20 11:27:12 2017-11-20 11:33:57
##  6 Clinical… Doctor 7                510 2017-11-20 11:35:01 2017-11-20 11:36:09
##  7 Triage    Nurse 17                517 2017-11-21 11:35:16 2017-11-21 11:39:00
##  8 Registra… Clerk 12                518 2017-11-21 11:45:16 2017-11-21 11:22:16
##  9 Registra… Clerk 6                 518 2017-11-21 11:45:16 2017-11-21 11:22:16
## 10 Registra… Clerk 9                 518 2017-11-21 11:45:16 2017-11-21 11:22:16
## # ℹ 43 more rows
## # ℹ 1 more variable: .order <int>

Value Range Violations

Detect values that fall outside the allowed range of a column. For example, triagecode should lie between 0 and 5.

hospital_actlog %>%
  detect_value_range_violations(triagecode = domain_numeric(from = 0, to = 5))
## $triagecode
## $type
## [1] "numeric"
## 
## $from
## [1] 0
## 
## $to
## [1] 5
## 
## attr(,"class")
## [1] "value_range" "list"
## *** OUTPUT ***
## The domain range for column triagecode is checked.
## Values allowed between 0 and 5
## The values fall within the specified domain range for 46 (86.79%) of the rows in the activity log and outside the domain range for 7 (13.21%) of these rows.
## 
## The following rows fall outside the specified domain range for indicated column:
## # Log of 14 events consisting of:
## 5 traces 
## 6 cases 
## 7 instances of 5 activities 
## 4 resources 
## Events occurred from 2017-11-20 11:35:01 until 2017-11-23 18:33:00 
##  
## # Variables were mapped as follows:
## Case identifier:     patient_visit_nr 
## Activity identifier:     activity 
## Resource identifier:     originator 
## Timestamps:      start, complete 
## 
## # A tibble: 7 × 9
##   column_checked patient_visit_nr activity        originator start              
##   <chr>                     <dbl> <chr>           <chr>      <dttm>             
## 1 triagecode                  510 Clinical exam   Doctor 7   2017-11-20 11:35:01
## 2 triagecode                  529 Treatment eval… Doctor 1   2017-11-22 16:30:00
## 3 triagecode                  530 Triage          Nurse 17   2017-11-22 18:00:00
## 4 triagecode                  531 Triage          Nurse 17   2017-11-22 18:05:00
## 5 triagecode                  532 Treatment       Nurse 17   2017-11-22 18:15:00
## 6 triagecode                  532 Treatment eval… Doctor 7   2017-11-22 18:27:00
## 7 triagecode                  533 0               <NA>       2017-11-22 18:35:00
## # ℹ 4 more variables: complete <dttm>, triagecode <dbl>, specialization <chr>,
## #   .order <int>
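
A numeric range check is a pair of comparisons per value. In the base-R sketch below, the values are invented and the names hypothetical; counting missing values as out of range is an assumption of this sketch, chosen because the output above appears to list the row with a missing triagecode as well.

```r
# Invented triage codes, including a missing value.
triagecode <- c(3, 0, 7, NA, 5, -1)
from <- 0
to   <- 5

# In range: not missing and within [from, to].
in_range <- !is.na(triagecode) & triagecode >= from & triagecode <= to

# Row positions that violate the range (assumption: NA counts as violation).
violations <- which(!in_range)
print(violations)
```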

Copyright © 2023 bupaR - Hasselt University