Function detecting missing values at three different levels of aggregation:

  • overview: presents an overview of the absolute and relative number of missing values for each column

  • column: presents an overview of the absolute and relative number of missing values for a particular column

  • activity: presents an overview of the absolute and relative number of missing values for each column, aggregated by activity
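To illustrate what each level counts, here is a minimal base R sketch on a small hypothetical data frame, `toy_log`, whose column names borrow from `hospital_actlog`. It only sketches the counting logic under that assumption and is not daqapo's implementation. It uses the base R functions `is.na()`, `colSums()` and `aggregate()`.

```r
# Hypothetical toy activity log (not the hospital_actlog data set)
toy_log <- data.frame(
  patient_visit_nr = c(510, 511, 512, 533),
  activity         = c("Triage", "Triage", "Registration", "Registration"),
  originator       = c("Nurse 1", NA, "Clerk 2", NA),
  triagecode       = c(3, NA, 2, 4),
  stringsAsFactors = FALSE
)

# overview: absolute and relative (%) number of NAs for every column
abs_missing <- colSums(is.na(toy_log))
rel_missing <- abs_missing / nrow(toy_log) * 100
print(abs_missing)
print(rel_missing)

# column: the same two figures for one chosen column
col_abs <- sum(is.na(toy_log$triagecode))
col_rel <- col_abs / nrow(toy_log) * 100
cat("triagecode:", col_abs, "missing,", col_rel, "%\n")

# activity: absolute NA counts per column, computed within each activity
by_activity <- aggregate(
  toy_log[setdiff(names(toy_log), "activity")],
  by  = list(activity = toy_log$activity),
  FUN = function(x) sum(is.na(x))
)
print(by_activity)
```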

detect_missing_values(
  activitylog,
  level_of_aggregation,
  column,
  details,
  filter_condition
)

Arguments

activitylog

The activity log

level_of_aggregation

Level of aggregation at which missing values are identified (either "overview", "column" or "activity")

column

Name of the column to analyze when the level of aggregation is "column"

details

Boolean indicating whether details of the results need to be shown

filter_condition

Condition that is used to extract a subset of the activity log prior to the application of the function
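The effect of a filter condition can be pictured as subsetting first and counting second. The base R sketch below assumes a hypothetical data frame, `toy_log`, and a hypothetical condition (keep only "Triage" rows). It does not show how daqapo itself parses or evaluates `filter_condition`.

```r
# Hypothetical toy activity log (not the hospital_actlog data set)
toy_log <- data.frame(
  activity   = c("Triage", "Triage", "Registration", "Registration"),
  originator = c("Nurse 1", NA, "Clerk 2", NA),
  triagecode = c(3, NA, 2, 4),
  stringsAsFactors = FALSE
)

# Step 1: keep only the rows that satisfy the (hypothetical) condition
subset_log <- toy_log[toy_log$activity == "Triage", ]

# Step 2: count missing values on the subset only
abs_missing <- colSums(is.na(subset_log))
rel_missing <- abs_missing / nrow(subset_log) * 100
print(abs_missing)
print(rel_missing)
```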

Value

An activitylog containing the rows of the original activity log that contain a missing value
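The examples show two kinds of returned rows: at the "overview" and "activity" levels, every row with at least one missing value; at the "column" level, only the rows in which the chosen column is missing. A minimal base R sketch of both selections, assuming a hypothetical data frame `toy_log`, uses `complete.cases()` and `is.na()`; it is not daqapo's implementation.

```r
# Hypothetical toy activity log (not the hospital_actlog data set)
toy_log <- data.frame(
  patient_visit_nr = c(510, 511, 512, 533),
  originator       = c("Nurse 1", NA, "Clerk 2", NA),
  triagecode       = c(3, NA, 2, 4),
  stringsAsFactors = FALSE
)

# Rows with at least one missing value in any column
incomplete_rows <- toy_log[!complete.cases(toy_log), ]

# Rows in which one chosen column (here triagecode) is missing
missing_triage <- toy_log[is.na(toy_log$triagecode), ]

print(incomplete_rows)
print(missing_triage)
```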

Examples

# \donttest{
data("hospital_actlog")
detect_missing_values(activitylog = hospital_actlog)
#> Selected level of aggregation:overview
#> *** OUTPUT ***
#> Absolute number of missing values per column:
#>
#> patient_visit_nr 0
#> activity         0
#> originator       2
#> start            1
#> complete         0
#> triagecode       1
#> specialization   0
#> Relative number of missing values per column (expressed as percentage):
#>
#> patient_visit_nr 0.000000
#> activity         0.000000
#> originator       3.773585
#> start            1.886792
#> complete         0.000000
#> triagecode       1.886792
#> specialization   0.000000
#> Overview of activity log rows which are incomplete:
#> # A tibble: 4 x 7
#>   patient_visit_nr activity originator start               complete
#>              <dbl> <chr>    <chr>      <dttm>              <dttm>
#> 1              510 Clinica~ Doctor 7   2017-11-20 11:35:01 2017-11-20 11:36:09
#> 2              533 0        NA         2017-11-22 18:35:00 2017-11-22 18:37:00
#> 3              534 Registr~ NA         2017-11-22 18:35:00 2017-11-22 18:37:00
#> 4              512 Clinica~ Doctor 7   NA                  2017-11-20 11:33:57
#> # ... with 2 more variables: triagecode <dbl>, specialization <chr>
detect_missing_values(activitylog = hospital_actlog, level_of_aggregation = "activity")
#> Selected level of aggregation:activity
#> *** OUTPUT ***
#> Absolute number of missing values per column (per activity):
#> # A tibble: 9 x 7
#>   activity  patient_visit_nr originator start complete triagecode specialization
#>   <chr>                <int>      <int> <int>    <int>      <int>          <int>
#> 1 0                        0          1     0        0          0              0
#> 2 Clinical~                0          0     1        0          1              0
#> 3 registra~                0          0     0        0          0              0
#> 4 Registra~                0          1     0        0          0              0
#> 5 Trage                    0          0     0        0          0              0
#> 6 Treatment                0          0     0        0          0              0
#> 7 Treatmen~                0          0     0        0          0              0
#> 8 Triaga                   0          0     0        0          0              0
#> 9 Triage                   0          0     0        0          0              0
#> Relative number of missing values per column (per activity, expressed as percentage):
#> # A tibble: 9 x 7
#>   activity  patient_visit_nr originator start complete triagecode specialization
#>   <chr>                <dbl>      <dbl> <dbl>    <dbl>      <dbl>          <dbl>
#> 1 0                        0     1      0            0      0                  0
#> 2 Clinical~                0     0      0.111        0      0.111              0
#> 3 registra~                0     0      0            0      0                  0
#> 4 Registra~                0     0.0714 0            0      0                  0
#> 5 Trage                    0     0      0            0      0                  0
#> 6 Treatment                0     0      0            0      0                  0
#> 7 Treatmen~                0     0      0            0      0                  0
#> 8 Triaga                   0     0      0            0      0                  0
#> 9 Triage                   0     0      0            0      0                  0
#> Overview of activity log rows which are incomplete:
#> # A tibble: 4 x 7
#>   patient_visit_nr activity originator start               complete
#>              <dbl> <chr>    <chr>      <dttm>              <dttm>
#> 1              510 Clinica~ Doctor 7   2017-11-20 11:35:01 2017-11-20 11:36:09
#> 2              533 0        NA         2017-11-22 18:35:00 2017-11-22 18:37:00
#> 3              534 Registr~ NA         2017-11-22 18:35:00 2017-11-22 18:37:00
#> 4              512 Clinica~ Doctor 7   NA                  2017-11-20 11:33:57
#> # ... with 2 more variables: triagecode <dbl>, specialization <chr>
detect_missing_values(activitylog = hospital_actlog, level_of_aggregation = "column", column = "triagecode")
#> Selected level of aggregation:column
#> *** OUTPUT ***
#> Absolute number of missing values in columntriagecode:1
#> Relative number of missing values in columntriagecode(expressed as percentage):1.88679245283019
#>
#> Overview of activity log rows in whichtriagecodeis missing:
#> # A tibble: 1 x 7
#>   patient_visit_nr activity originator start               complete
#>              <dbl> <chr>    <chr>      <dttm>              <dttm>
#> 1              510 Clinica~ Doctor 7   2017-11-20 11:35:01 2017-11-20 11:36:09
#> # ... with 2 more variables: triagecode <dbl>, specialization <chr>
# }