Detect date and time variables with openxlsx

openxlsx
date
datetime
Author
Affiliations

Layal Christine Lettry

cynkra GmbH

University of Fribourg, Dept. of Informatics, ASAM Group

Published

October 8, 2023

Discovering solutions to the challenge of accurate date and time variable detection in Excel data with R

Detect date variables

When you try to read an Excel file, the dates don’t always look the way you would expect. You may see a vector of integers (or doubles) rather than a vector of dates. If you are using openxlsx, you can set detectDates = TRUE in the function read.xlsx().

library(openxlsx)
library(tidyverse)
library(readxl)
xlsxfile_path <- system.file("extdata", "readTest.xlsx", package = "openxlsx")

# Vector of doubles instead of dates
xlsxfile_with_problems <- read.xlsx(xlsxfile_path, sheet = 3) |> 
  as_tibble()
xlsxfile_with_problems
# A tibble: 2,083 × 5
    Date  value word      bool  wordZ2 
   <dbl>  <dbl> <chr>     <lgl> <chr>  
 1 41757  0.839 N-U-B-R-A FALSE FALSE-Z
 2 41756  0.886 N-Z-P-S-Y TRUE  TRUE-Z 
 3 41755  0.574 C-G-D-X-H TRUE  TRUE-Z 
 4 41754  0.137 <NA>      FALSE FALSE-Z
 5 41753  0.369 B-K-A-O-W TRUE  TRUE-Z 
 6 41752 NA     H-P-G-O-K TRUE  TRUE-Z 
 7 41751  0.842 F-P-C-L-T TRUE  TRUE-Z 
 8 41750  0.227 A-N-Q-P-V TRUE  TRUE-Z 
 9 41749  0.276 Y-E-B-K-O TRUE  TRUE-Z 
10 41748  0.419 V-S-N-T-R TRUE  TRUE-Z 
# ℹ 2,073 more rows
glimpse(xlsxfile_with_problems)
Rows: 2,083
Columns: 5
$ Date   <dbl> 41757, 41756, 41755, 41754, 41753, 41752, 41751, 41750, 41749, …
$ value  <dbl> 0.839076400, 0.886380000, 0.574131400, 0.136606500, 0.369258200…
$ word   <chr> "N-U-B-R-A", "N-Z-P-S-Y", "C-G-D-X-H", NA, "B-K-A-O-W", "H-P-G-…
$ bool   <lgl> FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, T…
$ wordZ2 <chr> "FALSE-Z", "TRUE-Z", "TRUE-Z", "FALSE-Z", "TRUE-Z", "TRUE-Z", "…
# Vector of dates
xlsxfile <- read.xlsx(xlsxfile_path, sheet = 3, detectDates = TRUE)  |> 
  as_tibble()
xlsxfile
# A tibble: 2,083 × 5
   Date        value word      bool  wordZ2 
   <date>      <dbl> <chr>     <lgl> <chr>  
 1 2014-04-28  0.839 N-U-B-R-A FALSE FALSE-Z
 2 2014-04-27  0.886 N-Z-P-S-Y TRUE  TRUE-Z 
 3 2014-04-26  0.574 C-G-D-X-H TRUE  TRUE-Z 
 4 2014-04-25  0.137 <NA>      FALSE FALSE-Z
 5 2014-04-24  0.369 B-K-A-O-W TRUE  TRUE-Z 
 6 2014-04-23 NA     H-P-G-O-K TRUE  TRUE-Z 
 7 2014-04-22  0.842 F-P-C-L-T TRUE  TRUE-Z 
 8 2014-04-21  0.227 A-N-Q-P-V TRUE  TRUE-Z 
 9 2014-04-20  0.276 Y-E-B-K-O TRUE  TRUE-Z 
10 2014-04-19  0.419 V-S-N-T-R TRUE  TRUE-Z 
# ℹ 2,073 more rows
glimpse(xlsxfile)
Rows: 2,083
Columns: 5
$ Date   <date> 2014-04-28, 2014-04-27, 2014-04-26, 2014-04-25, 2014-04-24, 20…
$ value  <dbl> 0.839076400, 0.886380000, 0.574131400, 0.136606500, 0.369258200…
$ word   <chr> "N-U-B-R-A", "N-Z-P-S-Y", "C-G-D-X-H", NA, "B-K-A-O-W", "H-P-G-…
$ bool   <lgl> FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, T…
$ wordZ2 <chr> "FALSE-Z", "TRUE-Z", "TRUE-Z", "FALSE-Z", "TRUE-Z", "TRUE-Z", "…

Convert double variables to date and time variables

Another way to convert a vector of integers is to use the function convertToDate() or convertToDateTime().

other_file <- readxl_example(path = "type-me.xlsx")
xlsxfile_datetime <- read.xlsx(other_file, sheet = 3) |> 
  as_tibble() |> 
  slice(2:3) |> 
  select(`maybe.a.datetime?`) |> 
  pull()
xlsxfile_datetime
[1] "41051"              "41026.479166666664"
convertToDate(xlsxfile_datetime[1])
[1] "2012-05-22"
convertToDateTime(xlsxfile_datetime[2])
[1] "2012-04-27 11:30:00 CEST"

References

These examples are inspired by:

Citation

BibTeX citation:
@online{lettry2023,
  author = {Lettry, Layal Christine},
  title = {Detect Date and Time Variables with Openxlsx},
  date = {2023-10-08},
  url = {https://rdiscovery.netlify.app/posts/2023-10-08_datetimes-openxlsx/},
  langid = {en}
}
For attribution, please cite this work as:
Lettry, Layal Christine. 2023. “Detect Date and Time Variables with Openxlsx.” October 8, 2023. https://rdiscovery.netlify.app/posts/2023-10-08_datetimes-openxlsx/.