library(openxlsx)
library(tidyverse)
library(readxl)
Discovering solutions to the challenge of accurate date and time variable detection in Excel data with R
Detect date variables
When you try to read an Excel file, the dates don’t always look the way you would expect. You may see a vector of integers (or doubles) rather than a vector of dates. If you are using openxlsx, you can set detectDates = TRUE
in the function read.xlsx()
.
<- system.file("extdata", "readTest.xlsx", package = "openxlsx")
xlsxfile_path
# Vector of doubles instead of dates
<- read.xlsx(xlsxfile_path, sheet = 3) |>
xlsxfile_with_problems as_tibble()
xlsxfile_with_problems
# A tibble: 2,083 × 5
Date value word bool wordZ2
<dbl> <dbl> <chr> <lgl> <chr>
1 41757 0.839 N-U-B-R-A FALSE FALSE-Z
2 41756 0.886 N-Z-P-S-Y TRUE TRUE-Z
3 41755 0.574 C-G-D-X-H TRUE TRUE-Z
4 41754 0.137 <NA> FALSE FALSE-Z
5 41753 0.369 B-K-A-O-W TRUE TRUE-Z
6 41752 NA H-P-G-O-K TRUE TRUE-Z
7 41751 0.842 F-P-C-L-T TRUE TRUE-Z
8 41750 0.227 A-N-Q-P-V TRUE TRUE-Z
9 41749 0.276 Y-E-B-K-O TRUE TRUE-Z
10 41748 0.419 V-S-N-T-R TRUE TRUE-Z
# ℹ 2,073 more rows
glimpse(xlsxfile_with_problems)
Rows: 2,083
Columns: 5
$ Date <dbl> 41757, 41756, 41755, 41754, 41753, 41752, 41751, 41750, 41749, …
$ value <dbl> 0.839076400, 0.886380000, 0.574131400, 0.136606500, 0.369258200…
$ word <chr> "N-U-B-R-A", "N-Z-P-S-Y", "C-G-D-X-H", NA, "B-K-A-O-W", "H-P-G-…
$ bool <lgl> FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, T…
$ wordZ2 <chr> "FALSE-Z", "TRUE-Z", "TRUE-Z", "FALSE-Z", "TRUE-Z", "TRUE-Z", "…
# Vector of dates
<- read.xlsx(xlsxfile_path, sheet = 3, detectDates = TRUE) |>
xlsxfile as_tibble()
xlsxfile
# A tibble: 2,083 × 5
Date value word bool wordZ2
<date> <dbl> <chr> <lgl> <chr>
1 2014-04-28 0.839 N-U-B-R-A FALSE FALSE-Z
2 2014-04-27 0.886 N-Z-P-S-Y TRUE TRUE-Z
3 2014-04-26 0.574 C-G-D-X-H TRUE TRUE-Z
4 2014-04-25 0.137 <NA> FALSE FALSE-Z
5 2014-04-24 0.369 B-K-A-O-W TRUE TRUE-Z
6 2014-04-23 NA H-P-G-O-K TRUE TRUE-Z
7 2014-04-22 0.842 F-P-C-L-T TRUE TRUE-Z
8 2014-04-21 0.227 A-N-Q-P-V TRUE TRUE-Z
9 2014-04-20 0.276 Y-E-B-K-O TRUE TRUE-Z
10 2014-04-19 0.419 V-S-N-T-R TRUE TRUE-Z
# ℹ 2,073 more rows
glimpse(xlsxfile)
Rows: 2,083
Columns: 5
$ Date <date> 2014-04-28, 2014-04-27, 2014-04-26, 2014-04-25, 2014-04-24, 20…
$ value <dbl> 0.839076400, 0.886380000, 0.574131400, 0.136606500, 0.369258200…
$ word <chr> "N-U-B-R-A", "N-Z-P-S-Y", "C-G-D-X-H", NA, "B-K-A-O-W", "H-P-G-…
$ bool <lgl> FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, T…
$ wordZ2 <chr> "FALSE-Z", "TRUE-Z", "TRUE-Z", "FALSE-Z", "TRUE-Z", "TRUE-Z", "…
Convert double variables to date and time variables
Another way to convert a vector of integers is to use the function convertToDate()
or convertToDateTime()
.
<- readxl_example(path = "type-me.xlsx")
other_file <- read.xlsx(other_file, sheet = 3) |>
xlsxfile_datetime as_tibble() |>
slice(2:3) |>
select(`maybe.a.datetime?`) |>
pull()
xlsxfile_datetime
[1] "41051" "41026.479166666664"
convertToDate(xlsxfile_datetime[1])
[1] "2012-05-22"
convertToDateTime(xlsxfile_datetime[2])
[1] "2012-04-27 11:30:00 CEST"
References
These examples are inspired by:
Citation
BibTeX citation:
@online{lettry2023,
author = {Lettry, Layal Christine},
title = {Detect Date and Time Variables with Openxlsx},
date = {2023-10-08},
url = {https://rdiscovery.netlify.app/posts/2023-10-08_datetimes-openxlsx/},
langid = {en}
}
For attribution, please cite this work as:
Lettry, Layal Christine. 2023. “Detect Date and Time Variables
with Openxlsx.” October 8, 2023. https://rdiscovery.netlify.app/posts/2023-10-08_datetimes-openxlsx/.