Action | openxlsx | chain | pipe |
---|---|---|---|
Load file | `wb_load()` | `wb_load()` | |
Read file | `wb_to_df()` | `wb_to_df()` | |
Read file | `read.xlsx()` | `read_xlsx()` | `read_xlsx()` |
Read file | `readWorkbook()` | `wb_read()` | `wb_read()` |
What are the main differences between openxlsx and openxlsx2?
Introduction
Within this blog post, my aim is to elucidate the primary distinctions between openxlsx and openxlsx2. I’ve encountered these variances through a Github Gist, the informative vignette titled “Update from openxlsx”, and my personal hands-on experience.
The provided tables are designed to offer a clear overview of the differences between openxlsx and openxlsx2. They serve as a concise presentation, illustrating the content of the vignette titled “Update from openxlsx,” which can be explored in detail here.
Chaining versus Piping
In contrast to openxlsx, openxlsx2 enables the creation of workbooks, the addition of styles, data writing, and other operations through the use of chained or piped operations.
In their recent update on openxlsx2, Jordan Mark Barbone and Jan Marvin Garbuszus shared a significant development in this vignette section: the transition to R6 in openxlsx2. This shift signifies a fundamental change in the relationship between methods
and objects
within openxlsx2, where methods
are now associated with objects
rather than generics
.
Therefore, functions can now be seamlessly chained. Moreover, operation pipelines can now be used in openxlsx2 as opposed to openxlsx. This change was triggered by the restructuration of the internal framework of openxlsx.
For those seeking a deeper understanding of R6, please refer to Chapter 6 of “Advanced R” by Hadley Wickham. This chapter provides comprehensive insights into the principles and applications of R6.
In the tables below, we’ve highlighted the key arguments for each function. For a deeper dive into the specifics of each function’s arguments, check out the examples in the following sections.
Kindly explore this specific vignette section for an example that contrasts openxlsx and openxlsx2. Unlike openxlsx, which lacks integration with pipes or chains, openxlsx2 offers flexibility in its application.
In the forthcoming examples, I will employ pipes and chains to illustrate diverse applications of the openxlsx2 functions. Feel free to use your preferred method!
Read an existing Excel Workbook
Please be aware that the recently introduced functions in openxlsx2, namely read_xlsx()
and wb_read()
, serve as mere wrappers for wb_to_df()
. Their purpose is to facilitate a seamless transition between openxlsx and openxlsx2.
Example
In this example, I want to illustrate how to read an Excel file efficiently.
You can use wb_to_df()
, wb_read()
or read_xlsx()
which almost have all arguments in common, since wb_read()
and read_xlsx()
are wrappers of wb_to_df()
.
When you read an Excel file, you can define the types of the columns directly. Please refer to the Details section for more information. Please keep in mind that the type
argument must be a named numeric where each variables belongs to one of these instances:
0: character
1: numeric
2: date
3: posixt (datetime)
4: logical
<- system.file("extdata", "openxlsx2_example.xlsx", package = "openxlsx2")
file
# With wb_load
<- wb_load(file)
my_wb_loaded my_wb_loaded
A Workbook object.
Worksheets:
Sheets: Sheet1, Sheet2
Write order: 1, 2
wb_open(my_wb_loaded) # take a look at the raw xlsx file
<- wb_to_df(my_wb_loaded,
my_df_loaded sheet = 1,
skip_empty_cols = TRUE,
skip_empty_rows = TRUE,
show_formula = TRUE,
na.strings = c("#NUM!", "#VALUE!"),
types = c(
"Var1" = 4, "Var2" = 1, "Var3" = 1, "Var4" = 0, "Var5" = 2,
"Var6" = 0, "Var7" = 0, "Var8" = 3
)
) my_df_loaded
Var1 Var2 Var3 Var4 Var5 Var6 Var7
3 TRUE 1 1.00 a 2023-05-29 3209324 This E3/0
4 TRUE NA NA b 2023-05-23 <NA> C4
5 TRUE 2 1.34 c 2023-02-01 <NA> <NA>
6 FALSE 2 NA <NA> <NA> <NA> C6+E6
7 FALSE 3 1.56 e <NA> <NA> <NA>
8 FALSE 1 1.70 f 2023-03-02 <NA> C8+E8
10 FALSE 2 23.00 h 2023-12-24 <NA> SUM(C10,E10)
11 FALSE 3 67.30 i 2023-12-25 <NA> PRODUCT(C11,E3)
12 NA 1 123.00 <NA> 2023-07-31 <NA> E12-C12
Var8
3 1899-12-31 01:27:15
4 1899-12-31 14:02:57
5 1899-12-31 23:01:02
6 1899-12-31 17:24:53
7 <NA>
8 1899-12-31 08:45:58
10 <NA>
11 <NA>
12 <NA>
# With read_xlsx
<- read_xlsx(file,
my_xslx_read convert = FALSE # to keep all variables as characters
)glimpse(my_xslx_read)
Rows: 10
Columns: 9
$ Var1 <chr> "TRUE", "TRUE", "TRUE", "FALSE", "FALSE", "FALSE", NA, "FALSE", "…
$ Var2 <chr> "1", NA, "2", "2", "3", "1", NA, "2", "3", "1"
$ NA <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
$ Var3 <chr> "1", "#NUM!", "1.34", NA, "1.56", "1.7", NA, "23", "67.3", "123"
$ Var4 <chr> "a", "b", "c", "#NUM!", "e", "f", NA, "h", "i", NA
$ Var5 <chr> "2023-05-29", "2023-05-23", "2023-02-01", NA, NA, "2023-03-02", N…
$ Var6 <chr> "3209324 This", NA, NA, NA, NA, NA, NA, NA, NA, NA
$ Var7 <chr> "#DIV/0!", "0", "#VALUE!", "2", NA, "2.7", NA, "25", "3", "122"
$ Var8 <chr> "6.0590277777777778E-2", "0.58538194444444447", "0.95905092592592…
# With wb_read
<- wb_read(file,
my_wb_read detect_dates = TRUE
)glimpse(my_wb_read)
Rows: 10
Columns: 9
$ Var1 <lgl> TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, NA, FALSE, FALSE, NA
$ Var2 <dbl> 1, NA, 2, 2, 3, 1, NA, 2, 3, 1
$ NA <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
$ Var3 <chr> "1", "#NUM!", "1.34", NA, "1.56", "1.7", NA, "23", "67.3", "123"
$ Var4 <chr> "a", "b", "c", "#NUM!", "e", "f", NA, "h", "i", NA
$ Var5 <date> 2023-05-29, 2023-05-23, 2023-02-01, NA, NA, 2023-03-02, NA, 2023-…
$ Var6 <chr> "3209324 This", NA, NA, NA, NA, NA, NA, NA, NA, NA
$ Var7 <chr> "#DIV/0!", "0", "#VALUE!", "2", NA, "2.7", NA, "25", "3", "122"
$ Var8 <time> 01:27:15, 14:02:57, 23:01:02, 17:24:53, NA, 08:45:58, …
Note that all returned objects have the class data.frame
.
class(my_df_loaded)
[1] "data.frame"
class(my_xslx_read)
[1] "data.frame"
class(my_wb_read)
[1] "data.frame"
Write a File
Action | openxlsx | chain | pipe |
---|---|---|---|
Create temporary xlsm file | `temp_xlsx(macros = TRUE)` | `temp_xlsx(macros = TRUE)` | |
Create temporary xlsx file | `temp_xlsx()` | `temp_xlsx()` | `temp_xlsx()` |
Write file | `write.xlsx()` | `write_xlsx()` | `write_xlsx()` |
Example
Let’s say we want to put the information about palmerpenguins::penguins
in a new Excel file.
Then, we would like to include palmerpenguins::penguins_raw
in the same Excel file.
Eventually, we want an xlsm file.
# Write a data frame into an xlsx file
<- temp_xlsx()
my_tempfile basename(my_tempfile)
[1] "temp_xlsx_10197b87fb55.xlsx"
write_xlsx(penguins, my_tempfile, col_names = TRUE)
read_xlsx(my_tempfile) |>
as_tibble()
# A tibble: 344 × 8
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Adelie Torgersen 39.1 18.7 181 3750
2 Adelie Torgersen 39.5 17.4 186 3800
3 Adelie Torgersen 40.3 18 195 3250
4 Adelie Torgersen NA NA NA NA
5 Adelie Torgersen 36.7 19.3 193 3450
6 Adelie Torgersen 39.3 20.6 190 3650
7 Adelie Torgersen 38.9 17.8 181 3625
8 Adelie Torgersen 39.2 19.6 195 4675
9 Adelie Torgersen 34.1 18.1 193 3475
10 Adelie Torgersen 42 20.2 190 4250
# ℹ 334 more rows
# ℹ 2 more variables: sex <chr>, year <dbl>
# Write a list of data frames into an xlsx file
<- temp_xlsx()
my_2nd_tempfile basename(my_2nd_tempfile)
[1] "temp_xlsx_101951d88fdb.xlsx"
write_xlsx(
list(
"penguins" = penguins,
"penguins_raw" = penguins_raw
),
my_2nd_tempfile,start_col = c(1L, 2L),
start_row = c(2L, 3L),
col_names = TRUE
)browseURL(my_2nd_tempfile) # open in Excel
read_xlsx(my_2nd_tempfile, sheet = "penguins") |>
as_tibble()
# A tibble: 344 × 8
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Adelie Torgersen 39.1 18.7 181 3750
2 Adelie Torgersen 39.5 17.4 186 3800
3 Adelie Torgersen 40.3 18 195 3250
4 Adelie Torgersen NA NA NA NA
5 Adelie Torgersen 36.7 19.3 193 3450
6 Adelie Torgersen 39.3 20.6 190 3650
7 Adelie Torgersen 38.9 17.8 181 3625
8 Adelie Torgersen 39.2 19.6 195 4675
9 Adelie Torgersen 34.1 18.1 193 3475
10 Adelie Torgersen 42 20.2 190 4250
# ℹ 334 more rows
# ℹ 2 more variables: sex <chr>, year <dbl>
browseURL(my_2nd_tempfile) # open in Excel
read_xlsx(my_2nd_tempfile, sheet = "penguins_raw") |>
as_tibble()
# A tibble: 344 × 17
studyName `Sample Number` Species Region Island Stage `Individual ID`
<chr> <dbl> <chr> <chr> <chr> <chr> <chr>
1 PAL0708 1 Adelie Penguin… Anvers Torge… Adul… N1A1
2 PAL0708 2 Adelie Penguin… Anvers Torge… Adul… N1A2
3 PAL0708 3 Adelie Penguin… Anvers Torge… Adul… N2A1
4 PAL0708 4 Adelie Penguin… Anvers Torge… Adul… N2A2
5 PAL0708 5 Adelie Penguin… Anvers Torge… Adul… N3A1
6 PAL0708 6 Adelie Penguin… Anvers Torge… Adul… N3A2
7 PAL0708 7 Adelie Penguin… Anvers Torge… Adul… N4A1
8 PAL0708 8 Adelie Penguin… Anvers Torge… Adul… N4A2
9 PAL0708 9 Adelie Penguin… Anvers Torge… Adul… N5A1
10 PAL0708 10 Adelie Penguin… Anvers Torge… Adul… N5A2
# ℹ 334 more rows
# ℹ 10 more variables: `Clutch Completion` <chr>, `Date Egg` <date>,
# `Culmen Length (mm)` <dbl>, `Culmen Depth (mm)` <dbl>,
# `Flipper Length (mm)` <dbl>, `Body Mass (g)` <dbl>, Sex <chr>,
# `Delta 15 N (o/oo)` <dbl>, `Delta 13 C (o/oo)` <dbl>, Comments <chr>
# Write a data frame into an xlsm file
<- temp_xlsx(macros = TRUE)
my_3rd_tempfile basename(my_3rd_tempfile)
[1] "temp_xlsx_10193f5470e0.xlsm"
write_xlsx(penguins, my_3rd_tempfile, col_names = TRUE)
read_xlsx(my_3rd_tempfile) |>
as_tibble()
# A tibble: 344 × 8
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Adelie Torgersen 39.1 18.7 181 3750
2 Adelie Torgersen 39.5 17.4 186 3800
3 Adelie Torgersen 40.3 18 195 3250
4 Adelie Torgersen NA NA NA NA
5 Adelie Torgersen 36.7 19.3 193 3450
6 Adelie Torgersen 39.3 20.6 190 3650
7 Adelie Torgersen 38.9 17.8 181 3625
8 Adelie Torgersen 39.2 19.6 195 4675
9 Adelie Torgersen 34.1 18.1 193 3475
10 Adelie Torgersen 42 20.2 190 4250
# ℹ 334 more rows
# ℹ 2 more variables: sex <chr>, year <dbl>
Parse XML Files
Aside from the transition to R6, another significant improvement in openxlsx2, compared to openxlsx, is the inclusion of pugixml to reliably parse XML files.
# source: https://www.geeksforgeeks.org/working-with-xml-files-in-r-programming/
<- file.path(
path_xml ::here(),
here"posts", "2023-11-11_openxlsx2", "records.xml"
)
# Read an xml file
<- read_xml(path_xml)
my_xml_file my_xml_file
<RECORDS>
<STUDENT>
<ID>1</ID>
<NAME>Alia</NAME>
<MARKS>620</MARKS>
<BRANCH>IT</BRANCH>
</STUDENT>
<STUDENT>
<ID>2</ID>
<NAME>Brijesh</NAME>
<MARKS>440</MARKS>
<BRANCH>Commerce</BRANCH>
</STUDENT>
<STUDENT>
<ID>3</ID>
<NAME>Yash</NAME>
<MARKS>600</MARKS>
<BRANCH>Humanities</BRANCH>
</STUDENT>
<STUDENT>
<ID>4</ID>
<NAME>Mallika</NAME>
<MARKS>660</MARKS>
<BRANCH>IT</BRANCH>
</STUDENT>
<STUDENT>
<ID>5</ID>
<NAME>Zayn</NAME>
<MARKS>560</MARKS>
<BRANCH>IT</BRANCH>
</STUDENT>
</RECORDS>
# Return node
<- xml_node(my_xml_file, level1 = "RECORDS")
node_records node_records
[1] "<RECORDS><STUDENT><ID>1</ID><NAME>Alia</NAME><MARKS>620</MARKS><BRANCH>IT</BRANCH></STUDENT><STUDENT><ID>2</ID><NAME>Brijesh</NAME><MARKS>440</MARKS><BRANCH>Commerce</BRANCH></STUDENT><STUDENT><ID>3</ID><NAME>Yash</NAME><MARKS>600</MARKS><BRANCH>Humanities</BRANCH></STUDENT><STUDENT><ID>4</ID><NAME>Mallika</NAME><MARKS>660</MARKS><BRANCH>IT</BRANCH></STUDENT><STUDENT><ID>5</ID><NAME>Zayn</NAME><MARKS>560</MARKS><BRANCH>IT</BRANCH></STUDENT></RECORDS>"
<- xml_node(my_xml_file, level1 = "RECORDS", level2 = "STUDENT")
node_student node_student
[1] "<STUDENT><ID>1</ID><NAME>Alia</NAME><MARKS>620</MARKS><BRANCH>IT</BRANCH></STUDENT>"
[2] "<STUDENT><ID>2</ID><NAME>Brijesh</NAME><MARKS>440</MARKS><BRANCH>Commerce</BRANCH></STUDENT>"
[3] "<STUDENT><ID>3</ID><NAME>Yash</NAME><MARKS>600</MARKS><BRANCH>Humanities</BRANCH></STUDENT>"
[4] "<STUDENT><ID>4</ID><NAME>Mallika</NAME><MARKS>660</MARKS><BRANCH>IT</BRANCH></STUDENT>"
[5] "<STUDENT><ID>5</ID><NAME>Zayn</NAME><MARKS>560</MARKS><BRANCH>IT</BRANCH></STUDENT>"
<- xml_node(my_xml_file,
node_name level1 = "RECORDS", level2 = "STUDENT",
level3 = "NAME"
) node_name
[1] "<NAME>Alia</NAME>" "<NAME>Brijesh</NAME>" "<NAME>Yash</NAME>"
[4] "<NAME>Mallika</NAME>" "<NAME>Zayn</NAME>"
<- xml_node(my_xml_file,
node_branch level1 = "RECORDS", level2 = "STUDENT",
level3 = "BRANCH"
) node_branch
[1] "<BRANCH>IT</BRANCH>" "<BRANCH>Commerce</BRANCH>"
[3] "<BRANCH>Humanities</BRANCH>" "<BRANCH>IT</BRANCH>"
[5] "<BRANCH>IT</BRANCH>"
# Load character string to pugixml and returns an external pointer
as_xml(node_student)
<STUDENT>
<ID>1</ID>
<NAME>Alia</NAME>
<MARKS>620</MARKS>
<BRANCH>IT</BRANCH>
</STUDENT>
<STUDENT>
<ID>2</ID>
<NAME>Brijesh</NAME>
<MARKS>440</MARKS>
<BRANCH>Commerce</BRANCH>
</STUDENT>
<STUDENT>
<ID>3</ID>
<NAME>Yash</NAME>
<MARKS>600</MARKS>
<BRANCH>Humanities</BRANCH>
</STUDENT>
<STUDENT>
<ID>4</ID>
<NAME>Mallika</NAME>
<MARKS>660</MARKS>
<BRANCH>IT</BRANCH>
</STUDENT>
<STUDENT>
<ID>5</ID>
<NAME>Zayn</NAME>
<MARKS>560</MARKS>
<BRANCH>IT</BRANCH>
</STUDENT>
as_xml(node_name)
<NAME>Alia</NAME>
<NAME>Brijesh</NAME>
<NAME>Yash</NAME>
<NAME>Mallika</NAME>
<NAME>Zayn</NAME>
as_xml(node_branch)
<BRANCH>IT</BRANCH>
<BRANCH>Commerce</BRANCH>
<BRANCH>Humanities</BRANCH>
<BRANCH>IT</BRANCH>
<BRANCH>IT</BRANCH>
# Return value
xml_value(my_xml_file, level1 = "RECORDS", level2 = "STUDENT", level3 = "NAME")
[1] "Alia" "Brijesh" "Yash" "Mallika" "Zayn"
xml_value(my_xml_file, level1 = "RECORDS", level2 = "STUDENT", level3 = "BRANCH")
[1] "IT" "Commerce" "Humanities" "IT" "IT"
xml_value(my_xml_file, level1 = "RECORDS", level2 = "STUDENT", level3 = "ID")
[1] "1" "2" "3" "4" "5"
xml_value(my_xml_file, level1 = "RECORDS", level2 = "STUDENT", level3 = "MARKS")
[1] "620" "440" "600" "660" "560"
# Return attributes
xml_attr(my_xml_file, level1 = "RECORDS", level2 = "STUDENT")
[[1]]
named character(0)
[[2]]
named character(0)
[[3]]
named character(0)
[[4]]
named character(0)
[[5]]
named character(0)
xml_attr(my_xml_file, level1 = "RECORDS")
[[1]]
named character(0)
Create an Excel Workbook from Scratch
Let’s say you would like to create a workbook called wb
and a worksheet called ws
where you want to write data x
.
Action | openxlsx | chain | pipe |
---|---|---|---|
Create workbook | `createWorkbook()` | `wb_workbook()` | `wb_workbook()` |
Add worksheet | `addWorksheet(wb, 'ws')` | `wb$add_worksheet('ws')` | `wb_add_worksheet('ws')` |
Add data to worksheet | `writeData(wb, 'ws', x)` | `wb$add_data(x)` | `wb_add_data(x)` |
Inspect workbook | `openXL(wb)` | `wb$open()` | `wb_open()` |
Save workbook | `saveWorkbook(wb, file) | `wb$save(file)` | `wb_save(file)` |
Example
In this example, our aim is to explore various options for creating, saving, and importing an Excel workbook.
Create a Workbook with Multiple Sheets
Chained
<- 3L
start_col_ws1 <- 10L
start_row_ws1
<- wb_workbook(
wb creator = "My Name",
title = "My Test",
subject = "Create a Workbook",
category = "Test",
datetime_created = Sys.time(),
theme = "Office Theme",
keywords = "test",
comments = "This is my test",
manager = "Me",
company = "My Company"
$
)add_worksheet(
sheet = "my_first_sheet",
tab_color = "blue",
zoom = 150L,
visible = TRUE,
orientation = "portrait",
header = c("ODD HEAD LEFT", "ODD HEAD CENTER", "ODD HEAD RIGHT"),
footer = c("ODD FOOT RIGHT", "ODD FOOT CENTER", "ODD FOOT RIGHT"),
even_header = c("EVEN HEAD LEFT", "EVEN HEAD CENTER", "EVEN HEAD RIGHT"),
even_footer = c("EVEN FOOT RIGHT", "EVEN FOOT CENTER", "EVEN FOOT RIGHT"),
first_header = c("TOP", "OF FIRST", "PAGE"),
first_footer = c("BOTTOM", "OF FIRST", "PAGE")
$
)add_data(
x = penguins,
start_col = start_col_ws1,
start_row = start_row_ws1,
array = FALSE,
col_names = TRUE,
row_names = FALSE,
with_filter = FALSE
$
)add_worksheet(
sheet = "my_second_sheet",
tab_color = "red",
zoom = 75L,
visible = TRUE,
orientation = "landscape"
$
)add_data(
x = penguins_raw,
dims = wb_dims(2, 1),
col_names = TRUE,
row_names = TRUE,
with_filter = TRUE
)
Piped
<- 3L
start_col_ws1 <- 10L
start_row_ws1
<- wb_workbook(
wb creator = "My Name",
title = "My Test",
subject = "Create a Workbook",
category = "Test",
datetime_created = Sys.time(),
theme = "Office Theme",
keywords = "test",
comments = "This is my test",
manager = "Me",
company = "My Company"
|>
) wb_add_worksheet(
sheet = "my_first_sheet",
tab_color = "blue",
zoom = 150L,
visible = TRUE,
orientation = "portrait",
header = c("ODD HEAD LEFT", "ODD HEAD CENTER", "ODD HEAD RIGHT"),
footer = c("ODD FOOT RIGHT", "ODD FOOT CENTER", "ODD FOOT RIGHT"),
even_header = c("EVEN HEAD LEFT", "EVEN HEAD CENTER", "EVEN HEAD RIGHT"),
even_footer = c("EVEN FOOT RIGHT", "EVEN FOOT CENTER", "EVEN FOOT RIGHT"),
first_header = c("TOP", "OF FIRST", "PAGE"),
first_footer = c("BOTTOM", "OF FIRST", "PAGE")
|>
) wb_add_data(
x = penguins,
start_col = start_col_ws1,
start_row = start_row_ws1,
array = FALSE,
col_names = TRUE,
row_names = FALSE,
with_filter = FALSE
|>
) wb_add_worksheet(
sheet = "my_second_sheet",
tab_color = "red",
zoom = 75L,
visible = TRUE,
orientation = "landscape"
|>
) wb_add_data(
x = penguins_raw,
dims = wb_dims(2, 1),
col_names = TRUE,
row_names = TRUE,
with_filter = TRUE
)
Inspect and Write the Workbook Data
You can perform the following steps to examine the data and save it to a temporary xlsx file.
Chained
# Inspect the workbook
$open()
wb
# Create a new temporary file
<- temp_xlsx()
myfile
# Write the workbook into a temporary xlsx file
$save(file = myfile, overwrite = TRUE)
wb
# Open the xlsx file
browseURL(myfile)
Piped
# Inspect the workbook
|>
wb wb_open()
# Create a new temporary file
<- temp_xlsx()
myfile
# Write the workbook into a temporary xlsx file
|>
wb wb_save(file = myfile, overwrite = TRUE)
# Open the xlsx file
browseURL(myfile)
Please be aware that in order to view the headers and footers, you should click on “View” and select “Header and Footer” (for Mac users). This option enables you to distinguish between the landscape and portrait orientations on the two sheets.
Import a Workbook and Start Working on the Data with dplyr
Here, no operations are available for chaining.
# Import the data of your file into a data frame
<- myfile |>
my_df wb_to_df(
sheet = 1,
detect_dates = TRUE,
col_names = TRUE
)
# Display the first few rows of the data frame
|>
my_df head()
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
11 Adelie Torgersen 39.1 18.7 181 3750
12 Adelie Torgersen 39.5 17.4 186 3800
13 Adelie Torgersen 40.3 18.0 195 3250
14 Adelie Torgersen NA NA NA NA
15 Adelie Torgersen 36.7 19.3 193 3450
16 Adelie Torgersen 39.3 20.6 190 3650
sex year
11 male 2007
12 female 2007
13 female 2007
14 <NA> 2007
15 female 2007
16 male 2007
# Provide a summary of the data frame structure
|>
my_df glimpse()
Rows: 344
Columns: 8
$ species <chr> "Adelie", "Adelie", "Adelie", "Adelie", "Adelie", "A…
$ island <chr> "Torgersen", "Torgersen", "Torgersen", "Torgersen", …
$ bill_length_mm <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
$ bill_depth_mm <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
$ flipper_length_mm <dbl> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
$ body_mass_g <dbl> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
$ sex <chr> "male", "female", "female", NA, "female", "male", "f…
$ year <dbl> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…
# Provide the class of the data frame
class(my_df)
[1] "data.frame"
# Convert it into a tibble
<- as_tibble(my_df)
my_tib my_tib
# A tibble: 344 × 8
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Adelie Torgersen 39.1 18.7 181 3750
2 Adelie Torgersen 39.5 17.4 186 3800
3 Adelie Torgersen 40.3 18 195 3250
4 Adelie Torgersen NA NA NA NA
5 Adelie Torgersen 36.7 19.3 193 3450
6 Adelie Torgersen 39.3 20.6 190 3650
7 Adelie Torgersen 38.9 17.8 181 3625
8 Adelie Torgersen 39.2 19.6 195 4675
9 Adelie Torgersen 34.1 18.1 193 3475
10 Adelie Torgersen 42 20.2 190 4250
# ℹ 334 more rows
# ℹ 2 more variables: sex <chr>, year <dbl>
Add a Drop-Down List to Your Workbook
Let’s assume you have a workbook called wb
.
To add a drop-down list, you must first add the data to a separate sheet1. We also want to keep it hidden, so we set the argument visible
to "hidden"
. You can then create a drop-down list with wb_add_data_validation()
.
It is also possible to apply various styles created with wb_add_dxfs_style()
2 to the values of your drop-down list with wb_add_conditional_formatting()
.
Action | openxlsx | chain | pipe |
---|---|---|---|
Add a drop-down list | `dataValidation()` | `wb$add_data_validation()` | `wb_add_data_validation()` |
Create a dxfs styling for the workbook | `createStyle()` | `wb$add_dxfs_style()` | `wb_add_dxfs_style()` |
Add conditional formatting | `conditionalFormatting()` | `wb$add_conditional_formatting()` | `wb_add_conditional_formatting()` |
Example
Chained
<- wb$
wb_ddl add_worksheet("drop-down_list", visible = "hidden")$
add_data(
x = c("huge", "big", "normal", "small", "tiny"),
start_col = 1L,
start_row = 1L
$
)add_data(
sheet = "my_first_sheet",
x = "size",
start_col = which(names(penguins) == last(names(penguins))) + start_col_ws1,
start_row = start_row_ws1
$
)add_data_validation(
sheet = "my_first_sheet",
dims = wb_dims(
x = penguins,
cols = which(names(penguins) == last(names(penguins))) + start_col_ws1,
from_row = start_row_ws1
),operator = "equal",
type = "list",
value = "'drop-down_list'!$A$1:$A$5"
)
Piped
<- wb |>
wb_ddl wb_add_worksheet("drop-down_list", visible = "hidden") |>
wb_add_data(
x = c("huge", "big", "normal", "small", "tiny"),
start_col = 1L,
start_row = 1L
|>
) wb_add_data(
sheet = "my_first_sheet",
x = "size",
start_col = which(names(penguins) == last(names(penguins))) + start_col_ws1,
start_row = start_row_ws1
|>
) wb_add_data_validation(
sheet = "my_first_sheet",
dims = wb_dims(
x = penguins,
cols = which(names(penguins) == last(names(penguins))) + start_col_ws1,
from_row = start_row_ws1
),operator = "equal",
type = "list",
value = "'drop-down_list'!$A$1:$A$5"
)
Add Colors to Your Drop-Down Values
In order to make it look better, we can format each of the values of the drop-down list.
Chained
<- wb_dims(
dims x = penguins,
cols = which(names(penguins) == last(names(penguins))) + start_col_ws1,
from_row = start_row_ws1
)
<- wb_ddl$
wb_cf add_dxfs_style(
name = "style_huge",
bg_fill = wb_color("#AAAAAA")
$
)add_dxfs_style(
name = "style_big",
bg_fill = wb_color("#6FA8DC")
$
)add_dxfs_style(
name = "style_normal",
bg_fill = wb_color("#00AA00")
$
)add_dxfs_style(
name = "style_small",
bg_fill = wb_color("#CCCC00")
$
)add_dxfs_style(
name = "style_tiny",
bg_fill = wb_color("#CC0000"),
font_color = wb_color("#EEEEEE")
$
)add_conditional_formatting(
dims = dims,
type = "containsText",
rule = "huge",
style = "style_huge"
$
)add_conditional_formatting(
dims = dims,
type = "containsText",
rule = "big",
style = "style_big"
$
)add_conditional_formatting(
dims = dims,
type = "containsText",
rule = "normal",
style = "style_normal"
$
)add_conditional_formatting(
dims = dims,
type = "containsText",
rule = "small",
style = "style_small"
$
)add_conditional_formatting(
dims = dims,
type = "containsText",
rule = "tiny",
style = "style_tiny"
)
Piped
<- wb_dims(
dims x = penguins,
cols = which(names(penguins) == last(names(penguins))) + start_col_ws1,
from_row = start_row_ws1
)
<- wb_ddl |>
wb_cf wb_add_dxfs_style(
name = "style_huge",
bg_fill = wb_color("#AAAAAA")
|>
) wb_add_dxfs_style(
name = "style_big",
bg_fill = wb_color("#6FA8DC")
|>
) wb_add_dxfs_style(
name = "style_normal",
bg_fill = wb_color("#00AA00")
|>
) wb_add_dxfs_style(
name = "style_small",
bg_fill = wb_color("#CCCC00")
|>
) wb_add_dxfs_style(
name = "style_tiny",
bg_fill = wb_color("#CC0000"),
font_color = wb_color("#EEEEEE")
|>
) wb_add_conditional_formatting(
dims = dims,
type = "containsText",
rule = "huge",
style = "style_huge"
|>
) wb_add_conditional_formatting(
dims = dims,
type = "containsText",
rule = "big",
style = "style_big"
|>
) wb_add_conditional_formatting(
dims = dims,
type = "containsText",
rule = "normal",
style = "style_normal"
|>
) wb_add_conditional_formatting(
dims = dims,
type = "containsText",
rule = "small",
style = "style_small"
|>
) wb_add_conditional_formatting(
dims = dims,
type = "containsText",
rule = "tiny",
style = "style_tiny"
)
Filter Data, Protect Worksheet and Freeze your a Worksheet Pane
Let’s say you have a workbook called wb
and you would like to add some filters, to protect a worksheet and to freeze data.
Action | openxlsx | chain | pipe |
---|---|---|---|
Filter data | `addFilter()` | `wb$add_filter()` | `wb_add_filter()` |
Protect worksheet | `protectWorksheet()` | `wb$protect_worksheet()` | `wb_protect_worksheet()` |
Freeze data | `freezePane()` | `wb$freeze_pane()` | `wb_freeze_pane()` |
Example
It’s important to note that when you use wb_add_filter()
, every column in between the specified ones will also inherit a filter. So, if you don’t want a filter to be added to certain columns, you’ll need to rearrange your dataset accordingly to achieve the desired outcome.
In the following example, we will make the following improvements:
Add a filter to the two first columns, namely
species
andisland
Set the first active row to the one that contains data, just below the column names, and the first active column to to the
island
column (the second one)Protect the second worksheet called “my_second_sheet”
Chained
<- wb_cf$
wb_pff add_filter(
sheet = "my_first_sheet",
rows = start_row_ws1,
cols = start_col_ws1 - 1 + c(
which(names(penguins) == "species"),
which(names(penguins) == "island")
)$
)freeze_pane(
first_active_row = start_row_ws1 + 1,
first_active_col = start_col_ws1 + 1
$
)protect_worksheet(
sheet = "my_second_sheet"
$
)protect_worksheet(
sheet = "my_first_sheet",
protect = TRUE,
# Formatting cells / columns is allowed,
# but inserting / deleting columns is protected:
c(
formatCells = FALSE, formatColumns = FALSE,
insertColumns = TRUE, deleteColumns = TRUE
) )
Piped
<- wb_cf |>
wb_pff wb_add_filter(
sheet = "my_first_sheet",
rows = start_row_ws1,
cols = start_col_ws1 - 1 + c(
which(names(penguins) == "species"),
which(names(penguins) == "island")
)|>
) wb_freeze_pane(
first_active_row = start_row_ws1 + 1,
first_active_col = start_col_ws1 + 1
|>
) wb_protect_worksheet(
sheet = "my_second_sheet"
|>
) wb_protect_worksheet(
sheet = "my_first_sheet",
protect = TRUE,
# Formatting cells / columns is allowed,
# but inserting / deleting columns is protected:
c(
formatCells = FALSE, formatColumns = FALSE,
insertColumns = TRUE, deleteColumns = TRUE
) )
Make Your Data Shine
Let’s say you have a workbook called wb
and you would like to some styling.
The following functions will help you format your data as you wish in your worksheet.
Action | openxlsx | chain | pipe |
---|---|---|---|
Add font | `addStyle()`, `createStyle()` | `wb$add_font()` | `wb_add_font()` |
Add cell style | `addStyle()`, `createStyle()` | `wb$add_cell_style()` | `wb_add_cell_style()` |
Add background fill color | `addStyle()`, `createStyle()` | `wb$add_fill()` | `wb_add_fill()` |
Add number formatting | `addStyle()`, `createStyle()` | `wb$add_numfmt()` | `wb_add_numfmt()` |
Set column width | `setColWidths()` | `wb$set_col_widths()` | `wb_set_col_widths()` |
Add border | `addStyle()`, `createStyle()` | `wb$add_border()` | `wb_add_border()` |
These helpers could be used to apply these formatting functions:
wb_dims()
: Helper to specify the dims argumentint2col()
: Convert integer to Excel column
Example
Chained
<- wb_dims(
ws1_colnames rows = start_row_ws1,
cols = seq_len(
ncol(penguins) + 1
+
) - 1
start_col_ws1
)
<- wb_dims(
ws1_data x = penguins,
from_row = start_row_ws1,
rows = seq_len(nrow(penguins)),
from_col = start_col_ws1,
cols = seq_len(ncol(penguins) + 1)
)
<- wb_dims(
col_bmg x = penguins,
from_row = start_row_ws1,
rows = seq_len(nrow(penguins)),
cols = which(names(penguins) == "body_mass_g") +
- 1
start_col_ws1
)
<- wb_dims(
col_size x = penguins,
cols = which(names(penguins) == last(names(penguins))) + start_col_ws1,
from_row = start_row_ws1
)
<- wb_pff$
wb_format add_font(
sheet = 1,
dims = ws1_colnames, # add font to the original penguins variables
size = 12,
bold = TRUE
$
)add_border(
sheet = 1,
dims = ws1_data,
bottom_border = "thin",
left_border = "thin",
right_border = "thin",
top_border = NULL
$
)add_border(
sheet = 1,
dims = ws1_colnames,
bottom_border = "thick",
left_border = "thick",
right_border = "thick",
top_border = "thick"
$
)add_cell_style(
dims = ws1_data,
vertical = "top",
horizontal = "left",
wrap_text = TRUE,
locked = FALSE
$
)add_cell_style(
dims = ws1_colnames,
vertical = "center",
horizontal = "center",
wrap_text = TRUE
$
)set_col_widths(
cols = int2col(
seq_len(ncol(penguins) + 1)
),widths = 22
$
)add_numfmt(
dims = col_bmg,
numfmt = "#.0"
$
)add_fill(
dims = col_size,
color = wb_color("#d9d2e9")
)
Piped
<- wb_dims(
ws1_colnames rows = start_row_ws1,
cols = seq_len(
ncol(penguins) + 1
+
) - 1
start_col_ws1
)
<- wb_dims(
ws1_data x = penguins,
from_row = start_row_ws1,
rows = seq_len(nrow(penguins)),
from_col = start_col_ws1,
cols = seq_len(ncol(penguins) + 1)
)
<- wb_dims(
col_bmg x = penguins,
from_row = start_row_ws1,
rows = seq_len(nrow(penguins)),
cols = which(names(penguins) == "body_mass_g") +
- 1
start_col_ws1
)
<- wb_dims(
col_size x = penguins,
cols = which(names(penguins) == last(names(penguins))) + start_col_ws1,
from_row = start_row_ws1
)
<- wb_pff |>
wb_format wb_add_font(
sheet = 1,
dims = ws1_colnames, # add font to the original penguins variables
size = 12,
bold = TRUE
|>
) wb_add_border(
sheet = 1,
dims = ws1_data,
bottom_border = "thin",
left_border = "thin",
right_border = "thin",
top_border = NULL
|>
) wb_add_border(
sheet = 1,
dims = ws1_colnames,
bottom_border = "thick",
left_border = "thick",
right_border = "thick",
top_border = "thick"
|>
) wb_add_cell_style(
dims = ws1_data,
vertical = "top",
horizontal = "left",
wrap_text = TRUE,
locked = FALSE
|>
) wb_add_cell_style(
dims = ws1_colnames,
vertical = "center",
horizontal = "center",
wrap_text = TRUE
|>
) wb_set_col_widths(
cols = int2col(
seq_len(ncol(penguins) + 1)
),widths = 22
|>
) wb_add_numfmt(
dims = col_bmg,
numfmt = "#.0"
|>
) wb_add_fill(
dims = col_size,
color = wb_color("#d9d2e9")
)
Output
You can download the final Excel output here.
Acknowledgements
Thank you for reading. Enjoy using openxlsx2!
Many thanks to the openxlsx2 developers:
References
Footnotes
Please refer to my older blog post on openxlsx.demo.↩︎
The function
wb_add_dxfs_style()
creates a dxfs styling for the workbook. The element<dxfs>
defines a specific formatting. Please refer to c-rex.net for more information.↩︎
Citation
@online{lettry2023,
author = {Lettry, Layal Christine},
title = {Switch from Openxlsx to Openxlsx2},
date = {2023-11-15},
url = {https://rdiscovery.netlify.app/posts/2023-11-11_openxlsx2/},
langid = {en}
}