I need to filter values according to selected years, without modifying the initial format of the tibble. Thanks to @moodymudskipper’s help, I found a way of solving this problem.
So I am writing this blog post to keep this solution safe.
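The years and the values do not sit in adjacent columns of the same rows: they are stored in separate list-columns inside a nested tibble. That is why we cannot simply use the filter() function from the dplyr package. Instead, we need to go into the nested tibble and subset both the years and the values according to the selected years.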
my_tib$my_data
<arrow_list[1]>
[[1]]
# A tibble: 1 × 2
  year       records
  <arrw_lst> <arrw_lst>
1 [5]        [7 × 2]
Let’s say we want to keep only the years 2018 and 2021. We use the functions map() and map2_dfr() from the purrr package to filter the list elements according to the vector my_years.
my_years <- c(2018L, 2021L)
my_tib$my_data <- purrr::map(my_tib$my_data, ~ {
  # we're in a nested tibble
  # now iterating on observations of the cols of the nested tibble
  purrr::map2_dfr(.x$year, .x$records, function(year, record) {
    record$values <- purrr::map(record$values, \(x) x[year %in% my_years])
    tibble::tibble(
      year = vctrs::list_of(year[year %in% my_years]),
      records = vctrs::list_of(record)
    )
  })
})
constructive::construct(my_tib)
We end up with the same structure as the initial tibble. Our mission is fulfilled!
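To try the approach without the original data, here is a minimal sketch on a small made-up tibble. The data, the `id` and `item` columns, and the plain `list()` columns (instead of the arrow list-columns shown above) are assumptions for illustration only. It needs R 4.1 or later for the `\(x)` syntax, and `map2_dfr()` relies on dplyr being installed.

```r
library(tibble)
library(purrr)

# Hypothetical data: one outer row whose nested tibble holds a list-column
# `year` and a list-column `records`; each vector in `records$values`
# is aligned position by position with `year`.
years <- 2017:2021
records <- tibble(
  item = c("a", "b"),
  values = list(c(1, 2, 3, 4, 5), c(10, 20, 30, 40, 50))
)
my_tib <- tibble(
  id = 1L,
  my_data = list(tibble(year = list(years), records = list(records)))
)

my_years <- c(2018L, 2021L)

my_tib$my_data <- map(my_tib$my_data, function(nested) {
  # iterate over the rows of the nested tibble, pairing each year vector
  # with its records tibble
  map2_dfr(nested$year, nested$records, function(year, record) {
    keep <- year %in% my_years                         # positions to retain
    record$values <- map(record$values, \(x) x[keep])  # subset every value vector
    tibble(year = list(year[keep]), records = list(record))
  })
})

filtered <- my_tib$my_data[[1]]
filtered$year[[1]]
filtered$records[[1]]$values
```

In this sketch the nested tibble still has one row with the columns year and records, the year vector is reduced to 2018 and 2021, and each values vector keeps only the two matching positions.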
Citation
BibTeX citation:
@online{lettry2024,
author = {Lettry, Layal Christine},
title = {How to Filter Values in a Nested Tibble Without Using
`Filter()` from Dplyr?},
date = {2024-05-20},
url = {https://rdiscovery.netlify.app/posts/2024-05-20_filter-nested-tibble/},
langid = {en}
}