How to replace some non-available values in a vector with values coming from another vector?

Categories: dplyr, coalesce, if_else, is.na

Author: Layal Christine Lettry

Affiliations: cynkra GmbH; University of Fribourg, Dept. of Informatics, ASAM Group

Published: May 10, 2024


Until recently, I used the following code to replace the NA values of one column with the values of another column.

my_tib <- tibble::tribble(
  ~var_with_na, ~var_non_na,
  NA_real_, 1.4,
  5.4, 5.0,
  NA_real_, 9.4,
  13.4, 13.0,
  NA_real_, 17.4
)

my_tib |>
  dplyr::mutate(
    my_fixed_var = dplyr::if_else(
      is.na(var_with_na), var_non_na, var_with_na
    )
  )
# A tibble: 5 × 3
  var_with_na var_non_na my_fixed_var
        <dbl>      <dbl>        <dbl>
1        NA          1.4          1.4
2         5.4        5            5.4
3        NA          9.4          9.4
4        13.4       13           13.4
5        NA         17.4         17.4

A more concise way to do this is to use the function coalesce() from the dplyr package, which finds the first non-missing element at each position across a set of vectors.

my_tib |>
  dplyr::mutate(
    my_fixed_var = dplyr::coalesce(var_with_na, var_non_na)
  )
# A tibble: 5 × 3
  var_with_na var_non_na my_fixed_var
        <dbl>      <dbl>        <dbl>
1        NA          1.4          1.4
2         5.4        5            5.4
3        NA          9.4          9.4
4        13.4       13           13.4
5        NA         17.4         17.4

This function keeps all the available values of the vector passed as the first argument and replaces each of its missing values with the value at the same position in the vector passed as the second argument.
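To see this positional replacement outside of a data frame, here is a small sketch with plain vectors. The names x and y are illustrative only; they hold the same values as var_with_na and var_non_na. The sketch also shows that a single replacement value is recycled, and checks that coalesce() gives the same result as the if_else() approach.

```r
# Same values as var_with_na and var_non_na in my_tib
x <- c(NA, 5.4, NA, 13.4, NA)
y <- c(1.4, 5.0, 9.4, 13.0, 17.4)

# Each NA in x is replaced by the element of y at the same position
dplyr::coalesce(x, y)
# returns c(1.4, 5.4, 9.4, 13.4, 17.4)

# A single value is recycled, so every NA in x becomes 0
dplyr::coalesce(x, 0)
# returns c(0, 5.4, 0, 13.4, 0)

# Same result as the if_else() approach
identical(dplyr::if_else(is.na(x), y, x), dplyr::coalesce(x, y))
# returns TRUE
```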

You can also pass more than two vectors to coalesce().

my_tib_2 <- tibble::tribble(
  ~var_with_na_1, ~var_with_na_2, ~var_with_na_3,
  NA_real_, 1.2, 1.4,
  5.4, 5.2, NA_real_,
  NA_real_, NA_real_, NA_real_,
  13.4, NA_real_, 13.0,
  NA_real_, NA_real_, 17.4
)

my_tib_2 |>
  dplyr::mutate(
    my_fixed_var =
      dplyr::coalesce(var_with_na_1, var_with_na_2, var_with_na_3)
  )
# A tibble: 5 × 4
  var_with_na_1 var_with_na_2 var_with_na_3 my_fixed_var
          <dbl>         <dbl>         <dbl>        <dbl>
1          NA             1.2           1.4          1.2
2           5.4           5.2          NA            5.4
3          NA            NA            NA           NA  
4          13.4          NA            13           13.4
5          NA            NA            17.4         17.4

The order of the vectors passed to coalesce() determines the order in which they are searched: at each position, the first non-missing value, going from left to right, is taken. If a value is missing in all vectors at a given position, the result is also NA, as in row 3 above.
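To make the ordering rule concrete, here is a small sketch on plain vectors holding the same values as the columns of my_tib_2. Reversing the arguments gives priority to the last column instead of the first. The sketch also defines my_coalesce(), a hypothetical helper written only for illustration: a base R loop that assumes all vectors have the same length and mimics the left-to-right logic. It is not dplyr's implementation, which handles more cases, such as recycling a single replacement value.

```r
# Same values as the columns of my_tib_2
v1 <- c(NA, 5.4, NA, 13.4, NA)
v2 <- c(1.2, 5.2, NA, NA, NA)
v3 <- c(1.4, NA, NA, 13.0, 17.4)

# Left to right: v1 has priority, then v2, then v3
dplyr::coalesce(v1, v2, v3)
# returns c(1.2, 5.4, NA, 13.4, 17.4)

# Reversed order: v3 now has priority
dplyr::coalesce(v3, v2, v1)
# returns c(1.4, 5.2, NA, 13.0, 17.4)

# Hypothetical helper, for illustration only (not dplyr's code).
# Assumes all vectors have the same length: start from the first
# vector, then fill its remaining NAs with each following vector in turn.
my_coalesce <- function(...) {
  vectors <- list(...)
  result <- vectors[[1]]
  for (v in vectors[-1]) {
    missing <- is.na(result)
    result[missing] <- v[missing]
  }
  result
}

identical(my_coalesce(v1, v2, v3), dplyr::coalesce(v1, v2, v3))
# returns TRUE
```

Positions that are NA in every vector (here the third one) stay NA whatever the order.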

Citation

BibTeX citation:
@online{lettry2024,
  author = {Lettry, Layal Christine},
  title = {How to Replace Some Non-Available Values in a Vector with
    Values Coming from Another Vector?},
  date = {2024-05-10},
  url = {https://rdiscovery.netlify.app/posts/2024-05-10_coalesce/},
  langid = {en}
}
For attribution, please cite this work as:
Lettry, Layal Christine. 2024. “How to Replace Some Non-Available Values in a Vector with Values Coming from Another Vector?” May 10, 2024. https://rdiscovery.netlify.app/posts/2024-05-10_coalesce/.