Skip to contents

Rationale

It is common for Electronic Immunization Registries (EIR) to collect both the place of residence and the place of occurrence of the vaccination event. Vaccination coverage can be estimated by grouping events according to either location. However, denominators are typically estimated based on the place of residence.

Using either residence or occurrence as the numerator can lead to underestimation or overestimation of coverage, and sometimes result in values above 100%, especially in areas where vaccination services are more available.

The residence vs. occurrence module in the PAHOabc package provides a set of functions to explore and analyze this phenomenon.

Note

All functions used for residence and occurrence analyses within PAHOabc begin with the roc_ prefix.

Glossary

Political Geographic Boundaries

Throughout this vignette, we will make reference to different levels of political geographic boundaries. These levels are defined by the country/territory and can have different names from country to country. PAHOabc is agnostic to these names and requires the user to recode their variable names to fit the PAHOabc structure.

Namely, PAHOabc can distinguish three administrative levels, which listed from top to bottom level are: ADM0, ADM1 and ADM2.

  1. ADM0: The country. This is the top-most administrative level.
  2. ADM1: The first geographic subdivision in the country. ADM0 contains several ADM1 subdivisions.
  3. ADM2: The second geographic subdivision in the country. Each ADM1 contains several ADM2 subdivisions.

Vaccine Names

Our example datasets use a specific naming convention for the vaccine doses. This might not match your own naming schemes, but it will not affect the usage of the package as long as you make sure to be consistent when naming vaccine doses.

For example, the vaccine names used in the PAHOabc package are:

Abbreviation Full Name
SRP1 Measles, Rubella, and Mumps Vaccine (1st dose)
DTP1 Diphtheria, Tetanus, and Pertussis Vaccine (1st dose)
DTP2 Diphtheria, Tetanus, and Pertussis Vaccine (2nd dose)
DTP3 Diphtheria, Tetanus, and Pertussis Vaccine (3rd dose)
BCG RN Bacillus Calmette–Guérin Vaccine (at birth)
YFV1 Yellow Fever Vaccine (1st dose)

Usage

Install Package

The first step to run the residence occurrence analyses is to install the PAHOabc package available on GitHub.

devtools::install_github("IM-Data-PAHO/pahoabc")

Load Data

The functions in this module require you to provide three datasets in a specific format.

  1. Your EIR.
  2. The vaccination schedule related to this EIR.
  3. A table with population denominators depending on the geographic level of your analysis.

To make it easy for you to test out PAHOabc’s functionality and understand the structure we require for your datasets, we will now explore the example datasets provided by PAHOabc that are required by this module.

EIR

The pahoabc::pahoabc.EIR data frame provides a simulated, nominal-level table representing individual vaccination events from an Electronic Immunization Registry (EIR). Each row corresponds to a single vaccination act for a person, including information on their residence, where they were vaccinated (occurrence), their date of birth, and the vaccine dose received.

pahoabc.EIR %>% head() %>% kable(caption = "Example Electronic Immunization Registry")
Example Electronic Immunization Registry
ID date_birth date_vax ADM1_residence ADM2_residence ADM1_occurrence ADM2_occurrence dose
191997 2023-08-08 2023-12-26 ADM1_4 ADM2_4_35 ADM1_4 ADM2_4_35 DTP2
212189 2023-12-20 2023-12-26 ADM1_5 ADM2_5_61 ADM1_5 ADM2_5_61 BCG RN
118063 2022-09-15 2023-12-26 ADM1_2 ADM2_2_5 ADM1_2 ADM2_2_5 DTP1
118063 2022-09-15 2023-12-26 ADM1_2 ADM2_2_5 ADM1_2 ADM2_2_5 YFV1
130751 2022-10-27 2023-12-12 ADM1_5 ADM2_5_55 ADM1_5 ADM2_5_55 YFV1
136532 2021-09-21 2023-12-26 ADM1_3 ADM2_3_12 ADM1_3 ADM2_3_12 SRP1
  • ID: Unique person identification number.
  • date_birth: Date of birth of person.
  • date_vax: Date of vaccination event.
  • ADM1: Refers to the first geographic administrative level of the country.
  • ADM2: Refers to the second geographic administrative level of the country.
  • Residence: Refers to the place where the person lives.
  • Occurrence: Refers to the place where the vaccination event occurred.
  • dose: A combined variable representing the vaccine type and its corresponding dose number. For example, DTP1 refers to the first dose of a vaccine containing diphtheria, tetanus, and pertussis components.

Vaccination Schedule

The pahoabc::pahoabc.schedule dataset defines the national immunization schedule, listing each vaccine dose and its recommended age of administration (in months). It helps the package determine whether an individual is up to date with their vaccinations.

pahoabc.schedule %>% kable(caption = "Example Immunization Schedule")
Example Immunization Schedule
dose age_schedule
SRP1 12
DTP1 2
DTP2 4
DTP3 6
BCG RN 0
YFV1 12
  • dose: A combined variable representing the vaccine type and its corresponding dose number. For example, DTP1 refers to the first dose of a vaccine containing diphtheria, tetanus, and pertussis components.
  • age_schedule: The recommended age of administration of the corresponding dose in months.

Note

The dose names must match exactly those in the dose column of the pahoabc.EIR dataset.

Population Denominators

This module will require you to provide a table with population denominators. The geographic level of this table will be directly dependent on the geographic level of your analysis. For example, if you were to perform an analysis at ADM1 level, the required population denominator table must contain the following.

pahoabc.pop.ADM1 %>% head() %>% kable(caption = "Example Population at ADM1")
Example Population at ADM1
ADM1 year age population
ADM1_1 2022 0 1718.189
ADM1_1 2022 1 1808.964
ADM1_1 2023 0 1703.145
ADM1_1 2023 1 1792.443
ADM1_2 2022 0 6575.261
ADM1_2 2022 1 6922.644

Note that this table is in long format and contains the corresponding population of children of a specific age during a given year for a certain ADM1 level.

Note

If your analysis is at a different political geographic level, see pahoabc::pahoabc.pop.ADM0 and pahoabc::pahoabc.pop.ADM2 for the required format and columns.

Coverage Analysis

The first subsection of this module provides a way of analyzing coverage rates by both place of residence and place of vaccination (occurrence).

Expected Workflow

The roc_coverage() and roc_barplot() functions work together to perform coverage analysis and visualization. The roc_coverage() function computes vaccination coverage by both residence and occurrence across selected administrative levels, years, and vaccines. Its output (a data table) is structured to be directly compatible with roc_barplot(), which generates a visualization for a specific year and vaccine.

Alternatively, the roc_coverage_by() function is available when you want to calculate coverage by either residence or occurrence (but not both). While it produces a similar table, its output is not compatible with roc_barplot().

Figure 1. Expected workflow for coverage analyses.
Figure 1. Expected workflow for coverage analyses.


Example

Here is a simple use case of the roc_coverage() function, where we compute coverage by both metrics for the DTP1 vaccine in 2023 for our example datasets.

# compute coverage
coverage_df <- roc_coverage(
  data.EIR = pahoabc.EIR,
  data.schedule = pahoabc.schedule,
  data.pop = pahoabc.pop.ADM1,
  geo_level = "ADM1",
  years = 2023, # this can be a vector of years
  vaccines = "DTP1" # this can be a vector of vaccines
)

# show results in table
coverage_df %>%
  kable(digits = 3, caption = "Vaccination coverage by ADM1 level")
Vaccination coverage by ADM1 level
dose year age ADM1 doses_applied population coverage coverage_type
DTP1 2023 0 ADM1_1 1413 1703.145 82.964 residence
DTP1 2023 0 ADM1_2 6238 6517.689 95.709 residence
DTP1 2023 0 ADM1_3 27492 30497.127 90.146 residence
DTP1 2023 0 ADM1_4 3678 4081.329 90.118 residence
DTP1 2023 0 ADM1_5 2010 2645.993 75.964 residence
DTP1 2023 0 ADM1_1 1425 1703.145 83.669 occurrence
DTP1 2023 0 ADM1_2 9194 6517.689 141.062 occurrence
DTP1 2023 0 ADM1_3 24494 30497.127 80.316 occurrence
DTP1 2023 0 ADM1_4 3705 4081.329 90.779 occurrence
DTP1 2023 0 ADM1_5 2013 2645.993 76.077 occurrence

This output may be fed directly into the roc_barplot() function as shown below.

# visualize output from roc_coverage()
coverage_plot <- roc_barplot(
  data = coverage_df,
  year = 2023,
  vaccine = "DTP1"
)

# show
coverage_plot

This chart shows the vaccination coverage achieved by each subnational administrative level, considering both the place of occurrence and the place of residence. The yellow bars represent coverage by place of occurrence, meaning that the numerator includes all doses administered in that location. The orange diamonds represent coverage by place of residence, meaning that the numerator includes people who reside in that location, regardless of where they received the vaccine.

We can observe that the coverage by both residence and occurrence in ADM1_1, ADM1_4, and ADM1_5 are similar. However, in ADM1_2 and ADM1_3, there are differences between the two types of coverage.

In ADM1_2, the coverage by occurrence reaches 141%, while the coverage by residence is 95%, resulting in a difference of 46 percentage points. In ADM1_3, the coverage by occurrence is 80%, while the coverage by residence is 90%, with a difference of 10 percentage points in favor of the coverage by residence.

These patterns suggest a meaningful population flow between administrative areas. Specifically, in ADM1_2, the high coverage by occurrence alongside lower coverage by residence indicates that many individuals vaccinated in this area actually reside elsewhere. Conversely, ADM1_3 shows higher coverage by residence than by occurrence, suggesting that a significant number of its residents are traveling to other areas to receive their vaccinations.

Dose Distribution Analysis

The distribution analysis functions in this package help analyze where people get vaccinated depending on their place of residence.

Expected Workflow

The roc_distribution() and roc_heatmap() functions provide a way of analyzing population flow between administrative areas. For example, one might be interested in observing where people who reside in a single region get vaccinated (e.g., do they all get vaccinated within their own region, or do they move to other regions?).

The figure below shows the expected workflow for these functions.

Figure 2. Expected workflow for dose distribution analyses.
Figure 2. Expected workflow for dose distribution analyses.


Example at First Administrative Level

This analysis only requires you to provide your EIR in conjunction with other simple parameters. For example, let us compute how vaccination is distributed at the first administrative level for the DTP1 vaccine in 2023.

# compute distribution of doses for each place of residence
distribution_df <- roc_distribution(
  data.EIR = pahoabc.EIR,
  vaccine = "DTP1",
  geo_level = "ADM1",
  birth_cohort = 2023,
  include_self_matches = TRUE
)

# show results in table (view only a couple of regions for simplicity)
distribution_df %>%
  filter(ADM1_residence %in% c("ADM1_1", "ADM1_3")) %>%
  kable(digits = 3, caption = "Vaccination distribution for each place of residence by place of occurrence")
Vaccination distribution for each place of residence by place of occurrence
ADM1_residence ADM1_occurrence frequency proportion
ADM1_1 ADM1_3 7 0.006
ADM1_1 ADM1_2 4 0.004
ADM1_1 ADM1_4 1 0.001
ADM1_1 ADM1_5 1 0.001
ADM1_1 ADM1_1 1084 0.988
ADM1_3 ADM1_3 18443 0.868
ADM1_3 ADM1_2 2685 0.126
ADM1_3 ADM1_4 88 0.004
ADM1_3 ADM1_5 17 0.001
ADM1_3 ADM1_1 12 0.001

For example, 87% of people who reside in ADM1_3 got vaccinated for DTP1 there as well. But an additional 13% received their DTP1 vaccination elsewhere (namely, in ADM1_2). If you want to further drill into the people that got vaccinated in a place different from their place of residence, you may set the include_self_matches parameter to FALSE.

# compute distribution of doses for each place of residence
distribution_df2 <- roc_distribution(
  data.EIR = pahoabc.EIR,
  vaccine = "DTP1",
  geo_level = "ADM1",
  birth_cohort = 2023,
  # set this to FALSE to exclude cases where people get vaccinated in their place of residence
  include_self_matches = FALSE
)

# show results in table (focus on ADM1_3 only for this example)
distribution_df2 %>%
  filter(ADM1_residence == "ADM1_3") %>%
  kable(digits = 3, caption = "Where do residents of ADM1_3 get vaccinated?")
Where do residents of ADM1_3 get vaccinated?
ADM1_residence ADM1_occurrence frequency proportion
ADM1_3 ADM1_2 2685 0.958
ADM1_3 ADM1_4 88 0.031
ADM1_3 ADM1_5 17 0.006
ADM1_3 ADM1_1 12 0.004

This shows that 96% of the people in ADM1_3 who do not get vaccinated in their place of residence, get vaccinated in ADM1_2.

A clearer picture of this can be provided by the roc_heatmap() function.

library(patchwork) # to show them side by side

# generate the plots
distribution_plot <- roc_heatmap(distribution_df) # with self matches
distribution_plot2 <- roc_heatmap(distribution_df2) # without self matches

# add subtitle
distribution_plot <- distribution_plot + labs(subtitle = "With self matches")
distribution_plot2 <- distribution_plot2 + labs(subtitle = "Without self matches")

# show them side by side
distribution_plot + distribution_plot2

Note

The plot on the right shows the proportion of doses administered in a place different from their place of residence. This is why the diagonal is empty.

These plots may be read either in a vertical or horizontal fashion. When reading the plot vertically, the columns will sum to 100% (up to a rounding error). That is, each column represents the population of, say, ADM1_1 and where they get vaccinated. For example, if we concentrate on the plot on the right and read it vertically, we may say that, of the total amount of people who reside in ADM1_1 and get vaccinated in a place different from their place of residence, 54% does so in ADM1_3, 31% in ADM1_2 and a remaining 16% in ADM1_4 and ADM1_5.

When reading horizontally, rows will not sum to 100%, but are still indicative of the residences of people getting vaccinated in a certain region. For example, reading the plot on the left in a horizontal fashion we may note that 92% of the residents of ADM1_2 get vaccinated there as well, but this region also vaccinates a significant proportion of residents of ADM1_3 (13%).

Example at Second Administrative Level

An analysis at a more granular level is beneficial in this case. Let us repeat the above process, but at the second administrative level (e.g., districts within a certain region).

# compute distribution of doses for each place of residence at second
# administrative level (focus on ADM1_3 only for this example)
distribution_df3 <- roc_distribution(
  data.EIR = pahoabc.EIR,
  vaccine = "DTP1",
  geo_level = "ADM2",
  birth_cohort = 2023,
  include_self_matches = FALSE,
  within_ADM1 = "ADM1_3" # set the region to drill into
)

# show results in plot 
distribution_plot3 <- roc_heatmap(
  distribution_df3
)

# show
distribution_plot3

In this heatmap, we observe the proportion of vaccination events that take place in a location different from the place of residence, broken down by the second administrative level.

As an example, it is notable that among all individuals residing in ADM2_3_10 who are vaccinated outside their area of residence, 72% receive their vaccine in ADM2_3_11. Similarly, 23% of individuals from ADM2_3_23 who are vaccinated elsewhere, are vaccinated in ADM2_3_11.

Looking at the heatmap horizontally, we can see that ADM2_3_20 provides vaccinations to a large number of people coming from other areas: 35% of those from ADM2_3_7, 18% from ADM2_3_15, 17% from ADM2_3_12, 14% from ADM2_3_17, etc.

The top row is also interesting, since it indicates people getting vaccinated outside of ADM1_3 entirely. For example, in column 4, 85% of the residents of ADM3_3_13 get vaccinated outside of ADM1_3, suggesting a strong population flow.

Summary

This vignette showed how you can use the PAHOabc package to compare vaccination coverage by place of residence and place of occurrence, and to explore how people move between areas to get vaccinated.

We saw that coverage can vary a lot depending on which location you use, and in some cases, coverage by occurrence can go over 100% — a sign that people are coming from elsewhere to get vaccinated. The distribution and heatmap functions help visualize these patterns and can reveal which areas act as service hubs or where people are going for vaccines.

Overall, this kind of analysis helps make sense of population flows and supports better planning for immunization programs.