Introducing the ipfr package

I’m happy to announce that the ipfr package is available on CRAN! The goal of this package is to make survey expansion, matrix balancing, and population synthesis easier.

A basic use case is balancing a matrix to row and column targets:

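library(ipfr)
library(dplyr)

# Seed matrix of random values; results vary between runs without set.seed()
mtx <- matrix(data = runif(9), nrow = 3, ncol = 3)
# Both sets of targets sum to 12
row_targets <- c(3, 4, 5)
column_targets <- c(5, 4, 3)
result <- ipu_matrix(mtx, row_targets, column_targets)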

rowSums(result)
#> [1] 3.000001 4.000015 4.999985
colSums(result)
#> [1] 5 4 3
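
To give a feel for what balancing involves, here is a minimal base R sketch of classic iterative proportional fitting. It is an illustration under my own assumptions, not ipfr's internal code: the function name balance_matrix, the fixed seed matrix, and the stopping tolerance are all made up for this example.

```r
# Illustrative iterative proportional fitting in base R.
# balance_matrix() is a hypothetical helper, not part of ipfr.
balance_matrix <- function(mtx, row_targets, col_targets,
                           max_iter = 100, tol = 1e-8) {
  # The two sets of targets must describe the same grand total.
  stopifnot(isTRUE(all.equal(sum(row_targets), sum(col_targets))))
  for (i in seq_len(max_iter)) {
    # Scale each row so the row sums match the row targets.
    mtx <- mtx * (row_targets / rowSums(mtx))
    # Scale each column so the column sums match the column targets.
    mtx <- sweep(mtx, 2, col_targets / colSums(mtx), "*")
    # Column sums are exact at this point; stop once row sums are close.
    if (max(abs(rowSums(mtx) - row_targets)) < tol) break
  }
  mtx
}

# A fixed seed matrix keeps the example reproducible.
seed <- matrix(c(1, 2, 3,
                 4, 5, 6,
                 7, 8, 9), nrow = 3, byrow = TRUE)
balanced <- balance_matrix(seed, c(3, 4, 5), c(5, 4, 3))
rowSums(balanced)
colSums(balanced)
```

Because each pass in this sketch ends by scaling the columns, the column sums come out exact while the row sums are only approximate. That is the same pattern as in the output above, though whether ipfr orders its steps this way is an assumption on my part.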

The example below creates a simple survey and expands it to meet known population targets. Each row in the survey data frame represents a household and records the number of household members (size) and the number of autos (autos). The targets list holds the population totals that the expanded survey should match. For example, there should be 75 one-person households in total.

survey <- tibble(
  size = c(1, 2, 1, 1),
  autos = c(0, 2, 2, 1),
  weight = 1
)
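# One targets entry per survey column; column names are category values
targets <- list()
targets$size <- tibble(
  `1` = 75,
  `2` = 25
)
targets$autos <- tibble(
  `0` = 25,
  `1` = 50,
  `2` = 25
)
# Expand the survey weights to match the size and autos targets
result <- ipu(survey, targets)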
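
This survey is small enough to work out by hand what "matching the targets" means. Only household 2 has two people, so its weight must be 25. Only household 1 has zero autos (weight 25), and only household 4 has one auto (weight 50). Households 2 and 3 share the two-auto target of 25, which leaves a weight of 0 for household 3. The base R sketch below checks that arithmetic. The id column and the weights are my own hand-derived assumptions, not ipfr output; an iterative routine would only approach these values.

```r
# Same survey as above, rebuilt as a plain data frame.
# The id column is added here for illustration.
survey_check <- data.frame(
  id    = 1:4,
  size  = c(1, 2, 1, 1),
  autos = c(0, 2, 2, 1)
)

# Hand-derived weights (an assumption for illustration, not ipfr output).
survey_check$weight <- c(25, 25, 0, 50)

# Weighted household totals by category, to compare against the targets.
size_totals  <- tapply(survey_check$weight, survey_check$size, sum)
autos_totals <- tapply(survey_check$weight, survey_check$autos, sum)

size_totals   # size 1: 75, size 2: 25
autos_totals  # autos 0: 25, autos 1: 50, autos 2: 25
```

The same kind of weighted tabulation is a quick sanity check on any expanded survey, whatever tool produced the weights.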

The package also supports a number of advanced features:

  • Match to household- and person-level targets simultaneously
  • View and restrict the distribution of resulting weights
  • Control by geography
  • Handle target agreement and importance

Finally, the resulting weight table can be used to create a synthetic population:

synthesize(result$weight_tbl)
#> # A tibble: 100 x 4
#>    new_id    id  size autos
#>     <int> <int> <dbl> <dbl>
#>  1      1     1     1     0
#>  2      2     4     1     1
#>  3      3     1     1     0
#>  4      4     2     2     2
#>  5      5     4     1     1
#>  6      6     4     1     1
#>  7      7     2     2     2
#>  8      8     2     2     2
#>  9      9     4     1     1
#> 10     10     4     1     1
#> # ... with 90 more rows
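
One way to think about this step is as sampling households with replacement, with probability proportional to weight. The base R sketch below shows that idea only; it is not necessarily how synthesize() works internally. The weight table, the random seed, and the variable names are assumptions made for this example.

```r
# Hypothetical weight table (hand-derived weights, not ipfr output).
weight_tbl <- data.frame(
  id     = 1:4,
  size   = c(1, 2, 1, 1),
  autos  = c(0, 2, 2, 1),
  weight = c(25, 25, 0, 50)
)

set.seed(42)  # arbitrary seed so the draw is repeatable

# Draw as many households as the weights sum to (100 here).
n <- round(sum(weight_tbl$weight))
draws <- sample(nrow(weight_tbl), size = n, replace = TRUE,
                prob = weight_tbl$weight)

# Each draw becomes one synthetic household with a fresh identifier.
synthetic <- data.frame(
  new_id = seq_len(n),
  weight_tbl[draws, c("id", "size", "autos")],
  row.names = NULL
)

head(synthetic)
table(synthetic$id)
```

A household with zero weight is never drawn in this sketch, which matches the output above, where id 3 does not appear in the first ten rows.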
Kyle Ward
Senior Data Scientist

My interests include data science, machine learning, and travel modeling.
