Calculate mean/proportion and its variation using survey methods

Calculate means and proportions from complex survey data. survey_mean with proportion = FALSE (the default) or survey_prop with proportion = FALSE is a wrapper around svymean. survey_prop with proportion = TRUE (the default) or survey_mean with proportion = TRUE is a wrapper around svyciprop. survey_mean and survey_prop should always be called from summarise.

Usage

survey_mean(
  x,
  na.rm = FALSE,
  vartype = c("se", "ci", "var", "cv"),
  level = 0.95,
  proportion = FALSE,
  prop_method = c("logit", "likelihood", "asin", "beta", "mean", "xlogit"),
  deff = FALSE,
  df = NULL,
  ...
)

survey_prop(
  vartype = c("se", "ci", "var", "cv"),
  level = 0.95,
  proportion = TRUE,
  prop_method = c("logit", "likelihood", "asin", "beta", "mean", "xlogit"),
  deff = FALSE,
  df = NULL,
  ...
)

Arguments

x: A variable or expression, or empty
na.rm: A logical value to indicate whether missing values should be dropped. See the section "Missing Values" later in this help page.
vartype: Report variability as one or more of: standard error ("se", default), confidence interval ("ci"), variance ("var") or coefficient of variation ("cv").
level: (For vartype = "ci" only) A single number or vector of numbers indicating the confidence level.
proportion: A logical value. If TRUE, use proportion-specific methods that may give more accurate confidence intervals near 0 and 1. Based on svyciprop.
prop_method: Type of proportion method to use if proportion is TRUE. See svyciprop for details.
deff: A logical value to indicate whether the design effect should be returned.
df: (For vartype = "ci" only) A numeric value indicating the degrees of freedom for the t-distribution. The default (NULL) uses degf, whereas the survey package usually defaults to Inf (except in svyciprop).
...: Ignored
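
As a rough illustration of how vartype, deff and prop_method combine, the sketch below requests several measures at once. It assumes the api data from the survey package and the same stratified design used in the Examples section; the object names (dstrata, out, high) are only for illustration.

```r
library(dplyr)
library(srvyr)

data(api, package = "survey")

dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

# Request the standard error and variance together, plus the design effect
out <- dstrata %>%
  summarise(api99 = survey_mean(api99, vartype = c("se", "var"), deff = TRUE))
names(out)

# Proportion with a non-default svyciprop method for the confidence interval
high <- dstrata %>%
  summarise(high_api = survey_mean(api00 > 875, proportion = TRUE,
                                   prop_method = "beta", vartype = "ci"))
high
```

Each requested measure is returned as its own column, named by appending a suffix to the name given in summarise.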

Details

Using survey_prop is equivalent to leaving out the x argument in survey_mean and setting proportion = TRUE. It calculates the proportion of the data that each group represents, with the last grouping variable "unpeeled", so proportions are computed within each combination of the remaining grouping variables. interact allows for "unpeeling" multiple variables at once.

Missing Values

When calculating proportions for a grouping variable x, NA values will affect the estimated proportions unless they are first removed by calling filter(!is.na(x)).

When calculating means for a numeric variable, equivalent results are obtained by calling filter(!is.na(x)) or using survey_mean(x, na.rm = TRUE). However, it is better to use survey_mean(x, na.rm = TRUE) if you are simultaneously producing summaries for other variables that might not have missing values for the same rows as x.
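
The difference between the two approaches can be sketched as follows. This is only an illustration: the missing values are introduced artificially into a copy of apistrat (the rows chosen are arbitrary), and the object names are hypothetical.

```r
library(dplyr)
library(srvyr)

data(api, package = "survey")

# Hypothetical setup: blank out a few api99 values to create missing data
api_na <- apistrat
api_na$api99[c(1, 50, 120)] <- NA

d_na <- api_na %>%
  as_survey_design(strata = stype, weights = pw)

# Option 1: filter first. The rows are dropped for every summary,
# so api00 is also computed without them.
filtered <- d_na %>%
  filter(!is.na(api99)) %>%
  summarise(api99 = survey_mean(api99),
            api00 = survey_mean(api00))

# Option 2: na.rm = TRUE. Missing values are dropped for api99 only,
# and api00 still uses every row.
per_var <- d_na %>%
  summarise(api99 = survey_mean(api99, na.rm = TRUE),
            api00 = survey_mean(api00))

filtered
per_var
```

The api99 estimates should agree between the two results, while the api00 estimates can differ because the filtered design no longer contains the rows where api99 was missing.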

Examples

data(api, package = "survey")

dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

dstrata %>%
  summarise(api99_mn = survey_mean(api99),
            api_diff = survey_mean(api00 - api99, vartype = c("ci", "cv")))
#> # A tibble: 1 × 6
#>   api99_mn api99_mn_se api_diff api_diff_low api_diff_upp api_diff_cv
#>      <dbl>       <dbl>    <dbl>        <dbl>        <dbl>       <dbl>
#> 1     629.        10.1     32.9         28.8         37.0      0.0632

dstrata %>%
  group_by(awards) %>%
  summarise(api00 = survey_mean(api00))
#> # A tibble: 2 × 3
#>   awards api00 api00_se
#>   <fct>  <dbl>    <dbl>
#> 1 No      634.     15.6
#> 2 Yes     678.     12.0

# Use `survey_prop` to calculate the proportion in each group
dstrata %>%
  group_by(awards) %>%
  summarise(pct = survey_prop())
#> When `proportion` is unspecified, `survey_prop()` now defaults to `proportion = TRUE`.
#> ℹ This should improve confidence interval coverage.
#> This message is displayed once per session.
#> # A tibble: 2 × 3
#>   awards   pct pct_se
#>   <fct>  <dbl>  <dbl>
#> 1 No     0.361 0.0349
#> 2 Yes    0.639 0.0349

# You can also leave out `x` in `survey_mean`, so this is equivalent
dstrata %>%
  group_by(awards) %>%
  summarise(pct = survey_mean())
#> # A tibble: 2 × 3
#>   awards   pct pct_se
#>   <fct>  <dbl>  <dbl>
#> 1 No     0.361 0.0349
#> 2 Yes    0.639 0.0349

# When there's more than one group, the last group is "peeled" off and proportions are
# calculated within each level of the remaining groups, each adding up to 100%.
# So in this example, the sum of prop is 300% (100% for each of stype "E",
# "H" and "M")
dstrata %>%
  group_by(stype, awards) %>%
  summarize(prop = survey_prop())
#> # A tibble: 6 × 4
#> # Groups:   stype [3]
#>   stype awards  prop prop_se
#>   <fct> <fct>  <dbl>   <dbl>
#> 1 E     No     0.270  0.0446
#> 2 E     Yes    0.730  0.0446
#> 3 H     No     0.680  0.0666
#> 4 H     Yes    0.320  0.0666
#> 5 M     No     0.520  0.0714
#> 6 M     Yes    0.480  0.0714

# The `interact` function can help you calculate the proportion over
# the interaction of two or more variables
# So in this example, the sum of prop is 100%
dstrata %>%
  group_by(interact(stype, awards)) %>%
  summarize(prop = survey_prop())
#> # A tibble: 6 × 4
#>   stype awards   prop prop_se
#>   <fct> <fct>   <dbl>   <dbl>
#> 1 E     No     0.193  0.0318 
#> 2 E     Yes    0.521  0.0318 
#> 3 H     No     0.0829 0.00812
#> 4 H     Yes    0.0390 0.00812
#> 5 M     No     0.0855 0.0117 
#> 6 M     Yes    0.0789 0.0117 

# Setting proportion = TRUE uses a different method for calculating confidence intervals
dstrata %>%
  summarise(high_api = survey_mean(api00 > 875, proportion = TRUE, vartype = "ci"))
#> # A tibble: 1 × 3
#>   high_api high_api_low high_api_upp
#>      <dbl>        <dbl>        <dbl>
#> 1   0.0318       0.0129       0.0765

# level takes a vector for multiple levels of confidence intervals
dstrata %>%
  summarise(api99 = survey_mean(api99, vartype = "ci", level = c(0.95, 0.65)))
#> # A tibble: 1 × 5
#>   api99 api99_low95 api99_upp95 api99_low65 api99_upp65
#>   <dbl>       <dbl>       <dbl>       <dbl>       <dbl>
#> 1  629.        609.        649.        620.        639.

# Note that the default degrees of freedom in srvyr is different from
# survey, so your confidence intervals might not be exact matches. To
# replicate survey's behavior, use df = Inf
dstrata %>%
  summarise(srvyr_default = survey_mean(api99, vartype = "ci"),
            survey_default = survey_mean(api99, vartype = "ci", df = Inf))
#> # A tibble: 1 × 6
#>   srvyr_default srvyr_default_low srvyr_default_upp survey_default
#>           <dbl>             <dbl>             <dbl>          <dbl>
#> 1          629.              609.              649.           629.
#> # ℹ 2 more variables: survey_default_low <dbl>, survey_default_upp <dbl>

comparison <- survey::svymean(~api99, dstrata)
confint(comparison) # survey's default
#>          2.5 %   97.5 %
#> api99 609.6051 649.1846
confint(comparison, df = survey::degf(dstrata)) # srvyr's default
#>          2.5 %   97.5 %
#> api99 609.4828 649.3069