Calculate means and proportions from complex survey data. A wrapper around svymean, or if proportion = TRUE, svyciprop. survey_mean should always be called from summarise.

survey_mean(
  x,
  na.rm = FALSE,
  vartype = c("se", "ci", "var", "cv"),
  level = 0.95,
  proportion = FALSE,
  prop_method = c("logit", "likelihood", "asin", "beta", "mean"),
  deff = FALSE,
  df = NULL,
  ...
)

survey_prop(
  vartype = c("se", "ci", "var", "cv"),
  level = 0.95,
  proportion = FALSE,
  prop_method = c("logit", "likelihood", "asin", "beta", "mean"),
  deff = FALSE,
  df = NULL,
  ...
)

Arguments

x

A variable or expression, or empty

na.rm

A logical value to indicate whether missing values should be dropped

vartype

Report variability as one or more of: standard error ("se", default), confidence interval ("ci"), variance ("var") or coefficient of variation ("cv").

level

(For vartype = "ci" only) A single number or vector of numbers indicating the confidence level

proportion

If TRUE, use methods for calculating the proportion that may give more accurate confidence intervals near 0 and 1. Based on svyciprop.

prop_method

Type of proportion method to use if proportion is TRUE. See svyciprop for details.

deff

A logical value to indicate whether the design effect should be returned.

df

(For vartype = "ci" only) A numeric value indicating the degrees of freedom for the t-distribution. The default (NULL) uses degf, but Inf is the survey package's usual default (except in svyciprop).

...

Ignored

Details

Using survey_prop is equivalent to leaving out the x argument in survey_mean. Both calculate the proportion represented within the data, with the last grouping variable "unpeeled". Use interact to "unpeel" multiple variables at once.
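As a rough intuition for "unpeeling", the unweighted base-R sketch below contrasts proportions taken within each level of the first grouping variable with proportions taken over all cells jointly. The toy data frame is hypothetical and ignores survey weights, so it only illustrates which denominators are involved, not srvyr's actual estimates.

```r
# Hypothetical toy data (unweighted); not the api data used in the examples.
toy <- data.frame(
  stype  = c("E", "E", "E", "E", "H", "H", "M", "M"),
  awards = c("No", "Yes", "Yes", "Yes", "No", "Yes", "No", "No")
)
counts <- table(toy$stype, toy$awards)

# Rough analogue of group_by(stype, awards): awards is "unpeeled",
# so proportions are taken within each stype and each row sums to 1.
within_stype <- prop.table(counts, margin = 1)
rowSums(within_stype)

# Rough analogue of group_by(interact(stype, awards)): both variables are
# "unpeeled", so proportions are taken over all cells and sum to 1 overall.
joint <- prop.table(counts)
sum(joint)
```

With three levels of stype, the within-group proportions add up to 3 in total, while the joint proportions add up to 1.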

Examples

# Attach srvyr (not shown in the original example) so that %>%,
# as_survey_design() and the survey_* functions are available
library(srvyr)
data(api, package = "survey")
dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)
dstrata %>%
  summarise(api99_mn = survey_mean(api99),
            api_diff = survey_mean(api00 - api99, vartype = c("ci", "cv")))
#> # A tibble: 1 × 6
#>   api99_mn api99_mn_se api_diff api_diff_low api_diff_upp api_diff_cv
#>      <dbl>       <dbl>    <dbl>        <dbl>        <dbl>       <dbl>
#> 1     629.        10.1     32.9         28.8         37.0      0.0632
dstrata %>%
  group_by(awards) %>%
  summarise(api00 = survey_mean(api00))
#> # A tibble: 2 × 3
#>   awards api00 api00_se
#>   <fct>  <dbl>    <dbl>
#> 1 No      634.     15.6
#> 2 Yes     678.     12.0
# Use `survey_prop` to calculate the proportion in each group
dstrata %>%
  group_by(awards) %>%
  summarise(pct = survey_prop())
#> # A tibble: 2 × 3
#>   awards   pct pct_se
#>   <fct>  <dbl>  <dbl>
#> 1 No     0.361 0.0349
#> 2 Yes    0.639 0.0349
# Or you can also leave out `x` in `survey_mean`, so this is equivalent
dstrata %>%
  group_by(awards) %>%
  summarise(pct = survey_mean())
#> # A tibble: 2 × 3
#>   awards   pct pct_se
#>   <fct>  <dbl>  <dbl>
#> 1 No     0.361 0.0349
#> 2 Yes    0.639 0.0349
# When there's more than one group, the last group is "peeled" off and
# proportions are calculated within each remaining group, each adding up
# to 100%. So in this example, the sum of prop is 300% (100% for each of
# stype E, H and M)
dstrata %>%
  group_by(stype, awards) %>%
  summarize(prop = survey_prop())
#> # A tibble: 6 × 4
#> # Groups:   stype [3]
#>   stype awards  prop prop_se
#>   <fct> <fct>  <dbl>   <dbl>
#> 1 E     No      0.27  0.0446
#> 2 E     Yes     0.73  0.0446
#> 3 H     No      0.68  0.0666
#> 4 H     Yes     0.32  0.0666
#> 5 M     No      0.52  0.0714
#> 6 M     Yes     0.48  0.0714
# The `interact` function can help you calculate the proportion over
# the interaction of two or more variables
# So in this example, the sum of prop is 100%
dstrata %>%
  group_by(interact(stype, awards)) %>%
  summarize(prop = survey_prop())
#> # A tibble: 6 × 4
#>   stype awards   prop prop_se
#>   <fct> <fct>   <dbl>   <dbl>
#> 1 E     No     0.193  0.0318
#> 2 E     Yes    0.521  0.0318
#> 3 H     No     0.0829 0.00812
#> 4 H     Yes    0.0390 0.00812
#> 5 M     No     0.0855 0.0117
#> 6 M     Yes    0.0789 0.0117
# Setting proportion = TRUE uses a different method for calculating confidence intervals
dstrata %>%
  summarise(high_api = survey_mean(api00 > 875, proportion = TRUE, vartype = "ci"))
#> # A tibble: 1 × 3
#>   high_api high_api_low high_api_upp
#>      <dbl>        <dbl>        <dbl>
#> 1   0.0318       0.0129       0.0765
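One way to see what the default "logit" method does to the interval: on the probability scale the bounds printed above sit asymmetrically around the estimate, while on the log-odds scale they are roughly symmetric. The base-R check below is only a sketch that uses the rounded values from that output, so small discrepancies from rounding are expected. It does not reproduce svyciprop's own computation.

```r
# Rounded values copied from the printed output above.
est <- 0.0318
low <- 0.0129
upp <- 0.0765

# Distances from the estimate to each bound on the probability scale.
prob_gaps <- c(lower = est - low, upper = upp - est)

# The same distances on the log-odds scale (qlogis() is the logit function).
logit_gaps <- c(lower = qlogis(est) - qlogis(low),
                upper = qlogis(upp) - qlogis(est))

prob_gaps   # upper gap is more than twice the lower gap
logit_gaps  # the two gaps are nearly equal
```

Because the interval is built on the log-odds scale and then transformed back, its bounds cannot fall outside 0 and 1, which is why this option can behave better for proportions close to either end.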
# level takes a vector for multiple levels of confidence intervals
dstrata %>%
  summarise(api99 = survey_mean(api99, vartype = "ci", level = c(0.95, 0.65)))
#> # A tibble: 1 × 5
#>   api99 api99_low95 api99_upp95 api99_low65 api99_upp65
#>   <dbl>       <dbl>       <dbl>       <dbl>       <dbl>
#> 1  629.        609.        649.        620.        639.
# Note that the default degrees of freedom in srvyr is different from
# survey, so your confidence intervals might not be exact matches. To
# replicate survey's behavior, use df = Inf
dstrata %>%
  summarise(srvyr_default = survey_mean(api99, vartype = "ci"),
            survey_default = survey_mean(api99, vartype = "ci", df = Inf))
#> # A tibble: 1 × 6
#>   srvyr_default srvyr_default_low srvyr_default_upp survey_def…¹ surve…² surve…³
#>           <dbl>             <dbl>             <dbl>        <dbl>   <dbl>   <dbl>
#> 1          629.              609.              649.         629.    610.    649.
#> # … with abbreviated variable names ¹survey_default, ²survey_default_low,
#> #   ³survey_default_upp
comparison <- survey::svymean(~api99, dstrata)
confint(comparison) # survey's default
#>          2.5 %   97.5 %
#> api99 609.6051 649.1846
confint(comparison, df = survey::degf(dstrata)) # srvyr's default
#>          2.5 %   97.5 %
#> api99 609.4828 649.3069
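The two intervals above share the same estimate and standard error and differ only in the critical value: a normal quantile for df = Inf versus a t quantile for a finite df. The base-R sketch below backs the estimate and standard error out of the printed survey-default interval and then rebuilds the t-based interval. The value of 197 degrees of freedom is an assumption (200 sampled schools minus 3 strata), not something stated on this page.

```r
# Bounds copied from the printed survey-default (df = Inf) interval above.
lower_z <- 609.6051
upper_z <- 649.1846

# Recover the point estimate and standard error from the normal-based interval.
est <- (lower_z + upper_z) / 2
se  <- (upper_z - lower_z) / (2 * qnorm(0.975))

# Assumed design degrees of freedom: 200 sampled schools minus 3 strata.
df_assumed <- 197

# Rebuild the interval with a t critical value instead of a normal one.
ci_t <- est + c(-1, 1) * qt(0.975, df_assumed) * se
ci_t
```

Under that assumption the result should land close to the srvyr-default interval printed above, slightly wider than the df = Inf interval because the t quantile exceeds the normal quantile.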