`R/survey_statistics.r`

`survey_mean.Rd`

Calculate means and proportions from complex survey data. `survey_mean` with `proportion = FALSE` (the default) or `survey_prop` with `proportion = FALSE` is a wrapper around `svymean`. `survey_prop` with `proportion = TRUE` (the default) or `survey_mean` with `proportion = TRUE` is a wrapper around `svyciprop`. `survey_mean` and `survey_prop` should always be called from `summarise`.
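The wrapper relationship described above can be checked by computing the same quantities through srvyr and through the survey package directly. This is a minimal sketch, assuming the `apistrat` data shipped with survey and a threshold of 875 on `api00` chosen only for illustration; the object names `via_srvyr`, `mean_direct` and `prop_direct` are hypothetical.

```r
library(srvyr)
data(api, package = "survey")

dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

# srvyr: mean and proportion computed inside summarise()
via_srvyr <- dstrata %>%
  summarise(
    api99 = survey_mean(api99),
    high_api = survey_mean(api00 > 875, proportion = TRUE)
  )

# survey: the underlying functions called directly on the same design
mean_direct <- survey::svymean(~api99, dstrata)
prop_direct <- survey::svyciprop(~I(api00 > 875), dstrata)

# Compare the point estimates side by side
c(srvyr = via_srvyr$api99, survey = unname(coef(mean_direct)))
c(srvyr = via_srvyr$high_api, survey = as.numeric(prop_direct))
```

The point estimates should agree; the srvyr versions return them as columns of a tibble, which is what makes them convenient to combine with `group_by`.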

```
survey_mean(
  x,
  na.rm = FALSE,
  vartype = c("se", "ci", "var", "cv"),
  level = 0.95,
  proportion = FALSE,
  prop_method = c("logit", "likelihood", "asin", "beta", "mean", "xlogit"),
  deff = FALSE,
  df = NULL,
  ...
)

survey_prop(
  vartype = c("se", "ci", "var", "cv"),
  level = 0.95,
  proportion = TRUE,
  prop_method = c("logit", "likelihood", "asin", "beta", "mean", "xlogit"),
  deff = FALSE,
  df = NULL,
  ...
)
```

- x
A variable or expression, or empty.

- na.rm
A logical value to indicate whether missing values should be dropped. See the section "Missing Values" later in this help page.

- vartype
Report variability as one or more of: standard error ("se", default), confidence interval ("ci"), variance ("var") or coefficient of variation ("cv").

- level
(For vartype = "ci" only) A single number or vector of numbers indicating the confidence level.

- proportion
Use methods to calculate the proportion that may have more accurate confidence intervals near 0 and 1. Based on `svyciprop`.

- prop_method
Type of proportion method to use if proportion is `TRUE`. See `svyciprop` for details.

- deff
A logical value to indicate whether the design effect should be returned.

- df
(For vartype = "ci" only) A numeric value indicating the degrees of freedom for the t-distribution. The default (NULL) uses `degf`, but Inf is the survey package's usual default (except in `svyciprop`).

- ...
Ignored.
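Several of the arguments above can be combined in one call. The following is a minimal sketch, assuming the `apistrat` data from the survey package; the 90% level, the "beta" method and the 875 threshold are arbitrary choices for illustration, and `res` is a hypothetical name.

```r
library(srvyr)
data(api, package = "survey")

dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

res <- dstrata %>%
  summarise(
    # Standard error, a 90% confidence interval and the design effect
    api00 = survey_mean(api00, vartype = c("se", "ci"), level = 0.90, deff = TRUE),
    # Proportion with a non-default confidence interval method from svyciprop
    high_api = survey_mean(api00 > 875, proportion = TRUE,
                           prop_method = "beta", vartype = "ci")
  )

# Each requested statistic is returned as its own column,
# named with the summary name as a prefix
names(res)
```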

Using `survey_prop` is equivalent to leaving out the `x` argument in `survey_mean` and setting `proportion = TRUE`. This calculates the proportion represented within the data, with the last grouping variable "unpeeled". `interact` allows for "unpeeling" multiple variables at once.

When calculating proportions for a grouping variable `x`, `NA` values will affect the estimated proportions unless they are first removed by calling `filter(!is.na(x))`.

When calculating means for a numeric variable, equivalent results are obtained by calling `filter(!is.na(x))` or using `survey_mean(x, na.rm = TRUE)`. However, it is better to use `survey_mean(x, na.rm = TRUE)` if you are simultaneously producing summaries for other variables that might not have missing values for the same rows as `x`.
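The difference between the two approaches in the paragraph above can be sketched as follows. The `apistrat` data has no missing `api99` values, so this example assumes a copy (`apistrat_na`, a hypothetical name) in which three rows are blanked out purely for illustration.

```r
library(srvyr)
data(api, package = "survey")

# Assumption for illustration: introduce missing values into api99
apistrat_na <- apistrat
apistrat_na$api99[c(1, 5, 10)] <- NA

dstrata_na <- apistrat_na %>%
  as_survey_design(strata = stype, weights = pw)

# Option 1: filter first. Rows with missing api99 are dropped for
# every summary, including api00, which is not missing on those rows.
res_filter <- dstrata_na %>%
  filter(!is.na(api99)) %>%
  summarise(api99_mn = survey_mean(api99),
            api00_mn = survey_mean(api00))

# Option 2: na.rm = TRUE. Missing values are dropped only for api99,
# so api00 is still summarised over all rows.
res_narm <- dstrata_na %>%
  summarise(api99_mn = survey_mean(api99, na.rm = TRUE),
            api00_mn = survey_mean(api00))

res_filter
res_narm
```

The `api99_mn` estimates should agree between the two results, while `api00_mn` is based on fewer rows in the filtered version.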

```
library(srvyr)
data(api, package = "survey")
dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)
dstrata %>%
  summarise(api99_mn = survey_mean(api99),
            api_diff = survey_mean(api00 - api99, vartype = c("ci", "cv")))
#> # A tibble: 1 × 6
#> api99_mn api99_mn_se api_diff api_diff_low api_diff_upp api_diff_cv
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 629. 10.1 32.9 28.8 37.0 0.0632
dstrata %>%
  group_by(awards) %>%
  summarise(api00 = survey_mean(api00))
#> # A tibble: 2 × 3
#> awards api00 api00_se
#> <fct> <dbl> <dbl>
#> 1 No 634. 15.6
#> 2 Yes 678. 12.0
# Use `survey_prop` to calculate the proportion in each group
dstrata %>%
  group_by(awards) %>%
  summarise(pct = survey_prop())
#> When `proportion` is unspecified, `survey_prop()` now defaults to `proportion = TRUE`.
#> ℹ This should improve confidence interval coverage.
#> This message is displayed once per session.
#> # A tibble: 2 × 3
#> awards pct pct_se
#> <fct> <dbl> <dbl>
#> 1 No 0.361 0.0349
#> 2 Yes 0.639 0.0349
# Or you can also leave out `x` in `survey_mean`, so this is equivalent
dstrata %>%
  group_by(awards) %>%
  summarise(pct = survey_mean())
#> # A tibble: 2 × 3
#> awards pct pct_se
#> <fct> <dbl> <dbl>
#> 1 No 0.361 0.0349
#> 2 Yes 0.639 0.0349
# When there's more than one group, the last group is "peeled" off and proportions are
# calculated within each level of the remaining groups, each adding up to 100%.
# So in this example, the sum of prop is 300% (100% for each of stype "E",
# "H" and "M")
dstrata %>%
  group_by(stype, awards) %>%
  summarize(prop = survey_prop())
#> # A tibble: 6 × 4
#> # Groups: stype [3]
#> stype awards prop prop_se
#> <fct> <fct> <dbl> <dbl>
#> 1 E No 0.270 0.0446
#> 2 E Yes 0.730 0.0446
#> 3 H No 0.680 0.0666
#> 4 H Yes 0.320 0.0666
#> 5 M No 0.520 0.0714
#> 6 M Yes 0.480 0.0714
# The `interact` function can help you calculate the proportion over
# the interaction of two or more variables
# So in this example, the sum of prop is 100%
dstrata %>%
  group_by(interact(stype, awards)) %>%
  summarize(prop = survey_prop())
#> # A tibble: 6 × 4
#> stype awards prop prop_se
#> <fct> <fct> <dbl> <dbl>
#> 1 E No 0.193 0.0318
#> 2 E Yes 0.521 0.0318
#> 3 H No 0.0829 0.00812
#> 4 H Yes 0.0390 0.00812
#> 5 M No 0.0855 0.0117
#> 6 M Yes 0.0789 0.0117
# Setting proportion = TRUE uses a different method for calculating confidence intervals
dstrata %>%
  summarise(high_api = survey_mean(api00 > 875, proportion = TRUE, vartype = "ci"))
#> # A tibble: 1 × 3
#> high_api high_api_low high_api_upp
#> <dbl> <dbl> <dbl>
#> 1 0.0318 0.0129 0.0765
# level takes a vector for multiple levels of confidence intervals
dstrata %>%
  summarise(api99 = survey_mean(api99, vartype = "ci", level = c(0.95, 0.65)))
#> # A tibble: 1 × 5
#> api99 api99_low95 api99_upp95 api99_low65 api99_upp65
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 629. 609. 649. 620. 639.
# Note that the default degrees of freedom in srvyr is different from
# survey, so your confidence intervals might not be exact matches. To
# replicate survey's behavior, use df = Inf
dstrata %>%
  summarise(srvyr_default = survey_mean(api99, vartype = "ci"),
            survey_default = survey_mean(api99, vartype = "ci", df = Inf))
#> # A tibble: 1 × 6
#> srvyr_default srvyr_default_low srvyr_default_upp survey_default
#> <dbl> <dbl> <dbl> <dbl>
#> 1 629. 609. 649. 629.
#> # ℹ 2 more variables: survey_default_low <dbl>, survey_default_upp <dbl>
comparison <- survey::svymean(~api99, dstrata)
confint(comparison) # survey's default
#> 2.5 % 97.5 %
#> api99 609.6051 649.1846
confint(comparison, df = survey::degf(dstrata)) # srvyr's default
#> 2.5 % 97.5 %
#> api99 609.4828 649.3069
```