Calculate the population variance from complex survey data. A wrapper around svyvar from the survey package. survey_var should always be called from within summarise.

survey_var(
  x,
  na.rm = FALSE,
  vartype = c("se", "ci", "var"),
  level = 0.95,
  df = NULL,
  ...
)

survey_sd(x, na.rm = FALSE, ...)

Arguments

x

A variable or expression, or empty

na.rm

A logical value to indicate whether missing values should be dropped

vartype

Report variability as one or more of: standard error ("se", default), confidence interval ("ci"), or variance ("var") (coefficient of variation is not available).

level

(For vartype = "ci" only) A single number or vector of numbers indicating the confidence level.

df

(For vartype = "ci" only) A numeric value indicating the degrees of freedom for the t-distribution. A value of Inf is equivalent to using the normal distribution, and for population variance statistics there is little reason to use any other value (see Details).

...

Ignored

Details

Be aware that the survey package computes confidence intervals for the population variance statistic using a t distribution, or a normal distribution when df = Inf; both are symmetric. This can be a very poor approximation if any of the following conditions holds:

  • the sampling design has few degrees of freedom,

  • the analyzed variable is not normally distributed,

  • the sampling probabilities of the survey design vary widely.

Because of this, be very careful when using confidence intervals for population variance statistics, especially when analyzing subsets of the data or using grouped survey objects.
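As a rough illustration of what a symmetric interval looks like here, the sketch below rebuilds one by hand in base R as estimate plus quantile times standard error. The estimate and standard error are copied from the api99 output in the Examples; the helper sym_ci and the small df of 10 are hypothetical, introduced only for this illustration.

```r
# Point estimate and standard error copied from the Examples output
# (api99_var and api99_var_se for the stratified design).
est   <- 16518
se    <- 1336
level <- 0.95

# Two-sided quantile probabilities: 0.025 and 0.975 for level = 0.95
probs <- c((1 - level) / 2, 1 - (1 - level) / 2)

# Hypothetical helper: symmetric interval, estimate plus quantile times SE
sym_ci <- function(df) est + se * qt(probs, df = df)

ci_normal <- sym_ci(Inf)  # df = Inf: t quantiles equal normal quantiles
ci_t10    <- sym_ci(10)   # hypothetical small df, for illustration only

print(rbind(normal = ci_normal, t_df10 = ci_t10))
```

Both intervals are centred on the estimate by construction. A smaller df only widens them, and nothing in the formula keeps the lower bound above zero even though a variance cannot be negative.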

The sampling distribution of the variance statistic is in general asymmetric (chi-squared in the case of simple random sampling of a normally distributed variable). If the analyzed variable is not normally distributed, or the sampling probabilities of the survey design vary widely (or both), it may converge to normality only very slowly as the number of survey design degrees of freedom grows.
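The asymmetry can be seen in a small base R simulation of the simplest case mentioned above: simple random sampling of a normally distributed variable, with no survey design involved. The sample size, number of replicates, and the skewness helper are assumptions chosen for illustration.

```r
set.seed(42)

n_obs  <- 10      # hypothetical sample size per replicate
n_reps <- 20000   # number of simulated samples

# Sample variance of n_obs standard normal draws, repeated n_reps times
sim_var <- replicate(n_reps, var(rnorm(n_obs)))

# Moment-based skewness: zero for a symmetric distribution
skewness <- function(v) mean((v - mean(v))^3) / mean((v - mean(v))^2)^1.5

sim_skew    <- skewness(sim_var)
theory_skew <- sqrt(8 / (n_obs - 1))  # skewness of chi-squared with n_obs - 1 df

print(c(simulated = sim_skew, theoretical = theory_skew))
```

The simulated skewness should be clearly positive and close to the chi-squared value. Increasing n_obs shrinks it towards zero, which is the convergence to normality described above.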

Examples

library(survey)
# srvyr supplies as_survey_design() and survey_var()
library(srvyr)
data(api)

dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

dstrata %>%
  summarise(api99_var = survey_var(api99),
            api99_sd = survey_sd(api99))
#> # A tibble: 1 × 3
#>   api99_var api99_var_se api99_sd
#>       <dbl>        <dbl>    <dbl>
#> 1    16518.        1336.     129.

dstrata %>%
  group_by(awards) %>%
  summarise(api00_var = survey_var(api00),
            api00_sd = survey_sd(api00))
#> # A tibble: 2 × 4
#>   awards api00_var api00_var_se api00_sd
#>   <fct>      <dbl>        <dbl>    <dbl>
#> 1 No        15669.        2021.     125.
#> 2 Yes       14309.        1509.     120.

# the standard error and variance of the population variance estimator
# are available via the vartype argument
# (but not for the population standard deviation estimator)
dstrata %>%
  summarise(api99_variance = survey_var(api99, vartype = c("se", "var")))
#> # A tibble: 1 × 3
#>   api99_variance api99_variance_se api99_variance_var
#>            <dbl>             <dbl>              <dbl>
#> 1         16518.             1336.           1785755.
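The survey_sd values printed above look like the square roots of the corresponding survey_var estimates. The check below uses only the rounded numbers from those outputs, so it is a consistency check on the printed figures and not a statement about how survey_sd is implemented.

```r
# Variance estimates as printed above (api99 overall; api00 for awards No / Yes)
var_est <- c(api99 = 16518, api00_no = 15669, api00_yes = 14309)

# Standard deviations as printed above, in the same order
sd_printed <- c(api99 = 129, api00_no = 125, api00_yes = 120)

# Square roots of the variances, rounded to whole numbers like the printed values
sd_from_var <- round(sqrt(var_est))

print(rbind(sd_from_var, sd_printed))
```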