`R/survey_statistics.r`

`survey_var.Rd`

Calculate population variance from complex survey data. A wrapper around `svyvar`. `survey_var` should always be called from `summarise`.
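
To make the wrapper relationship concrete, the sketch below computes the same estimate twice: once with `svyvar` from *survey* and once with `survey_var` inside `summarise`. It assumes both packages are installed; the object names (`design_survey`, `design_srvyr`, `v_survey`, `v_srvyr`) are illustrative only.

```r
library(survey)
library(srvyr)

data(api, package = "survey")

# The same stratified design, built with each package's constructor
design_survey <- svydesign(ids = ~1, strata = ~stype, weights = ~pw,
                           data = apistrat)
design_srvyr  <- as_survey_design(apistrat, strata = stype, weights = pw)

# Direct call to the underlying survey function
v_survey <- svyvar(~api99, design_survey)

# The srvyr wrapper, called from summarise()
v_srvyr <- summarise(design_srvyr, api99_var = survey_var(api99))

# Point estimates from the two routes
est_survey <- as.numeric(v_survey)[1]
est_srvyr  <- v_srvyr$api99_var

print(c(svyvar = est_survey, survey_var = est_srvyr))
```

The two point estimates should agree, since `survey_var` delegates the computation to `svyvar`.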

    survey_var(
      x,
      na.rm = FALSE,
      vartype = c("se", "ci", "var"),
      level = 0.95,
      df = NULL,
      ...
    )

    survey_sd(x, na.rm = FALSE, ...)

x | A variable or expression, or empty |
---|---|
na.rm | A logical value indicating whether missing values should be dropped |
vartype | Report variability as one or more of: standard error ("se", default), confidence interval ("ci"), or variance ("var") (coefficient of variation not available). |
level | (For vartype = "ci" only) A single number or vector of numbers indicating the confidence level. |
df | (For vartype = "ci" only) A numeric value indicating the degrees of freedom for the t-distribution. The default (Inf) is equivalent to using the normal distribution, and for population variance statistics there is little reason to use any other value (see the discussion of confidence intervals below). |
... | Ignored |
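
As a hedged illustration of how `vartype` and `level` combine, the sketch below requests a standard error and a confidence interval in one call. It assumes the *survey* and *srvyr* packages are installed; the 90% level and the object name `out` are chosen only for the example, and no output is shown because the exact column layout may differ between package versions.

```r
library(survey)
library(srvyr)

data(api, package = "survey")

dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

# Ask for the standard error and a 90% confidence interval together.
# df is left at its default, as the argument description recommends.
out <- dstrata %>%
  summarise(
    api99_var = survey_var(api99, vartype = c("se", "ci"), level = 0.90)
  )

print(out)
```

The variability measures come back as additional columns next to the point estimate `api99_var`.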

Be aware that confidence intervals for the population variance statistic are
computed by the *survey* package using a *t* or normal (with df=Inf)
distribution, i.e. a symmetric distribution. **This can be a very poor
approximation** if even one of these conditions is met:

- there are few sampling design degrees of freedom,
- the analyzed variable isn't normally distributed,
- there is huge variation in the sampling probabilities of the survey design.

Because of this, be very careful when using confidence intervals for population variance statistics, especially when analyzing subsets of the data or using grouped survey objects.

The sampling distribution of the variance statistic is in general asymmetric (chi-squared in the case of simple random sampling of a normally distributed variable). If the analyzed variable isn't normally distributed, or there is huge variation in the sampling probabilities of the survey design (or both), it may converge to normality only very slowly as the number of survey design degrees of freedom grows.
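
The effect can be seen in a small base-R simulation. This is a minimal sketch under strong simplifying assumptions, not a reproduction of what *survey* computes: it uses simple random sampling of a normal variable with only 10 observations, and the symmetric interval uses the normal-theory plug-in standard error `s2 * sqrt(2 / (n - 1))` rather than a design-based one. All names and constants are illustrative.

```r
set.seed(42)

n      <- 10     # small sample, so only n - 1 = 9 degrees of freedom
sigma2 <- 4      # true population variance (known because we simulate)
reps   <- 5000   # number of simulated samples
z      <- qnorm(0.975)

s2            <- numeric(reps)
hit_symmetric <- logical(reps)
hit_chisq     <- logical(reps)

for (i in seq_len(reps)) {
  y     <- rnorm(n, mean = 0, sd = sqrt(sigma2))
  s2[i] <- var(y)

  # Symmetric interval: estimate +/- z * SE, using the normal-theory
  # plug-in SE of the sample variance
  se <- s2[i] * sqrt(2 / (n - 1))
  hit_symmetric[i] <- abs(s2[i] - sigma2) <= z * se

  # Asymmetric interval from (n - 1) * s2 / sigma2 ~ chi-squared(n - 1)
  lower <- (n - 1) * s2[i] / qchisq(0.975, df = n - 1)
  upper <- (n - 1) * s2[i] / qchisq(0.025, df = n - 1)
  hit_chisq[i] <- lower <= sigma2 && sigma2 <= upper
}

# Share of intervals that contain the true variance
coverage <- c(symmetric = mean(hit_symmetric), chi_squared = mean(hit_chisq))

# Right skew of the sampling distribution: the mean exceeds the median
skewed <- mean(s2) > median(s2)

print(coverage)
print(skewed)
```

Under these assumptions the chi-squared interval covers the true variance at close to the nominal 95%, while the symmetric interval falls noticeably short, which is the behaviour the warning above describes for designs with few degrees of freedom.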

    library(survey)
    library(srvyr)  # provides as_survey_design(), survey_var(), survey_sd()
    data(api)

    dstrata <- apistrat %>%
      as_survey_design(strata = stype, weights = pw)

    dstrata %>%
      summarise(api99_var = survey_var(api99),
                api99_sd = survey_sd(api99))
    #> # A tibble: 1 × 3
    #>   api99_var api99_var_se api99_sd
    #>       <dbl>        <dbl>    <dbl>
    #> 1    16518.        1336.     129.

    dstrata %>%
      group_by(awards) %>%
      summarise(api00_var = survey_var(api00),
                api00_sd = survey_sd(api00))
    #> # A tibble: 2 × 4
    #>   awards api00_var api00_var_se api00_sd
    #>   <fct>      <dbl>        <dbl>    <dbl>
    #> 1 No        15669.        2021.     125.
    #> 2 Yes       14309.        1509.     120.

    # standard deviation and variance of the population variance estimator
    # are available with vartype argument
    # (but not for the population standard deviation estimator)
    dstrata %>%
      summarise(api99_variance = survey_var(api99, vartype = c("se", "var")))
    #> # A tibble: 1 × 3
    #>   api99_variance api99_variance_se api99_variance_var
    #>            <dbl>             <dbl>              <dbl>
    #> 1         16518.             1336.           1785755.