Source: R/survey_statistics.r (unweighted.Rd)
Calculate unweighted summaries from a survey dataset, just as on a normal data.frame with summarise(). Although it is possible to use regular functions directly, the survey package doesn't always remove rows when filtering (it sometimes sets their weights to 0 instead), so doing so can give wrong results. See the examples for more details.
unweighted(...)
...: variables or expressions, calculated on the unweighted data.frame behind the tbl_svy object.
Uses tidy evaluation semantics, so if you want to write wrapper functions that take variable names as arguments, you must use tidy evaluation. See the examples here, the documentation in nse-force, or the dplyr vignette called 'programming' for more information.
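As a minimal sketch of the tidy evaluation point above, the wrapper below forwards a bare column name into `unweighted()` using rlang's embrace operator `{{ }}`. The function name `unweighted_mean()` and the output column `unw_mean` are hypothetical, chosen only for illustration; they are not part of srvyr. The sketch assumes the survey, dplyr, and srvyr packages are installed and uses the `apistrat` data shipped with survey.

```r
library(survey)
library(dplyr)
library(srvyr)
data(api)

dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

# Hypothetical wrapper (not part of srvyr): {{ var }} injects the column
# the caller named, so summarise() and unweighted() see e.g. mean(api99)
unweighted_mean <- function(.svy, var) {
  .svy %>%
    summarise(unw_mean = unweighted(mean({{ var }})))
}

unw_api99 <- unweighted_mean(dstrata, api99)
unw_api99
```

Because the wrapper only calls summarise(), it should also respect any grouping set with group_by() before it is called.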
library(survey)
library(dplyr)
library(srvyr)
data(api)
dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)
dstrata %>%
  summarise(api99_unw = unweighted(mean(api99)),
            n = unweighted(n()))
#> # A tibble: 1 × 2
#> api99_unw n
#> <dbl> <int>
#> 1 625. 200
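To make the contrast with the weighted estimators concrete, the sketch below computes the design-weighted mean of `api99` with srvyr's `survey_mean()` next to the unweighted mean from `unweighted()`. It assumes survey, dplyr, and srvyr are installed; the column names `api99_w` and `api99_unw` are illustrative. The two values will generally differ because the weights `pw` vary across strata.

```r
library(survey)
library(dplyr)
library(srvyr)
data(api)

dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

# survey_mean() uses the design weights (pw) and also returns a standard
# error column (api99_w_se); unweighted() ignores the weights entirely
comparison <- dstrata %>%
  summarise(
    api99_w = survey_mean(api99),
    api99_unw = unweighted(mean(api99))
  )
comparison
```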
dstrata %>%
  group_by(stype) %>%
  summarise(api_diff_unw = unweighted(mean(api00 - api99)))
#> # A tibble: 3 × 2
#> stype api_diff_unw
#> <fct> <dbl>
#> 1 E 38.6
#> 2 H 8.46
#> 3 M 26.4
# Some survey designs, like ones with raked weights, keep filtered-out rows
# (setting their weights to 0) to preserve the design structure. So if you
# don't use `unweighted()`, your results can be wrong.
# Declare basic clustered design ----
cluster_design <- as_survey_design(
  .data = apiclus1,
  id = dnum,
  weights = pw,
  fpc = fpc
)
# Add raking weights for school type and sch.wide ----
pop.types <- data.frame(stype = c("E", "H", "M"), Freq = c(4421, 755, 1018))
pop.schwide <- data.frame(sch.wide = c("No", "Yes"), Freq = c(1072, 5122))
raked_design <- rake(
  cluster_design,
  sample.margins = list(~stype, ~sch.wide),
  population.margins = list(pop.types, pop.schwide)
)
raked_design %>%
  filter(cname != "Alameda") %>%
  group_by(cname) %>%
  summarize(
    direct_unw_mean = mean(api99),
    wrapped_unw_mean = unweighted(mean(api99))
  ) %>%
  filter(cname == "Alameda")
#> # A tibble: 1 × 3
#> cname direct_unw_mean wrapped_unw_mean
#> <chr> <dbl> <dbl>
#> 1 Alameda 609 NaN
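# Notice how the results differ: the direct mean still uses the Alameda rows
# that were filtered out, while `unweighted()` excludes them, leaving no rows
# to average (hence NaN).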
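The sketch below checks directly what happens to the filtered-out rows in the raked design from the last example: it rebuilds that design, filters out Alameda, and inspects the underlying data.frame and the weights. It assumes that `weights()` applied to a survey design object returns one sampling weight per row, as it does in current versions of the survey package; the names `filtered`, `n_rows`, and `alameda_w` are illustrative only.

```r
library(survey)
library(dplyr)
library(srvyr)
data(api)

# Rebuild the raked design from the example above
cluster_design <- as_survey_design(
  .data = apiclus1,
  id = dnum,
  weights = pw,
  fpc = fpc
)
pop.types <- data.frame(stype = c("E", "H", "M"), Freq = c(4421, 755, 1018))
pop.schwide <- data.frame(sch.wide = c("No", "Yes"), Freq = c(1072, 5122))
raked_design <- rake(
  cluster_design,
  sample.margins = list(~stype, ~sch.wide),
  population.margins = list(pop.types, pop.schwide)
)

filtered <- raked_design %>% filter(cname != "Alameda")

# The underlying data.frame still holds every original row...
n_rows <- nrow(filtered$variables)
# ...but the filtered-out Alameda rows now carry a weight of 0
alameda_w <- weights(filtered)[filtered$variables$cname == "Alameda"]

c(rows_kept = n_rows, rows_in_apiclus1 = nrow(apiclus1))
all(alameda_w == 0)
```

This is why a plain `mean(api99)` inside `summarize()` still sees the Alameda values, while `unweighted()` does not.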