Create a tbl_svy survey object using a two-phase design
Source: R/as_survey_twophase.r
as_survey_twophase.Rd
Create a survey object by specifying the survey's two-phase design. It is a
wrapper around twophase. All survey variables must be
included in the data.frame itself. Variables are selected by using bare
column names, or convenience functions described in select.
Usage
as_survey_twophase(.data, ...)
# S3 method for class 'data.frame'
as_survey_twophase(
  .data,
  id,
  strata = NULL,
  probs = NULL,
  weights = NULL,
  fpc = NULL,
  subset,
  method = c("full", "approx", "simple"),
  ...
)
# S3 method for class 'twophase2'
as_survey_twophase(.data, ...)
Arguments
- .data
A data frame (which contains the variables specified below)
- ...
ignored
- id
list of two sets of variable names for sampling unit identifiers
- strata
list of two sets of variable names (or NULLs) for stratum identifiers
- probs
list of two sets of variable names (or NULLs) for sampling probabilities
- weights
Only for method = "approx", list of two sets of variable names (or NULLs) for sampling weights
- fpc
list of two sets of variables (or NULLs) for finite population corrections
- subset
bare name of a variable which specifies which observations are selected in phase 2
- method
"full" requires (much) more memory, but gives unbiased variance estimates for general multistage designs at both phases. "simple" and "approx" use less memory and are correct for designs with simple random sampling at phase one and stratified random sampling at phase two. See twophase for more details.
Examples
# Examples from ?survey::twophase
# two-phase simple random sampling.
data(pbc, package="survival")
library(dplyr)
pbc <- pbc %>%
  mutate(randomized = !is.na(trt) & trt > 0,
         id = row_number())
d2pbc <- pbc %>%
  as_survey_twophase(id = list(id, id), subset = randomized)
d2pbc %>% summarize(mean = survey_mean(bili))
#> # A tibble: 1 × 2
#>    mean mean_se
#>   <dbl>   <dbl>
#> 1  3.26   0.256
# two-stage sampling as two-phase
library(survey)
data(mu284)
mu284_1 <- mu284 %>%
  dplyr::slice(c(1:15, rep(1:5, n2[1:5] - 3))) %>%
  mutate(id = row_number(),
         sub = rep(c(TRUE, FALSE), c(15, 34-15)))
dmu284 <- mu284 %>%
  as_survey_design(ids = c(id1, id2), fpc = c(n1, n2))
# first phase cluster sample, second phase stratified within cluster
d2mu284 <- mu284_1 %>%
  as_survey_twophase(id = list(id1, id), strata = list(NULL, id1),
                     fpc = list(n1, NULL), subset = sub)
dmu284 %>%
  summarize(total = survey_total(y1),
            mean = survey_mean(y1))
#> # A tibble: 1 × 4
#>   total total_se  mean mean_se
#>   <dbl>    <dbl> <dbl>   <dbl>
#> 1 15080    2274.  44.4    2.27
d2mu284 %>%
  summarize(total = survey_total(y1),
            mean = survey_mean(y1))
#> # A tibble: 1 × 4
#>   total total_se  mean mean_se
#>   <dbl>    <dbl> <dbl>   <dbl>
#> 1 15080    2274.  44.4    2.27
# dplyr 0.7 introduced new style of NSE called quosures
# See `vignette("programming", package = "dplyr")` for details
ids <- quo(list(id, id))
d2pbc <- pbc %>%
  as_survey_twophase(id = !!ids, subset = "randomized")
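The Usage section also lists a method for class 'twophase2', which the examples above do not exercise. The following is a minimal sketch of that path, assuming the survey, srvyr and survival packages are installed: it builds the same two-phase design once with survey::twophase (formula interface) and once with as_survey_twophase (bare column names), then converts the survey object to a tbl_svy. The object names d_survey, d_converted and d_direct are illustrative only.

```r
library(survey)
library(srvyr)

data(pbc, package = "survival")
# Base-R equivalent of the mutate() step used in the first example
pbc$randomized <- with(pbc, !is.na(trt) & trt > 0)
pbc$id <- seq_len(nrow(pbc))

# Design built directly with survey::twophase (default method = "full")
d_survey <- twophase(id = list(~id, ~id), data = pbc, subset = ~randomized)

# Convert the existing twophase2 object to a tbl_svy
d_converted <- as_survey_twophase(d_survey)

# Same design specified through the srvyr wrapper
d_direct <- as_survey_twophase(pbc, id = list(id, id), subset = randomized)

# Both tbl_svy objects can be summarized with srvyr verbs
summarize(d_converted, mean = survey_mean(bili))
summarize(d_direct, mean = survey_mean(bili))
```

Converting an existing object this way is mainly useful when a design has already been set up with the survey package and only the downstream analysis is moving to srvyr syntax.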