
Summarise multiple values to a single value.

Arguments

.data

A tbl_svy object

...

Name-value pairs of summarizing expressions, see details

.groups

Defaults to "drop_last" in srvyr, meaning that the last group is peeled off and any remaining groups are preserved. Other options are "drop", which drops all groups; "keep", which keeps all of them; and "rowwise", which converts the object to a rowwise object (meaning calculations will be performed on each row).
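As a minimal sketch of how .groups changes the grouping of the result, the block below groups the api stratified sample by two variables and compares the default with "drop". The object names by_default and ungrouped are illustrative only, and the grouping behaviour shown assumes a recent srvyr release that returns a tibble from summarise.

```r
library(srvyr)
data(api, package = "survey")

dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

# Default ("drop_last"): only the last grouping variable (awards) is peeled off
by_default <- dstrata %>%
  group_by(stype, awards) %>%
  summarise(api00_mn = survey_mean(api00))

# "drop": no grouping is left on the result
ungrouped <- dstrata %>%
  group_by(stype, awards) %>%
  summarise(api00_mn = survey_mean(api00), .groups = "drop")

# Inspect which grouping variables remain on each result
dplyr::group_vars(by_default)
dplyr::group_vars(ungrouped)
```

Dropping the groups explicitly can be convenient when the summary is passed on to code that should not inherit the grouping.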

.unpack

Whether to "unpack" named data.frame columns. srvyr predates dplyr's support for data.frame columns so it does not treat them the same way by default.

Details

Summarise for tbl_svy objects accepts several specialized functions. Each of these functions takes a variable (or two, in the case of survey_ratio) from the data.frame and by default returns the estimate and its standard error.

The vartype argument selects one or more measures of uncertainty: se for standard error, ci for confidence interval, var for variance, and cv for coefficient of variation. The level argument specifies the confidence level for the confidence interval.
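A short sketch of requesting more than one measure of uncertainty at once, using the api data from the survey package. The result name ci_est and the 90% level are arbitrary choices for illustration.

```r
library(srvyr)
data(api, package = "survey")

dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

# Ask for both the standard error and a 90% confidence interval
ci_est <- dstrata %>%
  summarise(api99_mn = survey_mean(api99, vartype = c("se", "ci"), level = 0.9))

# Each measure is returned as its own suffixed column
names(ci_est)
```

The standard error is returned with the suffix _se, and the confidence interval bounds with the suffixes _low and _upp.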

The other arguments correspond to the analogous function arguments from the survey package.

The available functions from srvyr are:

survey_mean

Calculate the mean of a numeric variable or the proportion falling into groups for the entire population or by groups. Based on svymean and svyciprop.

survey_total

Calculate the survey total of the entire population or by groups. Based on svytotal.

survey_prop

Calculate the proportion of the entire population or by groups. Based on svyciprop.

survey_ratio

Calculate the ratio of 2 variables in the entire population or by groups. Based on svyratio.

survey_quantile & survey_median

Calculate quantiles in the entire population or by groups. Based on svyquantile.

unweighted

Calculate an unweighted estimate as you would on a regular tbl_df. Based on dplyr's summarise.
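Several of the functions listed above can be combined in a single call. The following is a sketch only, using the api data; the output column names (enroll_tot, api_ratio, api00_med, n) and the result name by_type are made up for this example.

```r
library(srvyr)
data(api, package = "survey")

dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

by_type <- dstrata %>%
  group_by(stype) %>%
  summarise(
    enroll_tot = survey_total(enroll),        # weighted total
    api_ratio  = survey_ratio(api00, api99),  # ratio of two variables
    api00_med  = survey_median(api00),        # weighted median
    n          = unweighted(n())              # plain row count, no weights
  )

by_type
```

Each survey_* function contributes its estimate plus a standard error column, while unweighted() contributes only the single column computed by the dplyr expression inside it.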

You can use expressions both in the ... of summarize and in the arguments to the summarizing functions. Although this is syntactically valid, it can also produce incorrect results. For example, if you multiply the result of survey_mean by 100, the standard error is correctly multiplied by 100, but a variance requested with vartype = "var" is also multiplied by only 100, when it should be multiplied by 100^2.

Examples

library(srvyr)
data(api, package = "survey")

dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

dstrata %>%
  summarise(api99_mn = survey_mean(api99),
            api00_mn = survey_mean(api00),
            api_diff = survey_mean(api00 - api99))
#> # A tibble: 1 × 6
#>   api99_mn api99_mn_se api00_mn api00_mn_se api_diff api_diff_se
#>      <dbl>       <dbl>    <dbl>       <dbl>    <dbl>       <dbl>
#> 1     629.        10.1     662.        9.54     32.9        2.08

dstrata_grp <- dstrata %>%
  group_by(stype)

dstrata_grp %>%
  summarise(api99_mn = survey_mean(api99),
            api00_mn = survey_mean(api00),
            api_diff = survey_mean(api00 - api99))
#> # A tibble: 3 × 7
#>   stype api99_mn api99_mn_se api00_mn api00_mn_se api_diff api_diff_se
#>   <fct>    <dbl>       <dbl>    <dbl>       <dbl>    <dbl>       <dbl>
#> 1 E         636.        13.3     674.        12.5    38.6         2.76
#> 2 H         617.        15.8     626.        15.5     8.46        3.41
#> 3 M         610.        16.8     637.        16.6    26.4         3.05

# `dplyr::across` can be used to programmatically summarize multiple columns
# See https://dplyr.tidyverse.org/articles/colwise.html for details
# A basic example of calculating the total of 2 columns at once
total_vars <- c("enroll", "api.stu")
dstrata %>%
  summarize(across(c(all_of(total_vars)), survey_total))
#> # A tibble: 1 × 4
#>     enroll enroll_se  api.stu api.stu_se
#>      <dbl>     <dbl>    <dbl>      <dbl>
#> 1 3687178.   117319. 3086009.    101841.

# Expressions are allowed in summarize arguments & inside functions
# Here we can calculate a binary variable on the fly and also multiply by 100 to
# get percentages
dstrata %>%
  summarize(api99_over_700_pct = 100 * survey_mean(api99 > 700))
#> # A tibble: 1 × 2
#>   api99_over_700_pct api99_over_700_pct_se
#>                <dbl>                 <dbl>
#> 1               30.6                  3.61

# But be careful: the variance should scale by 100^2, not by 100, so this is wrong!
dstrata %>%
  summarize(api99_over_700_pct = 100 * survey_mean(api99 > 700, vartype = "var"))
#> # A tibble: 1 × 2
#>   api99_over_700_pct api99_over_700_pct_var
#>                <dbl>                  <dbl>
#> 1               30.6                  0.130
# Wrong variance!
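One way to sidestep the scaling problem shown above is to apply the multiplication to the variable inside survey_mean, so that the variance is estimated on the percentage scale directly. This is a sketch of that approach rather than an officially recommended pattern; the names pct_var and pct_se are illustrative.

```r
library(srvyr)
data(api, package = "survey")

dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

# Scale the variable itself, so the variance is computed for the 0/100 variable
pct_var <- dstrata %>%
  summarise(api99_over_700_pct = survey_mean(100 * (api99 > 700), vartype = "var"))

# For comparison: scaling the result afterwards is fine for the standard error
pct_se <- dstrata %>%
  summarise(api99_over_700_pct = 100 * survey_mean(api99 > 700))

# The variance from the first call equals the squared standard error from the second
pct_var$api99_over_700_pct_var
pct_se$api99_over_700_pct_se^2
```

Because the variance of a mean scales with the square of a constant multiplier, the variance computed this way agrees with the squared standard error on the percentage scale.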