Simulation Study – Coverage Probability of 95% Confidence Intervals of the Horvitz-Thompson Estimator

2 minute read

This short simulation study examines the coverage probability of 95% confidence intervals of the Horvitz-Thompson estimator of the population mean under simple random sampling without replacement.

95% Confidence Intervals

The Horvitz-Thompson estimator of the population mean:

\[\begin{align*} \bar{y} = \frac{1}{n}\sum_{k \in S} y_k \end{align*}\]

where \(S\) is the set of samples and \(n\) is the sample size.

A corresponding 95% confidence interval:

\[\begin{align*} \left( \bar{y} \pm 1.96\sqrt{\hat{Var}[\bar{y}]} \right) \end{align*}\]

where the variance estimator is

\[\begin{align*} \hat{Var}[\bar{y}] = \left(1 - \frac{n}{N} \right) \frac{s_y^2}{n} \end{align*}\]

and \(s_y^2\) is the sample variance.

Simulation

I start with generating the following populations of survey variable \(y\). Each population is of size N = 10000.

N(10, 25)
Weibull(shape = 0.5, scale = 4)

The simulation contains R = 50000 replication.

simulation_function.R contains a function to execute this simulation.

source("simulation_function.R")

Set the population size.

N = 10000

Simulation: N(10, 25)

set.seed(123)
y.population.1 = rnorm(N, mean = 10, sd = 5)
output.1 = simulation(y.population.1, R = 50000)
output.1

	sample.size	Rel.Bias.Point.Est	Var.Point.Est	Rel.Bias.Var.Est	Coverage.Prob
1	n=10	0.556	25.574	0.27	0.922
2	n=25	0.116	10.176	-0.104	0.941
3	n=50	0.104	5.067	-0.264	0.947
4	n=100	-0.016	2.524	-0.121	0.949
5	n=500	0.031	0.485	0.042	0.951

Simulation: Weibull(shape = 0.5, scale = 1)

set.seed(123)
y.population.2 = rweibull(N, shape = 0.5, scale = 4)
output.2 = simulation(y.population.2, R = 50000)
output.2

	sample.size	Rel.Bias.Point.Est	Var.Point.Est	Rel.Bias.Var.Est	Coverage.Prob
1	n=10	0.704	31.57	1.839	0.733
2	n=25	-0.693	12.117	-2.132	0.818
3	n=50	-0.104	6.151	-0.395	0.867
4	n=100	-0.095	3.060	-0.402	0.901
5	n=500	0.095	0.591	0.305	0.938

Observations

As sample size \(n\) increases, in general, the magnitudes of the relative biases (the second and the fourth columns) of both point estimators \(\bar{y}\) and \(\hat{Var}[\bar{y}]\) decrease and approach 0. This phenomenon is consistent with law of large numbers. The magnitudes of biases of both estimators shrinks towards zero as we collect more samples.
The point estimates from \(\hat{Var}[\bar{y}]\) decrease as sample size \(n\) increases, reflecting the reduction in the typical error of \(\bar{y}\) as we collect more samples. In other words, \(\bar{y}\) is more accurate with more samples (note that \(\bar{y}\) is unbiased, so variance is equal to MSE). This observation is consistent with the above point.
The coverage probability of the two-sided confidence interval of \(\bar{Y}\) increases and approaches 0.95 as \(n\) increases. That is, as we collect more samples, in the simulation, the proportion of the confidence intervals computed containing the population mean increases with the proportion approaching 95%. This observation is expected because of a central limit theorem and 1.96 being the 97.5% quantile of the standard normal distribution.
Note that as \(n\) increases, the width of the confidence interval decreases because of decreasing \(\hat{Var}[\bar{y}]\). Despite this, the coverage probability increases and approaches 95%. This phenomenon is consistent with the decreasing relative bias of \(\bar{y}\) and \(\bar{y}\) getting closer to the population mean.
When \(n = 10\), the coverage probability corresponding to the normal population is much higher than that to the Weibull population (0.92 vs. 0.81), because the Weibull population is heavily-skewed while the normal population is much closer to the standard normal.
As \(n\) increases, a central limit theorem starts to apply and hence coverage probabilities of both population approaches 0.95. Note that when \(n = 50\), the coverage probability corresponding to the normal population is very close to 0.95 (0.947) while that to the Weibull population is still 0.867. When \(n=500\), the coverage probability corresponding to the normal population reaches 0.95 while that to the Weibull is still 0.94. Therefore, it takes much more samples for the Weibull distribution to achieve 0.95 through a central limit theorem.

Share on

Twitter Facebook LinkedIn

John Tsang

Simulation Study – Coverage Probability of 95% Confidence Intervals of the Horvitz-Thompson Estimator

95% Confidence Intervals

Simulation

Simulation: N(10, 25)

Simulation: Weibull(shape = 0.5, scale = 1)

Observations

Share on

You may also enjoy

An Example to Accelerate `R` with `Rcpp` and `OpenMP`: a Simulation Study of Survey with Systematic Non-responses

Accelerate `R` with `Rcpp`: Sampling with Replacement and the Hansen-Hurwitz Estimator

The R6 Class in `R`

Comparing Exchange Rate Prediction Models

John Tsang

95% Confidence Intervals

Simulation

Simulation: N(10, 25)

Simulation: Weibull(shape = 0.5, scale = 1)

Observations

Share on

You may also enjoy

An Example to Accelerate R with Rcpp and OpenMP: a Simulation Study of Survey with Systematic Non-responses

Accelerate R with Rcpp: Sampling with Replacement and the Hansen-Hurwitz Estimator

The R6 Class in R

Comparing Exchange Rate Prediction Models

An Example to Accelerate `R` with `Rcpp` and `OpenMP`: a Simulation Study of Survey with Systematic Non-responses

Accelerate `R` with `Rcpp`: Sampling with Replacement and the Hansen-Hurwitz Estimator

The R6 Class in `R`