--- title: "Introduction to adestr" author: "Jan Meis" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Introduction to adestr} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` # Introduction This package implements methods to evaluate the performance characteristics of various point and interval estimators for adaptive two-stage designs with prespecified sample-size recalculation rules. Further, it allows for evaluation of these estimators on real datasets, and it implements methods to calculate p-values. Currently, it works for designs objects which were produced by the R-package `adoptr`, which calculates optimal design parameters for adaptive two-stage designs. You can learn about adoptr here: [optad.github.io/adoptr/](https://optad.github.io/adoptr/). An introductory vignette covering common usecases is given at [https://jan-imbi.github.io/adestr/articles/Introduction.html](https://jan-imbi.github.io/adestr/articles/Introduction.html). This package comes suite of unit tests. The code of the test cases can be viewed here: [https://github.com/jan-imbi/adestr/tree/master/tests/testthat](https://github.com/jan-imbi/adestr/tree/master/tests/testthat). The authors assume no responsibility for the correctness of the code or results produced by its usage. Use at your own risk. You may also be interested in the reference implementation looking at the [https://github.com/jan-imbi/adestr/blob/master/R/reference_implementation.R](https://github.com/jan-imbi/adestr/blob/master/R/reference_implementation.R). It uses the same notation as in our paper ([doi.org/10.1002/sim.10020](https://doi.org/10.1002/sim.10020)) and may therefore be easier to understand at first. # Fitting a design with adoptr In order to showcase the capabilities of this package, we need a trial design first. We refer to [the example from the adoptr documentation](https://optad.github.io/adoptr/articles/adoptr.html) for this. You can read more about optimal adaptive designs fitted via the adoptr package here: [optad.github.io/adoptr/articles/adoptr_jss.html](https://optad.github.io/adoptr/articles/adoptr_jss.html). For the sake of this introduction, a pre-computed version of the first example from [optad.github.io/adoptr/articles/adoptr.html](https://optad.github.io/adoptr/articles/adoptr.html) is provided with this package via the `get_example_design` function. ```{r} library(adestr) get_example_design(two_armed = TRUE) ``` # Evaluating point estimators ## Evaluating the mean squared of the sample mean and the weighted sample mean Now that we have created an optimal adaptive design, we can investigate the performance characteristics of various estimators for the mean in that design. To this end, the `evaluate_estimator` function can be used. ```{r} evaluate_estimator( score = MSE(), estimator = SampleMean(), data_distribution = Normal(two_armed = TRUE), design = get_example_design(two_armed = TRUE), mu = 0.3, sigma = 1 ) ``` The mean squared error of the sample mean depends on the true underlying value of the paramter $\mu$, which of course is unknown. Therefore, it may be advisable to use the `evaluate_estimator` function on an array of values for $\mu$ to investigate the distributional properties of an estimator. In the following, the MSE of the sample mean vs. a weighted sample mean with fixed weights will be plotted. ```{r, fig.width=7.2, fig.height=4, dev="svg"} mse_mle <- evaluate_estimator( score = MSE(), estimator = SampleMean(), data_distribution = Normal(two_armed = TRUE), design = get_example_design(two_armed = TRUE), mu = seq(-0.75, 1.32, .03), sigma = 1 ) mse_weighted_sample_means <- evaluate_estimator( score = MSE(), estimator = WeightedSampleMean(w1 = .8), data_distribution = Normal(two_armed = TRUE), design = get_example_design(two_armed = TRUE), mu = seq(-0.75, 1.32, .03), sigma = 1 ) plot(c(mse_mle, mse_weighted_sample_means)) ``` ## Evaluating median unbiased estimators Median unbiased estimators are estimators where the probability of overestimation is equal to the probability of underestimation. They can be derived by choosing a sample space ordering. In `adestr`, 5 different sample-space orderings are implemented: The MLE (maximum likelihood estimator) ordering, the LR (likelihood ratio) test ordering, the ST (score test) ordering, the SWCF (stage-wise combination function) ordering, and the NP (Neyman-Pearson) ordering. The latter (NP ordering) is only useful for the calculation of p-values, estimators derived from that ordering are usually nonsense. Lets look at the 'median-unbiased' property for estimators derived from the four orderings: ```{r} evaluate_estimator( score = OverestimationProbability(), estimator = MedianUnbiasedMLEOrdering(), data_distribution = Normal(two_armed = TRUE), design = get_example_design(two_armed = TRUE), mu = 0.4, sigma = 1 ) evaluate_estimator( score = OverestimationProbability(), estimator = MedianUnbiasedLikelihoodRatioOrdering(), data_distribution = Normal(two_armed = TRUE), design = get_example_design(two_armed = TRUE), mu = 0.4, sigma = 1 ) evaluate_estimator( score = OverestimationProbability(), estimator = MedianUnbiasedScoreTestOrdering(), data_distribution = Normal(two_armed = TRUE), design = get_example_design(two_armed = TRUE), mu = 0.4, sigma = 1 ) evaluate_estimator( score = OverestimationProbability(), estimator = MedianUnbiasedStagewiseCombinationFunctionOrdering(), data_distribution = Normal(two_armed = TRUE), design = get_example_design(two_armed = TRUE), mu = 0.4, sigma = 1 ) ``` Compare this to the Overestimation probability of the sample mean: ```{r} evaluate_estimator( score = OverestimationProbability(), estimator = SampleMean(), data_distribution = Normal(two_armed = TRUE), design = get_example_design(two_armed = TRUE), mu = 0.4, sigma = 1 ) ``` ## Evaluating bias-reduced unbiased estimators Apart from sample-space ordering based methods, there are various other ways of defining alternative point estimators which may have desirable properties. Here are a couple that are presented in [our paper](https://doi.org/10.1002/sim.10020) evaluated for mu=0.3 and sigma=1. ```{r} evaluate_estimator( score = MSE(), estimator = SampleMean(), data_distribution = Normal(two_armed = TRUE), design = get_example_design(two_armed = TRUE), mu = 0.3, sigma = 1 ) evaluate_estimator( score = MSE(), estimator = BiasReduced(), data_distribution = Normal(two_armed = TRUE), design = get_example_design(two_armed = TRUE), mu = 0.3, sigma = 1 ) evaluate_estimator( score = MSE(), estimator = PseudoRaoBlackwell(), data_distribution = Normal(two_armed = TRUE), design = get_example_design(two_armed = TRUE), mu = 0.3, sigma = 1 ) evaluate_estimator( score = MSE(), estimator = RaoBlackwell(), data_distribution = Normal(two_armed = TRUE), design = get_example_design(two_armed = TRUE), mu = 0.3, sigma = 1 ) evaluate_estimator( score = MSE(), estimator = AdaptivelyWeightedSampleMean(), data_distribution = Normal(two_armed = TRUE), design = get_example_design(two_armed = TRUE), mu = 0.3, sigma = 1 ) ``` # Evaluating interval estimators ## Evaluating coverage The coverage of an interval estimator is the probability of that an estimator will cover the true value of the parameter in question. It may be evaluated like this: ```{r} evaluate_estimator( score = Coverage(), estimator = NaiveCI(), data_distribution = Normal(two_armed = TRUE), design = get_example_design(two_armed = TRUE), mu = .07, sigma = 1 ) ``` As you can see, the naive confidence interval does not have the correct coverage of 95%. Here is an example for an estimator that achieves the correct coverage: ```{r} evaluate_estimator( score = Coverage(), estimator = LikelihoodRatioOrderingCI(), data_distribution = Normal(two_armed = TRUE), design = get_example_design(two_armed = TRUE), mu = .07, sigma = 1 ) ``` For the parameter of mu that we chose in this example, the naive confidence interval performs particularly bad. We can plot the coverage of the naive confidence interval for an array of values like this: ```{r, fig.width=7.2, fig.height=4, dev="svg"} coverage_naive <- evaluate_estimator( score = Coverage(), estimator = NaiveCI(), data_distribution = Normal(two_armed = TRUE), design = get_example_design(two_armed = TRUE), mu = seq(-0.75, 1.32, .03), sigma = 1 ) plot(coverage_naive) ``` ## Evaluating the widht of an interval estimator Amongst the interval estimators which have the correct coverage, one might be interested in selecting the one with the least width. We can evaluate the expected width of an interval estimator for a particular set of assumptions like this: ```{r} evaluate_estimator( score = Width(), estimator = MLEOrderingCI(), data_distribution = Normal(two_armed = TRUE), design = get_example_design(two_armed = TRUE), mu = .3, sigma = 1 ) evaluate_estimator( score = Width(), estimator = LikelihoodRatioOrderingCI(), data_distribution = Normal(two_armed = TRUE), design = get_example_design(two_armed = TRUE), mu = .3, sigma = 1 ) evaluate_estimator( score = Width(), estimator = ScoreTestOrderingCI(), data_distribution = Normal(two_armed = TRUE), design = get_example_design(two_armed = TRUE), mu = .3, sigma = 1 ) evaluate_estimator( score = Width(), estimator = StagewiseCombinationFunctionOrderingCI(), data_distribution = Normal(two_armed = TRUE), design = get_example_design(two_armed = TRUE), mu = .3, sigma = 1 ) ``` ## Evaluating the centrality of a point estimator with respect to an interval estimator When choosing a combination of point and interval estimator to report at the end of a trial, one might want the point estimator to be more or less in the middle of the respective interval. To evaluate the 'centrality' of an estimator, which in this case is defined as the difference between the distance of the point estimator the lower end of an interval and the upper end of an interval, one can use the following command: ```{r} evaluate_estimator( score = Centrality(interval = StagewiseCombinationFunctionOrderingCI()), estimator = SampleMean(), data_distribution = Normal(two_armed = TRUE), design = get_example_design(two_armed = TRUE), mu = .3, sigma = 1 ) evaluate_estimator( score = Centrality(interval = NaiveCI()), estimator = SampleMean(), data_distribution = Normal(two_armed = TRUE), design = get_example_design(two_armed = TRUE), mu = .3, sigma = 1 ) ``` ## Evaluating agreement with the primary test decision of the design for an interval estimator In the framework of optimal adaptive designs, the rejection boundaries for the primary testing decision of a design are optimized directly and are not derived from a common test statistic. Therefore, confidence intervals derived from such statistics do not necessarily agree with the primary test decision. One may evaluate the chance of agreement like this: ```{r} evaluate_estimator( score = TestAgreement(), estimator = MLEOrderingCI(), data_distribution = Normal(two_armed = TRUE), design = get_example_design(two_armed = TRUE), mu = .3, sigma = 1 ) evaluate_estimator( score = TestAgreement(), estimator = LikelihoodRatioOrderingCI(), data_distribution = Normal(two_armed = TRUE), design = get_example_design(two_armed = TRUE), mu = .3, sigma = 1 ) evaluate_estimator( score = TestAgreement(), estimator = ScoreTestOrderingCI(), data_distribution = Normal(two_armed = TRUE), design = get_example_design(two_armed = TRUE), mu = .3, sigma = 1 ) evaluate_estimator( score = TestAgreement(), estimator = StagewiseCombinationFunctionOrderingCI(), data_distribution = Normal(two_armed = TRUE), design = get_example_design(two_armed = TRUE), mu = .3, sigma = 1 ) ``` The confidence interval derived from the stage-wise combination function ordering (by its construction) always agrees with the primary testing decision. # Evaluating p-values ## Evaluating agreement with the primary test decision of the design for a p-value Like confidence intervals, p-values always come with an associated test decision. However, for the same reason as described in the chapter about interval estimators, these test decision derived from various ways of calculating p-values may not necessarily agree with the primary test decision of an optimal adaptive design. One may evaluate the probability of agreement for a particular set of assumptions like this: ```{r} evaluate_estimator( score = TestAgreement(), estimator = MLEOrderingPValue(), data_distribution = Normal(two_armed = TRUE), design = get_example_design(two_armed = TRUE), mu = .3, sigma = 1 ) evaluate_estimator( score = TestAgreement(), estimator = LikelihoodRatioOrderingPValue(), data_distribution = Normal(two_armed = TRUE), design = get_example_design(two_armed = TRUE), mu = .3, sigma = 1 ) evaluate_estimator( score = TestAgreement(), estimator = ScoreTestOrderingPValue(), data_distribution = Normal(two_armed = TRUE), design = get_example_design(two_armed = TRUE), mu = .3, sigma = 1 ) evaluate_estimator( score = TestAgreement(), estimator = StagewiseCombinationFunctionOrderingPValue(), data_distribution = Normal(two_armed = TRUE), design = get_example_design(two_armed = TRUE), mu = .3, sigma = 1 ) ``` Again, we see that only the p-value derived from the stage-wise combination function ordering always agrees with the primary testing decision. # Conducting larger scale investigations in the performance characteristics of end-of-trial statistics So far we have only looked at evaluating performance characteristics for a particular set of assumptions, a choice of performance characteristics and a choice of a few estimators. However, when designing a trial, one might want to produce results for a variety of different scenarios, which can be time consuming. The evaluation of performance characteristics in `adestr` can run in parallel using the `future` framework. ```{r} library(future.apply) # Change to e.g. plan(multisession) for parallel computing plan(sequential) # Scenario 1: scores1 <- list(MSE(), OverestimationProbability()) estimators1 <- list(SampleMean(), BiasReduced()) dist1 <- list(Normal(two_armed = TRUE)) designs1 <- list(get_example_design(two_armed = TRUE)) mu1 <- seq(-1,1,.5) sigma1 <- 1 # Scenario 2: scores2 <- list(Coverage(), Width()) estimators2 <- list(NaiveCI(), StagewiseCombinationFunctionOrderingCI()) dist2 <- list(Normal(two_armed = TRUE)) designs2 <- list(get_example_design(two_armed = TRUE)) mu2 <- seq(-1,1,.5) sigma2 <- 1 # Evaluate in parallel res <- evaluate_scenarios_parallel( score_lists = list(scores1, scores2), estimator_lists = list(estimators1, estimators2), data_distribution_lists = list(dist1, dist2), design_lists = list(designs1, designs2), mu_lists = list(mu1, mu2), sigma_lists = list(sigma1, sigma2) ) res[[1]] res[[2]] ``` You will get back a list of data.frames containing the results for each scenario. Note that within one scenario, the evaluation will take place for the cross-product of all arguments that are supplied! This means that within one scenario, every estimator is evaluated with every score at every point of mu for every sigma and every design. Depending on your settings, this can get very time-consuming very fast, making parallelization essential. # Analyzing datasets Next, let us look at how to the package can be used to calculate estimates after data has been collected. The first stage data of a trial might look like this: ```{r} set.seed(321) dat <- data.frame( endpoint = c(rnorm(56, .3, 1), rnorm(56, 0, 1)), group = factor(rep(c("trt", "ctl"), c(56,56)), levels = c("trt", "ctl")), stage = rep(1, 112) ) head(dat) ``` ```{r} analyze(data = dat, statistics = list(), data_distribution = Normal(two_armed = TRUE), design = get_example_design(two_armed = TRUE), sigma = 1) ``` The results suggest recruiting 23 more patients per group for the second stage. We will simulate 47 more patients per group: ```{r} dat <- rbind(dat, data.frame( endpoint = c(rnorm(47, .3, 1), rnorm(47, 0, 1)), group = factor(rep(c("trt", "ctl"), c(47, 47)), levels = c("trt", "ctl")), stage = rep(2, 94) )) ``` Now, we can use the `analyze` function to evaluate a selection of estimators on the complete dataset: ```{r} analyze( data = dat, statistics = c( SampleMean(), BiasReduced(), PseudoRaoBlackwell(), MedianUnbiasedStagewiseCombinationFunctionOrdering(), StagewiseCombinationFunctionOrderingCI(), StagewiseCombinationFunctionOrderingPValue() ), data_distribution = Normal(two_armed = TRUE), sigma = 1, design = get_example_design(two_armed = TRUE) ) ``` The estimates presented here are for the difference in means of the two normal distributions. Keep in mind that a difference of $\mu=0.3$ was used in the simulation. Note that while the median unbiased estimator performs well in this particular example, this is not universally true.