---
title: "Introduction to adestr"
author: "Jan Meis"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to adestr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Introduction

This package implements methods to evaluate the performance characteristics
of various point and interval estimators for adaptive two-stage designs with 
prespecified sample-size recalculation rules. Further, it allows for
evaluation of these estimators on real datasets, and it implements methods
to calculate p-values.

Currently, it works for designs objects which were produced by the
R-package `adoptr`, which calculates optimal design parameters for
adaptive two-stage designs. You can learn about adoptr here:
[optad.github.io/adoptr/](https://optad.github.io/adoptr/).

An introductory vignette covering common usecases is given at [https://jan-imbi.github.io/adestr/articles/Introduction.html](https://jan-imbi.github.io/adestr/articles/Introduction.html).

This package comes suite of unit tests. The code of the test cases can be viewed here: [https://github.com/jan-imbi/adestr/tree/master/tests/testthat](https://github.com/jan-imbi/adestr/tree/master/tests/testthat). The authors assume no responsibility for the correctness of the
code or results produced by its usage. Use at your own risk.

You may also be interested in the reference implementation looking at the [https://github.com/jan-imbi/adestr/blob/master/R/reference_implementation.R](https://github.com/jan-imbi/adestr/blob/master/R/reference_implementation.R).
It uses the same notation as in our paper ([doi.org/10.1002/sim.10020](https://doi.org/10.1002/sim.10020)) and may therefore be
easier to understand at first.

# Fitting a design with adoptr
In order to showcase the capabilities of this package, we need a trial design first.
We refer to [the example from the adoptr documentation](https://optad.github.io/adoptr/articles/adoptr.html)
for this.
You can read more about optimal adaptive designs fitted via the adoptr package here:
[optad.github.io/adoptr/articles/adoptr_jss.html](https://optad.github.io/adoptr/articles/adoptr_jss.html).

For the sake of this introduction, a pre-computed version of the first example from
[optad.github.io/adoptr/articles/adoptr.html](https://optad.github.io/adoptr/articles/adoptr.html)
is provided with this package via the `get_example_design` function.

```{r}
library(adestr)
get_example_design(two_armed = TRUE)
```

# Evaluating point estimators
## Evaluating the mean squared of the sample mean and the weighted sample mean
Now that we have created an optimal adaptive design, we can investigate the performance
characteristics of various estimators for the mean in that design.
To this end, the `evaluate_estimator` function can be used.

```{r}
evaluate_estimator(
 score = MSE(),
 estimator = SampleMean(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = 0.3,
 sigma = 1
)
```

The mean squared error of the sample mean depends on the true underlying value of the
paramter $\mu$, which of course is unknown. Therefore, it may be
advisable to use the `evaluate_estimator` function on an array of values for $\mu$
to investigate the distributional properties of an estimator.

In the following, the MSE of the sample mean vs. a weighted sample mean with fixed weights
will be plotted.

```{r, fig.width=7.2, fig.height=4, dev="svg"}
mse_mle <- evaluate_estimator(
  score = MSE(),
  estimator = SampleMean(),
  data_distribution = Normal(two_armed = TRUE),
  design = get_example_design(two_armed = TRUE),
  mu = seq(-0.75, 1.32, .03),
  sigma = 1
)
mse_weighted_sample_means <- evaluate_estimator(
  score = MSE(),
  estimator = WeightedSampleMean(w1 = .8),
  data_distribution = Normal(two_armed = TRUE),
  design = get_example_design(two_armed = TRUE),
  mu = seq(-0.75, 1.32, .03),
  sigma = 1
)
plot(c(mse_mle, mse_weighted_sample_means))
```


## Evaluating median unbiased estimators
Median unbiased estimators are estimators where the probability of overestimation
is equal to the probability of underestimation. They can be derived by choosing
a sample space ordering. In `adestr`, 5 different sample-space orderings are
implemented: The MLE (maximum likelihood estimator) ordering,
the LR (likelihood ratio) test ordering, the ST (score test) ordering,
the SWCF (stage-wise combination function) ordering, and the NP (Neyman-Pearson)
ordering. The latter (NP ordering) is only useful for the calculation of p-values,
estimators derived from that ordering are usually nonsense.

Lets look at the 'median-unbiased' property for estimators derived from the four
orderings:

```{r}
evaluate_estimator(
 score = OverestimationProbability(),
 estimator = MedianUnbiasedMLEOrdering(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = 0.4,
 sigma = 1
)

evaluate_estimator(
 score = OverestimationProbability(),
 estimator = MedianUnbiasedLikelihoodRatioOrdering(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = 0.4,
 sigma = 1
)

evaluate_estimator(
 score = OverestimationProbability(),
 estimator = MedianUnbiasedScoreTestOrdering(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = 0.4,
 sigma = 1
)

evaluate_estimator(
 score = OverestimationProbability(),
 estimator = MedianUnbiasedStagewiseCombinationFunctionOrdering(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = 0.4,
 sigma = 1
)
```


Compare this to the Overestimation probability of the sample mean:

```{r}
evaluate_estimator(
 score = OverestimationProbability(),
 estimator = SampleMean(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = 0.4,
 sigma = 1
)
```

## Evaluating bias-reduced unbiased estimators
Apart from sample-space ordering based methods, there are various other ways of
defining alternative point estimators which may have desirable properties.
Here are a couple that are presented in [our paper](https://doi.org/10.1002/sim.10020)
evaluated for mu=0.3 and sigma=1.

```{r}
evaluate_estimator(
 score = MSE(),
 estimator = SampleMean(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = 0.3,
 sigma = 1
)
evaluate_estimator(
 score = MSE(),
 estimator = BiasReduced(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = 0.3,
 sigma = 1
)
evaluate_estimator(
 score = MSE(),
 estimator = PseudoRaoBlackwell(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = 0.3,
 sigma = 1
)
evaluate_estimator(
 score = MSE(),
 estimator = RaoBlackwell(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = 0.3,
 sigma = 1
)
evaluate_estimator(
 score = MSE(),
 estimator = AdaptivelyWeightedSampleMean(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = 0.3,
 sigma = 1
)
```


# Evaluating interval estimators
## Evaluating coverage
The coverage of an interval estimator is the probability of that an estimator will
cover the true value of the parameter in question. It may be evaluated like this:

```{r}
evaluate_estimator(
 score = Coverage(),
 estimator = NaiveCI(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = .07,
 sigma = 1
)
```

As you can see, the naive confidence interval does not have the correct coverage of
95%. Here is an example for an estimator that achieves the correct coverage: 

```{r}
evaluate_estimator(
 score = Coverage(),
 estimator = LikelihoodRatioOrderingCI(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = .07,
 sigma = 1
)
```

For the parameter of mu that we chose in this example, the naive confidence interval
performs particularly bad. We can plot the coverage of the naive confidence interval
for an array of values like this:

```{r, fig.width=7.2, fig.height=4, dev="svg"}
coverage_naive <- evaluate_estimator(
 score = Coverage(),
 estimator = NaiveCI(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = seq(-0.75, 1.32, .03),
 sigma = 1
)
plot(coverage_naive)
```

## Evaluating the widht of an interval estimator
Amongst the interval estimators which have the correct coverage, one might be
interested in selecting the one with the least width. We can evaluate the
expected width of an interval estimator for a particular set of assumptions like this:

```{r}
evaluate_estimator(
 score = Width(),
 estimator = MLEOrderingCI(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = .3,
 sigma = 1
)
evaluate_estimator(
 score = Width(),
 estimator = LikelihoodRatioOrderingCI(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = .3,
 sigma = 1
)
evaluate_estimator(
 score = Width(),
 estimator = ScoreTestOrderingCI(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = .3,
 sigma = 1
)
evaluate_estimator(
 score = Width(),
 estimator = StagewiseCombinationFunctionOrderingCI(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = .3,
 sigma = 1
)
```

## Evaluating the centrality of a point estimator with respect to an interval estimator
When choosing a combination of point and interval estimator to report at the end of a
trial, one might want the point estimator to be more or less in the middle of the
respective interval. To evaluate the 'centrality' of an estimator, which in this case
is defined as the difference between the distance of the point estimator the lower end
of an interval and the upper end of an interval, one can use the following command:

```{r}
evaluate_estimator(
 score = Centrality(interval = StagewiseCombinationFunctionOrderingCI()),
 estimator = SampleMean(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = .3,
 sigma = 1
)
evaluate_estimator(
 score = Centrality(interval = NaiveCI()),
 estimator = SampleMean(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = .3,
 sigma = 1
)
```


## Evaluating agreement with the primary test decision of the design for an interval estimator
In the framework of optimal adaptive designs, the rejection boundaries for the primary
testing decision of a design are optimized directly and are not derived from a common
test statistic. Therefore, confidence intervals derived from such statistics do not
necessarily agree with the primary test decision. One may evaluate the chance of
agreement like this:

```{r}
evaluate_estimator(
 score = TestAgreement(),
 estimator = MLEOrderingCI(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = .3,
 sigma = 1
)
evaluate_estimator(
 score = TestAgreement(),
 estimator = LikelihoodRatioOrderingCI(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = .3,
 sigma = 1
)
evaluate_estimator(
 score = TestAgreement(),
 estimator = ScoreTestOrderingCI(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = .3,
 sigma = 1
)
evaluate_estimator(
 score = TestAgreement(),
 estimator = StagewiseCombinationFunctionOrderingCI(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = .3,
 sigma = 1
)
```

The confidence interval derived from the stage-wise combination function ordering
(by its construction) always agrees with the primary testing decision.

# Evaluating p-values
## Evaluating agreement with the primary test decision of the design for a p-value
Like confidence intervals, p-values always come with an associated test decision.
However, for the same reason as described in the chapter about interval estimators,
these test decision derived from various ways of calculating p-values may not necessarily
agree with the primary test decision of an optimal adaptive design. One may evaluate
the probability of agreement for a particular set of assumptions like this:

```{r}
evaluate_estimator(
 score = TestAgreement(),
 estimator = MLEOrderingPValue(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = .3,
 sigma = 1
)
evaluate_estimator(
 score = TestAgreement(),
 estimator = LikelihoodRatioOrderingPValue(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = .3,
 sigma = 1
)
evaluate_estimator(
 score = TestAgreement(),
 estimator = ScoreTestOrderingPValue(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = .3,
 sigma = 1
)
evaluate_estimator(
 score = TestAgreement(),
 estimator = StagewiseCombinationFunctionOrderingPValue(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = .3,
 sigma = 1
)
```

Again, we see that only the p-value derived from the stage-wise combination function
ordering always agrees with the primary testing decision.


# Conducting larger scale investigations in the performance characteristics of end-of-trial statistics
So far we have only looked at evaluating performance characteristics for a particular
set of assumptions, a choice of performance characteristics and a choice of a few estimators.
However, when designing a trial, one might want to produce results for a variety of different
scenarios, which can be time consuming. The evaluation of performance characteristics in
`adestr` can run in parallel using the `future` framework.


```{r}
library(future.apply)
# Change to e.g. plan(multisession) for parallel computing
plan(sequential)

# Scenario 1:
scores1 <- list(MSE(), OverestimationProbability())
estimators1 <- list(SampleMean(), BiasReduced())
dist1 <- list(Normal(two_armed = TRUE))
designs1 <- list(get_example_design(two_armed = TRUE))
mu1 <- seq(-1,1,.5)
sigma1 <- 1

# Scenario 2:
scores2 <- list(Coverage(), Width())
estimators2 <- list(NaiveCI(), StagewiseCombinationFunctionOrderingCI())
dist2 <- list(Normal(two_armed = TRUE))
designs2 <- list(get_example_design(two_armed = TRUE))
mu2 <- seq(-1,1,.5)
sigma2 <- 1

# Evaluate in parallel
res <- evaluate_scenarios_parallel(
  score_lists = list(scores1, scores2),
  estimator_lists =  list(estimators1, estimators2),
  data_distribution_lists = list(dist1, dist2),
  design_lists =  list(designs1, designs2),
  mu_lists = list(mu1, mu2),
  sigma_lists = list(sigma1, sigma2)
)

res[[1]]
res[[2]]
```

You will get back a list of data.frames containing the results for each scenario.
Note that within one scenario, the evaluation will take place for the cross-product
of all arguments that are supplied! This means that within one scenario, every estimator
is evaluated with every score at every point of mu for every sigma and every design.
Depending on your settings, this can get very time-consuming very fast, making parallelization
essential.

# Analyzing datasets
Next, let us look at how to the package can be used to calculate estimates
after data has been collected.

The first stage data of a trial might look like this:

```{r}
set.seed(321)
dat <- data.frame(
 endpoint = c(rnorm(56, .3, 1), rnorm(56, 0, 1)),
 group = factor(rep(c("trt", "ctl"),
                    c(56,56)), levels = c("trt", "ctl")),
 stage = rep(1, 112)
)
head(dat)
```

```{r}
analyze(data = dat,
        statistics = list(),
        data_distribution = Normal(two_armed = TRUE),
        design = get_example_design(two_armed = TRUE),
        sigma = 1)
```
The results suggest recruiting 23 more patients per group for the second stage.

We will simulate 47 more patients per group:

```{r}
dat <- rbind(dat,
             data.frame(
               endpoint = c(rnorm(47, .3, 1), rnorm(47, 0, 1)),
               group = factor(rep(c("trt", "ctl"),
                                  c(47, 47)), levels = c("trt", "ctl")),
               stage = rep(2, 94)
             ))
```

Now, we can use the `analyze` function to evaluate a selection of estimators on the complete dataset:

```{r}
analyze(
 data = dat,
 statistics = c(
   SampleMean(),
   BiasReduced(),
   PseudoRaoBlackwell(),
   MedianUnbiasedStagewiseCombinationFunctionOrdering(),
   StagewiseCombinationFunctionOrderingCI(),
   StagewiseCombinationFunctionOrderingPValue()
   ),
 data_distribution = Normal(two_armed = TRUE),
 sigma = 1,
 design = get_example_design(two_armed = TRUE)
)
```

The estimates presented here are for the difference in means of the two normal distributions.
Keep in mind that a difference of $\mu=0.3$ was used in the simulation.

Note that while the median unbiased estimator performs well in this particular example, this
is not universally true.