analyzeData(...)

analyzeData(...) performs the analysis of the individual replicate datasets. The Usage page gives an overview of the analysis process, but in short the analyzeData(...) function takes each replicate dataset in turn and applies the user-supplied analysisCode function to it. The analysisCode function need only be written in the context of analysing a single dataset; MSToolkit takes care of reading in each replicate dataset and writing out the analysis results. MSToolkit was written primarily for evaluating the operating characteristics of "learning" phase trials where dose selection is the main aim, so the focus is on returning inferences about dose effects: the analysis output must contain DOSE and the basic summary statistics described below. Future versions of MSToolkit will relax this requirement.

The user must provide either a valid R function for analysing each generated dataset or an external file (.R or .SAS) containing the analysis code. The user must also provide functions for performing the micro- and macro-evaluation summaries of trial performance.

The analyzeData(...) function automatically handles the data input and output, pointing the analytic function (analysisCode) at each replicate dataset in turn. The user does not need to explicitly name the "replicate000x.csv" files for analysis: analyzeData(...) takes the user-defined analysisCode function and loops through the replicate datasets, passing each one to analysisCode as the argument "data". Within the analysisCode function the user therefore works with a data frame object called "data". Examples are given below.
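
As a rough sketch of what this looping amounts to (this is not the package's actual internal code; the file names and use of the working directory are illustrative only), analyzeData(...) conceptually does the following, assuming the user's analysisCode function is already defined:

    ## Conceptual sketch only: analyzeData(...) performs this looping for the user.
    ## File names and locations are illustrative; the real function manages its own
    ## directories and housekeeping.
    replicateFiles <- list.files(pattern = "^replicate[0-9]+\\.csv$")
    for (repFile in replicateFiles) {
      data <- read.csv(repFile)                # one replicate dataset
      microResults <- analysisCode(data)       # user-supplied function, argument "data"
      write.csv(microResults,
                file = sub("replicate", "micro", repFile),
                row.names = FALSE)
    }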

The analysisCode MUST return 5 items: the estimated mean (labelled MEAN), the standard error (labelled SE), lower (LOWER) and upper (UPPER) interval estimates, and N, the number of subjects on each DOSE. These are the required outputs for micro-evaluation. Other output can be carried along (e.g. mean difference from placebo, Emax model parameter estimates), but these key measures are expected. The estimates should be calculated by whatever method is appropriate to the chosen analytical technique: for example, they could be LSMeans from a linear model or estimates based on a fitted dose-response model. Micro-evaluation results are used in cases where we may wish to drop doses at an interim analysis - the decision to drop doses can then be based on the output interval estimates. For example, we may wish to drop doses where the lower limit is less than zero (in a difference from baseline or a comparison to placebo). Micro-evaluation is performed at each specified interim analysis; if several interim analyses are planned, micro-evaluation is also performed on the whole dataset (without dropping any doses) as well as after every interim. This allows trial performance to be compared between adapting and non-adapting designs.
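
As an illustration, a minimal analysisCode function might look like the sketch below. It assumes the replicate data contain a continuous response column named RESP (an assumption made for illustration) alongside DOSE, and it uses per-dose predictions from a simple linear model in place of LSMeans:

    ## A minimal sketch of an analysisCode function, assuming a continuous
    ## response column RESP in each replicate dataset.
    exampleAnalysisCode <- function(data) {
      fit   <- lm(RESP ~ factor(DOSE), data = data)
      doses <- sort(unique(data$DOSE))
      pred  <- predict(fit, newdata = data.frame(DOSE = doses), se.fit = TRUE)
      data.frame(
        DOSE  = doses,
        MEAN  = pred$fit,                       # estimated mean response per dose
        SE    = pred$se.fit,                    # standard error of that estimate
        LOWER = pred$fit - 1.96 * pred$se.fit,  # lower 95% interval estimate
        UPPER = pred$fit + 1.96 * pred$se.fit,  # upper 95% interval estimate
        N     = as.vector(table(data$DOSE))     # number of subjects on each dose
      )
    }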

The macroCode summarises the trial performance as a whole - it should provide a single assessment of the success or failure of each trial at its conclusion. For example, we may wish to summarise the proportion of simulated trials showing a maximal effect greater than a clinically meaningful effect. Similarly, we may wish to show that the final estimates of the model parameters are precise and unbiased. Macro-evaluation should summarise trial performance at this level.
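
A minimal macroCode sketch is given below. It assumes the macro-evaluation code receives the micro-evaluation results for a replicate (columns DOSE and MEAN as above) and that placebo is coded as DOSE 0; the 5-unit clinically meaningful difference and the names of the returned columns are illustrative choices rather than package requirements:

    ## A minimal sketch of a macroCode function, assuming it is passed the
    ## micro-evaluation results and that placebo is coded as DOSE 0.
    exampleMacroCode <- function(data) {
      placeboMean <- data$MEAN[data$DOSE == 0]
      maxDiff     <- max(data$MEAN[data$DOSE > 0]) - placeboMean
      data.frame(
        MAXDIFF = maxDiff,                  # maximal estimated effect vs placebo
        SUCCESS = as.numeric(maxDiff >= 5)  # 1 if clinically meaningful, else 0
      )
    }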

The analyzeData(...) function differs from the generateData(...) function in that there are fewer low-level functions that the user will typically want to access. The majority of the lower-level functions for analyzeData(...) govern the input and output of the trial replicate data, general "housekeeping", and the submission of the analysis jobs to the GRID.


Arguments

replicates - Which replicates to use in the analyzeData(...) step. DEFAULT is ALL replicates, but a vector of replicate numbers can be given to specify a subset for analysis.

analysisCode - R function or SAS file of analytic code. MUST return the mean, standard error, lower and upper interval estimates, and the number of subjects for each dose. Other parameters may be returned, but this core set must be in the dataset for use by interimCode (see the example call following this list). (REQUIRED)

macroCode - Macro-evaluation code. Algorithm for defining trial level success. (REQUIRED)

interimCode - Defines an algorithm for dropping doses at interim analyses.

software - Software for analysis - could be R or SAS.

grid - If running MSToolkit from a UNIX node or via ePharm, the user can choose to split the analysis across GRID nodes in order to speed up the analysis. This option is not currently available when running MSToolkit locally on a laptop.

removeMissing, removeRespOmit - Whether missing data or subjects who have dropped out (response omitted) should be excluded from the analysis.
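
As a hedged illustration of how these arguments fit together, the call below reuses the example functions sketched above and adds a simple interim rule that drops any active dose whose lower interval estimate falls below zero. The list returned by the interim code (doses to DROP and a STOP flag) is an assumed format for this sketch; check the package help for the exact return value expected from interimCode.

    ## A sketch of an interim rule: drop any active dose whose lower interval
    ## estimate is below zero, and stop if all active doses are dropped.
    ## The DROP / STOP return format is an assumption for illustration.
    exampleInterimCode <- function(data) {
      dropDoses <- data$DOSE[data$DOSE > 0 & data$LOWER < 0]
      list(DROP = dropDoses,
           STOP = length(dropDoses) == sum(data$DOSE > 0))
    }

    ## An illustrative analyzeData(...) call combining the arguments above
    analyzeData(
      replicates   = 1:100,                # analyse the first 100 replicates only
      analysisCode = exampleAnalysisCode,  # R analysis function sketched earlier
      macroCode    = exampleMacroCode,     # trial-level success criterion
      interimCode  = exampleInterimCode,   # dose-dropping rule at interims
      software     = "R"                   # the analysis code is written in R
    )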

If analyzeData(...) is run on an LSF GRID, the analysis job is split into roughly equal-sized batches of replicates to run across the GRID nodes. Running MSToolkit on an LSF GRID requires the rlsf package.
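
For completeness, a GRID run would look much the same as the call above; the logical form of the grid argument shown here is an assumption for this sketch, so check the help page for the exact usage:

    ## Illustrative only: the same kind of call, asking for the replicates to be
    ## split across LSF GRID nodes (requires the rlsf package and GRID access).
    ## The logical form of the grid argument is an assumption for this sketch.
    analyzeData(
      replicates   = 1:1000,
      analysisCode = exampleAnalysisCode,
      macroCode    = exampleMacroCode,
      grid         = TRUE
    )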

NOTE: When running fewer than about 300 replicates, it may be quicker to run the MSToolkit simulations locally on a laptop / PC rather than on the GRID, due to the GRID queueing system and the overheads in data input and output.