Exhibit 99.2

 

Assessing the benefit from L-glutamine for treating sickle cell disease with respect to the frequency of sickle cell crises via the data from the Phase III clinical trial GLUSCC09-01

 

 

 

Reported by L. J. Wei, Ph.D.

 

 

 

Qualifications

 

I have been a professor of biostatistics at Harvard University since 1991 and was a professor of biostatistical science and computational biology at Dana-Farber Cancer Institute, Harvard Medical School, between 1997 and 2012. I was the scientific director for the Program of Quantitative Sciences for Pharmaceutical Medicine at Harvard School of Public Health between 2003 and 2007. I was the co-director of the bioinformatics core at Harvard School of Public Health from 2003 to 2007. From 2003 to 2004, I served as the acting chair of the department of biostatistics at Harvard University. I was a professor of biostatistics and statistics at University of Wisconsin, University of Michigan, and George Washington University from 1982 to 1991. My scholarly writings include over 160 research articles in peer-reviewed academic journals. I was responsible for developing numerous novel statistical methods for designing, monitoring and analyzing clinical studies, survival analysis and meta-analysis. Many of these methods have been included in the most commonly used statistical packages such as SAS, S-Plus, and R. I am an elected Fellow of the American Statistical Association and the Institute of Mathematical Statistics. I was named “Statistician of the Year” in 2007 by the Boston Chapter of the American Statistical Association. I was selected as the Wilks Medal Recipient in 2009 by the American Statistical Association (one of the most prestigious awards in statistical science) for outstanding contributions to clinical trial methodology research.

 

Background

 

The GLUSCC09-01 study was a multicenter, randomized, double-blind, parallel-group, placebo-controlled, Phase 3 study to compare the efficacy and safety of L-glutamine versus placebo in patients with sickle cell anemia or sickle β⁰-thalassemia who were at least 5 years old. A total of 230 patients were randomized in a 2:1 ratio to L-glutamine (n = 152) or placebo (n = 78). The primary endpoint was the number of sickle cell crises per patient through week 48. The statistical analysis plan (SAP), dated January 21, 2014, states the following for the analysis of this primary endpoint:

 



 

“The treatment groups will be compared with respect to number of painful sickle cell crises using a Cochran-Mantel-Haenszel (CMH) test (row mean scores) with ranks as scores and controlling for investigational site and hydroxyurea use.”

 

There is a broad class of statistical inference procedures under the extended CMH hypothesis-testing paradigm when the outcome is ordinal (i.e., has more than 2 levels). With the SAS statistical software, the default procedure for implementing such an extended CMH-type test is to use the original numbers of crises as scores. The study SAP calls for a “rank”-type scoring procedure for testing whether there is a difference between the two arms. For an unstratified analysis, this approach is quite straightforward. However, in the presence of stratification, SAS provides several different rank-based options that may be used to construct the final test statistic. For reasons that are unclear, the CRO data analyst selected a rank procedure in SAS (using the option Scores = Rank), which utilized the stratum-specific rank scores without adjustment for the stratum sample size. The resulting p-value was 0.063, which is slightly larger than the pre-specified Type I error rate of 0.045 with the two pre-specified stratification factors.

 

At the meetings between FDA and Emmaus after the data were unblinded (June 11 and October 15, 2014), FDA raised the following issues and concerns regarding the results of the primary endpoint analysis:

 

FDA: “Based on the pre-specified primary efficacy analysis (controlling for region and hydroxyurea use), the result for the primary endpoint of painful sickle cell crisis through Week 48 did not reach the pre-specified significance level of 0.045.”

 

FDA: “The primary efficacy results were inconsistent among geographic regions, as shown by the large difference in results observed based on the stratified analyses adjusted for region and hydroxyurea use (p=0.063) versus results by the analysis adjusted for hydroxyurea use only (p=0.008) from study GLUSCC09-01.”

 

 

I was asked by Emmaus to review the study’s SAP, the preliminary data analysis with respect to the primary endpoint, and all the meeting correspondence with FDA to examine whether the study design and analysis were coherently aligned. Moreover, to facilitate an independent assessment of the treatment effect of L-glutamine, I was provided with the patient-level data for independent analysis.

 



 

Findings

I first reviewed the design of the trial thoroughly, as there was a question raised by the FDA in an Advice/Information Request (January 6, 2010) before the trial was begun:

 

FDA: “Why do you propose to use Wilcoxon rank-sum test for the sample size calculation and then use Cochran-Mantel-Haenszel test for the data analysis . . . ? The method used for the sample size calculation and the data analysis should be consistent.”

 

This was indeed an important question, since the method used to design the study should be consistent with the method used to analyze the data at its end. The study was designed via the Wilcoxon test statistic, and the analysis would be carried out via an extended CMH test using the stratified Wilcoxon test or, equivalently, the modified ridit method. This plan was clearly indicated in the sponsor’s subsequent response to the FDA (dated January 28, 2010), before the study began:

 

Emmaus: “… The method for calculation of the sample size estimate (Wilcoxon rank-sum test that P(X<Y) = 0.5 for ordered categories) was intended to mimic, as much as possible, the CMH test to be used in the primary analysis. The stratified Wilcoxon rank-sum test may be carried out using the CMH test with modified ridit scores (in SAS); therefore, the corresponding power calculation available in nQuery Advisor 7.0 for the Wilcoxon rank sum test was used”

 

Note that the stratified Wilcoxon test, or equivalently the CMH test using modified ridit scores, is in fact a preferred procedure for analyzing the ordinal categorical data with stratification under the extended CMH setting, which is well cited in the literature:

 

For a single stratum, rank, ridit, and modified ridit scores produce the same result, which is the categorical counterpart of the Wilcoxon rank sum test. For stratified analyses, modified ridit scores produce van Elteren’s extension of the Wilcoxon rank sum test, a property that makes them the preferred of these three types of scores. 1

 

The SAS software indeed has an option for implementing the CMH with each of these three rank-based options. It appears that the sponsor was consistent in their intent to design and analyze the study with the same statistical procedure.
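To illustrate the distinction among these score types, the following is a minimal Python sketch (not the SAS implementation, and not the code used for the analyses in this report) of how the scores can be computed within a single stratum. The function names and the crisis counts are hypothetical. Rank scores are midranks, ridit scores divide the midrank by the stratum size n, and modified ridit scores divide it by n + 1.

```python
from collections import Counter


def midranks(values):
    """Midrank of each distinct value in one stratum (ties share the average rank)."""
    counts = Counter(values)
    ranks, below = {}, 0
    for v in sorted(counts):
        ranks[v] = below + (counts[v] + 1) / 2
        below += counts[v]
    return ranks


def stratum_scores(values, kind):
    """Score assigned to each distinct outcome value under one scoring option."""
    n = len(values)
    r = midranks(values)
    if kind == "table":        # original values, no transformation
        return {v: float(v) for v in r}
    if kind == "rank":         # midranks, not scaled by stratum size
        return r
    if kind == "ridit":        # midrank / n
        return {v: r[v] / n for v in r}
    if kind == "modridit":     # midrank / (n + 1)
        return {v: r[v] / (n + 1) for v in r}
    raise ValueError(f"unknown score type: {kind}")


# Hypothetical crisis counts: same distribution, different stratum sizes.
small = [0, 1, 1, 3]
large = [0, 0, 1, 1, 1, 1, 3, 3]

for kind in ("table", "rank", "ridit", "modridit"):
    print(kind, stratum_scores(small, kind), stratum_scores(large, kind))
```

In this toy example the two strata have the same distribution of crisis counts, yet the rank score for the largest count grows with the stratum size (4.0 versus 7.5), whereas the modified ridit scores stay on a common scale between 0 and 1.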

 

Please note also that if we analyze the data with either of these two proposed rank-based tests (i.e., the CMH test using modified ridit scores or van Elteren’s stratified Wilcoxon rank-sum test), the two-sided p-value obtained when stratifying for both region and hydroxyurea use, with the pre-specified imputation algorithm, is 0.0053. If we instead stratify only for hydroxyurea use, the resulting p-value is 0.0043. The results from these two analyses are consistently well below the pre-specified significance level of 0.045 and do not exhibit any substantive change in interpretation depending on whether or not region is included as a stratification factor2.

1 Stokes, M.E., Davis, C.S. and Koch, G.G., 2012. Categorical data analysis using SAS. SAS Institute.

 

 

 

It has been observed that:

 

“[i]f the [true] treatment effect is constant across strata, then the van Elteren’s test is asymptotically optimal.”3
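For readers who wish to see the mechanics of van Elteren’s test, a minimal Python sketch follows. It is an illustration under stated assumptions, not the code used for this report: the function names and the small data set are hypothetical, ties are handled with midranks, and the p-value uses the normal approximation. Each stratum contributes its Wilcoxon rank sum minus its null expectation, weighted by 1/(N + 1), where N is the stratum size.

```python
import math


def midrank_list(values):
    """Midrank of every observation (ties receive the average of their ranks)."""
    first, count = {}, {}
    for i, v in enumerate(sorted(values), start=1):
        first.setdefault(v, i)
        count[v] = count.get(v, 0) + 1
    return [first[v] + (count[v] - 1) / 2 for v in values]


def stratified_wilcoxon(strata, weight=lambda n: 1.0 / (n + 1)):
    """strata: list of (x, y) pairs of outcome lists, one pair per stratum.

    Returns (z, two_sided_p). The default weight 1/(N + 1) gives van Elteren's
    test; weight=lambda n: 1.0 sums unscaled rank statistics across strata.
    """
    num = var = 0.0
    for x, y in strata:
        m, n = len(x), len(y)
        total = m + n
        ranks = midrank_list(list(x) + list(y))
        mean_rank = (total + 1) / 2
        rank_sum_x = sum(ranks[:m])
        # Null variance of the rank sum, valid in the presence of ties.
        ss = sum((r - mean_rank) ** 2 for r in ranks)
        v = m * n * ss / (total * (total - 1))
        w = weight(total)
        num += w * (rank_sum_x - m * mean_rank)
        var += w * w * v
    z = num / math.sqrt(var)
    return z, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical crisis counts: (treated, placebo) in two strata.
example_strata = [([0, 1, 1, 2], [2, 3, 4]), ([1, 0, 2, 3], [3, 2, 5])]
z, p = stratified_wilcoxon(example_strata)
print(f"z = {z:.3f}, two-sided p = {p:.4f}")
```

Passing `weight=lambda n: 1.0` reproduces the unscaled rank-score combination, in which larger strata dominate the statistic. With a single stratum the weight cancels and both choices give the ordinary Wilcoxon rank-sum test.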

 

For this phase III study, I performed a standard statistical test for interaction between treatment and region. The resulting p-value is 0.21, indicating no evidence that the treatment effect differs across regions.

 

In order to quantify the treatment effect of L-glutamine relative to placebo, a natural parameter that arises from the Wilcoxon rank-sum test is Pr(X<Y), where:

 

X = # of crises experienced by a randomly sampled L-glutamine patient and

Y= # of crises experienced by a randomly sampled placebo patient.

 

Using both stratification factors, I estimated this probability and its standard error within each of the 10 strata, and then combined the estimates using weights n_k + 1 (where n_k is the total number of patients in stratum k; i.e., the same stratum weights used in the stratified Wilcoxon test and the scaling factor used by the modified ridit scores to standardize ranks within strata). This approach produced an overall estimate of the size of the group difference of 0.613, with a 95% confidence interval of (0.533, 0.695).
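The estimation step described above can be sketched in Python as follows. This is a minimal illustration, not the code used for this report, and it rests on several assumptions: the function names and data are hypothetical; tied pairs are counted as one half; and the stratum weights are taken as m·n/(N + 1), with m and n the two arm sizes and N = m + n, which is one way of expressing van Elteren-type weights on the probability scale and is not confirmed by the description above. The standard error calculation is omitted.

```python
def prob_x_less_y(x, y):
    """Mann-Whitney estimate of Pr(X < Y); a tied pair counts one half (assumption)."""
    wins = sum(
        1.0 if a < b else 0.5 if a == b else 0.0
        for a in x
        for b in y
    )
    return wins / (len(x) * len(y))


def combine_strata(estimates, weights):
    """Weighted average of the stratum-specific estimates."""
    return sum(w * e for w, e in zip(weights, estimates)) / sum(weights)


# Hypothetical crisis counts: (treated, placebo) in two strata.
example_strata = [([0, 1, 1, 2], [2, 3, 4]), ([1, 0, 2, 3], [3, 2, 5])]

per_stratum = [prob_x_less_y(x, y) for x, y in example_strata]
# Assumed weights m * n / (N + 1); swap in another scheme if desired.
weights = [len(x) * len(y) / (len(x) + len(y) + 1) for x, y in example_strata]
overall = combine_strata(per_stratum, weights)
print(per_stratum, overall)
```

A value of 0.5 corresponds to no difference between the arms, and a value above 0.5 means that a randomly chosen treated patient tends to have fewer crises than a randomly chosen placebo patient.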

 

To explore the robustness of these findings, I also used all of the scoring options provided by SAS for implementing the extended CMH procedure (“table”, i.e., no transformation; “ridit”; “modified ridit”; and the option chosen by the CRO statistical analyst, “rank”). The resulting p-values were 0.0314, 0.0050, 0.0053, and 0.0635, respectively. Another scoring method, known as “logrank scores”, is described in (1) and is readily available in the Stata software package. Implementing this approach yielded a p-value of 0.0135. Thus, of the five available scoring methods, the only one that did not yield a statistically significant result (a p-value of 0.063) is the one inadvertently chosen by the CRO data analyst and reported in the communication with FDA after the trial was completed.

 

 


2 Similarly, when stratified by region only, the p-value is 0.0074, and without any stratification, the p-value is 0.0045.

3 Mehrotra, D.V., Lu, X. and Li, X., 2010. Rank-based analyses of stratified experiments: alternatives to the van Elteren’s test. The American Statistician, 64(2), pp.121-130.

 



 

Conclusions

 

When dealing with the analysis of stratified ordinal categorical data, there are several rank-based test statistics proposed in the literature and implementable via SAS software under the broad class of extended CMH tests. The sponsor had a clear intention to use the stratified Wilcoxon test, or equivalently, the CMH test using modified ridit scores, as the test procedure both for designing and analyzing the study, as indicated in the official correspondence with FDA before the trial was started. These are non-parametric, rank-based tests and are recommended in SAS documentation and used extensively across the medical literature4,5,6. In the study SAP for the primary endpoint analysis, a rank-based test is mentioned, but no specific rank procedure was cited. It is possible that the data analyst for the study did not recall the previous intention to use the same analytic method for designing and analyzing the study, as was well outlined in the prior official correspondence with the FDA. When the appropriate, originally mentioned analytic methods (the stratified Wilcoxon test, or equivalently, the CMH test with modified ridit scores) are utilized for the primary endpoint analysis, the results are highly statistically significant (p-value of 0.005) and consistent across the various choices of stratification factors. Moreover, the Wilcoxon rank-sum test produced an estimated magnitude of treatment effect that did not differ by region (p-value for heterogeneity = 0.21). Based on the totality of evidence from my findings, L-glutamine has been demonstrated, both qualitatively and quantitatively, to reduce the frequency of sickle cell crises.

 

 

February 25, 2016

 

 

L. J. Wei, Ph.D.

 

 


4 Johanson, J.F. and Ueno, R., 2007. Lubiprostone, a locally acting chloride channel activator, in adult patients with chronic constipation: a double-blind, placebo-controlled, dose-ranging study to evaluate efficacy and safety. Alimentary pharmacology & therapeutics, 25(11), pp.1351-1361.

 

5 Meltzer, E.O., Berkowitz, R.B. and Grossbard, E.B., 2005. An intranasal Syk-kinase inhibitor (R112) improves the symptoms of seasonal allergic rhinitis in a park environment. Journal of allergy and clinical immunology, 115(4), pp.791-796.

 

6 Fernandez, H.H., Greeley, D.R., Zweig, R.M., Wojcieszek, J., Mori, A., Sussman, N.M. and 6002-US-051 Study Group, 2010. Istradefylline as monotherapy for Parkinson disease: results of the 6002-US-051 trial. Parkinsonism & related disorders, 16(1), pp.16-20.