When the STEP 1 trial (NCT03548935) reported a 14.9% mean body weight reduction with once-weekly subcutaneous semaglutide 2.4 mg at 68 weeks, that figure became a benchmark for GLP-1 obesity pharmacotherapy. But for the researchers who designed the protocol, locked the statistical analysis plan, and navigated FDA review, 14.9% was the output of a carefully constructed endpoint architecture — a dual co-primary framework, tiered responder analyses, cardiometabolic secondary hierarchies, and estimand strategies that together define what regulators, payers, and clinicians accept as meaningful weight loss. Understanding Phase III GLP-1 trial endpoints is essential for interpreting results across the rapidly expanding GLP-1 and dual/triple agonist landscape, where endpoint choice determines which claims a drug can make on its label and how it positions against competitors.
The Dual Co-Primary Endpoint Framework Required by FDA Guidance
The FDA's 2007 draft guidance on developing products for weight management set out the two efficacy benchmarks, mean weight change and a ≥5% responder proportion, that pivotal Phase III GLP-1 obesity trials have since adopted as a dual co-primary endpoint structure. Sponsors running that design must demonstrate two things simultaneously to achieve primary endpoint success:
- A statistically significant difference in mean percentage body weight change from baseline between active drug and placebo at the pre-specified assessment window
- A statistically significant difference in the proportion of participants achieving ≥5% body weight reduction
Both must independently reach statistical significance. A drug that achieves impressive mean weight reduction but fails the responder threshold — or demonstrates high responder rates with a modest mean effect — does not satisfy the dual criterion. This dual structure prevents scenarios where a small subset of extreme responders inflates the mean while the majority of patients see clinically minimal benefit.
The SCALE Obesity and Prediabetes trial (NCT01272219) for liraglutide 3.0 mg (Saxenda) provides a clean illustration: mean body weight reduction of 8.0% versus 2.6% with placebo (p<0.001), with 63.2% of liraglutide-treated participants achieving ≥5% weight reduction compared to 27.1% in placebo (p<0.001). Both co-primary endpoints cleared significance across a 3,731-participant enrollment — a design template that subsequent GLP-1 programs replicated with only minor protocol modifications.
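The decision rule behind the dual criterion can be sketched in a few lines. This is a minimal illustration, not the trial's analysis: it swaps in a pooled two-proportion z-test for the responder endpoint, and the arm sizes are a hypothetical 2:1 split of the 3,731 participants. Only the responder percentages come from the figures quoted above. The function names are illustrative.

```python
import math


def two_proportion_p_value(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided p-value from a pooled two-proportion z-test."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # erfc(|z| / sqrt(2)) is the two-sided tail area of the standard normal
    return math.erfc(abs(z) / math.sqrt(2))


def coprimary_success(p_mean_change: float, p_responder: float,
                      alpha: float = 0.05) -> bool:
    """Both endpoints must clear alpha; neither can rescue the other."""
    return p_mean_change < alpha and p_responder < alpha


# ASSUMPTION: hypothetical arm sizes (2:1 split of 3,731), not from the text.
n_active, n_placebo = 2487, 1244
x_active = round(0.632 * n_active)    # 63.2% responders, as quoted
x_placebo = round(0.271 * n_placebo)  # 27.1% responders, as quoted

p_resp = two_proportion_p_value(x_active, n_active, x_placebo, n_placebo)
# Mean-change p-value entered as the reported bound (p < 0.001).
print(coprimary_success(p_mean_change=0.001, p_responder=p_resp))
```

The `and` in `coprimary_success` is the whole point: a strong result on one endpoint cannot offset a miss on the other.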
Tiered Responder Thresholds — What ≥5%, ≥10%, and ≥15% Signify Clinically
Phase III GLP-1 trials do not stop at the ≥5% regulatory floor. Secondary responder analyses stratify outcomes across three tiers corresponding to progressively greater cardiometabolic benefit, each with distinct clinical implications for prescribing decisions and payer negotiations.
- ≥5% body weight reduction: The established floor for cardiometabolic risk factor modification — associated with improvements in blood pressure, fasting plasma glucose, and triglycerides based on randomized lifestyle intervention data including Look AHEAD (NCT00017953). The FDA adopted this threshold as the minimum meaningful benefit standard for chronic weight management indications.
- ≥10% body weight reduction: Associated with greater improvements in HbA1c, HDL cholesterol, and physical function scores. At this level, the magnitude of metabolic benefit begins to approximate effect sizes seen with intensive, clinically supervised lifestyle intervention programs.
- ≥15% body weight reduction: Associated with potential remission or substantial improvement of type 2 diabetes in subgroups, approximating outcomes historically achievable only with sleeve gastrectomy in selected populations — a comparison that has entered clinical and payer conversations about GLP-1 therapy positioning relative to bariatric surgery.
In SURMOUNT-1 (NCT04184622), tirzepatide 15 mg — a dual GIP/GLP-1 receptor agonist — achieved ≥5% weight reduction in 96% of participants, ≥10% in 86%, and ≥15% in 69% at 72 weeks (Jastreboff AM et al., NEJM 2022). Those responder rates exceeded all prior GLP-1 monotherapy data and drove FDA approval of tirzepatide (Zepbound) for obesity in November 2023. The tiered responder profile also provided the quantitative basis for payer value dossiers and comparative effectiveness arguments against semaglutide 2.4 mg — data that SURMOUNT-5 later confirmed in a powered active-controlled head-to-head design.
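The tiered responder analysis described above amounts to counting participants at or beyond each threshold. The sketch below assumes per-participant percent change from baseline is available, with losses recorded as negative numbers. The sample values are hypothetical and not trial data.

```python
from typing import Dict, Sequence

THRESHOLDS = (5.0, 10.0, 15.0)


def responder_rates(pct_change: Sequence[float]) -> Dict[float, float]:
    """Share of participants whose weight fell by at least each threshold.

    pct_change holds percent change from baseline; losses are negative.
    """
    n = len(pct_change)
    return {t: sum(1 for c in pct_change if c <= -t) / n for t in THRESHOLDS}


# HYPOTHETICAL per-participant values, not trial data
sample = [-22.0, -16.5, -12.0, -9.0, -6.0, -4.0, -1.5, 2.0]
rates = responder_rates(sample)
for threshold, share in rates.items():
    print(f">= {threshold:.0f}% loss: {share:.1%}")
```

The tiers are cumulative rather than exclusive: anyone counted at ≥15% is also counted at ≥10% and ≥5%, which is why reported rates always fall as the threshold rises.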
Phase III GLP-1 Trial Endpoints: Secondary Outcomes and Cardiometabolic Measurement
Weight reduction alone is insufficient for most modern regulatory and payer decisions. Phase III GLP-1 obesity trials carry extensive secondary endpoint hierarchies covering cardiometabolic, functional, and patient-reported domains. Which endpoints are powered versus exploratory determines what claims a drug can make beyond the weight reduction primary, and that distinction shapes post-approval labeling, formulary decisions, and clinical guideline uptake.
Standard cardiometabolic secondary endpoints across GLP-1 obesity programs include:
- HbA1c (critical in trials enrolling participants with type 2 diabetes, such as STEP 2 [NCT03552757], where HbA1c reduction from an 8.1% baseline was a powered secondary endpoint)
- Fasting plasma glucose and insulin resistance indices (HOMA-IR in select protocols)
- Systolic and diastolic blood pressure
- Fasting lipid panel: LDL-C, HDL-C, total cholesterol, and triglycerides
- Waist circumference as a surrogate for visceral adiposity
The SELECT trial (NCT03574597) elevated cardiovascular outcomes from secondary to primary status for the first time in a GLP-1 obesity trial conducted outside the diabetes indication. SELECT enrolled 17,604 adults with pre-existing atherosclerotic cardiovascular disease (ASCVD) and BMI ≥27 kg/m², without type 2 diabetes, and used time to first MACE (cardiovascular death, non-fatal myocardial infarction, or non-fatal stroke) as its primary endpoint. At a mean follow-up of 39.8 months, semaglutide 2.4 mg reduced MACE risk by 20% (HR 0.80; 95% CI 0.72–0.90; p<0.001) (Lincoff AM et al., NEJM 2023). That result supported an FDA label expansion adding cardiovascular risk reduction as an indication for Wegovy, a first for any GLP-1 obesity agent and a regulatory model that competitor programs have since targeted in their own Phase III cardiovascular outcome trial designs.
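The step from a hazard ratio to the quoted "20%" is simple arithmetic, sketched here with the SELECT values given above. The helper name is illustrative.

```python
def relative_risk_reduction(hr: float) -> float:
    """Percent reduction in hazard implied by a hazard ratio below 1."""
    return round((1.0 - hr) * 100.0, 1)


# Values as reported in the text for SELECT
hr, ci_low, ci_high = 0.80, 0.72, 0.90

point = relative_risk_reduction(hr)
# The interval flips: the lower HR bound gives the larger reduction
best = relative_risk_reduction(ci_low)
worst = relative_risk_reduction(ci_high)
excludes_null = ci_high < 1.0  # whole interval sits below HR = 1

print(point, worst, best, excludes_null)  # 20.0 10.0 28.0 True
```

Read this way, the result is a 20% reduction with a plausible range of roughly 10% to 28%, and the interval excluding 1 is what corresponds to statistical significance.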
Patient-reported outcomes (PROs) now appear in pre-specified secondary hierarchies across major Phase III programs:
- Impact of Weight on Quality of Life–Lite (IWQOL-Lite): a validated instrument assessing physical function, self-esteem, sexual life, public distress, and work domains
- EQ-5D utility index: used in health economic modeling for cost-effectiveness analyses submitted to payers and health technology assessment bodies
- SF-36 physical functioning subscale: included in STEP 6 (NCT03811574) for East Asian populations enrolled under region-specific BMI and comorbidity criteria
Statistical Design — Estimands, Missing Data, and Trial Integrity
The ICH E9(R1) addendum on estimands fundamentally reshaped how Phase III obesity trials handle intercurrent events — particularly discontinuation of study drug before the primary assessment window closes. Two estimand strategies now appear routinely in Phase III GLP-1 statistical analysis plans, and the choice between them materially affects how results are interpreted in regulatory submissions and clinical practice:
- Treatment policy estimand: Captures outcomes for all randomized participants regardless of discontinuation, using multiple imputation for missing weight data. This approach reflects real-world effectiveness — it asks what happened to the people enrolled, not what happened to those who stayed on drug. SURMOUNT-1 and STEP 1 both pre-specified this as the primary estimand.
- Hypothetical estimand: Estimates what would have occurred had all participants remained on treatment through the assessment window. This isolates the pharmacological signal from adherence variability and consistently shows a larger treatment effect than the treatment policy estimate — useful for understanding pharmacological ceiling but not appropriate as a primary regulatory analysis.
Attrition rates in Phase III GLP-1 trials are non-trivial and asymmetric. STEP 1 reported approximately 7.3% discontinuation in the semaglutide arm and 13.5% in placebo — differential dropout driven by GI adverse events in the active arm and weight gain frustration in placebo. Mixed-effects models for repeated measures (MMRM) are the standard analytic approach, with covariance structure and imputation model pre-specified in a locked statistical analysis plan before unblinding. The consequence of this rigor: the 14.9% mean weight reduction from STEP 1 is a treatment policy estimate that includes participants who discontinued early. The per-protocol (completers-only) estimate is higher, but regulatory agencies and clinical guidelines correctly anchor to the intention-to-treat framework for population-level inference.
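The gap between a treatment policy estimate and a completers-only figure can be shown with a toy arm. The data are hypothetical, and the sketch assumes every participant has a final weight, so no imputation is needed. The completers-only mean is a crude on-treatment summary, not a proper hypothetical-estimand analysis, which would rely on a pre-specified model such as MMRM.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class Participant:
    pct_change: float   # percent weight change at final visit (negative = loss)
    on_treatment: bool  # still taking study drug at the final visit


def treatment_policy_mean(arm: List[Participant]) -> float:
    """All randomized participants count, whether or not they stayed on drug."""
    return mean(p.pct_change for p in arm)


def completers_mean(arm: List[Participant]) -> float:
    """Only participants who stayed on drug through the final visit."""
    return mean(p.pct_change for p in arm if p.on_treatment)


# HYPOTHETICAL arm, not trial data
arm = [
    Participant(-18.0, True),
    Participant(-16.0, True),
    Participant(-14.0, True),
    Participant(-5.0, False),  # discontinued early
    Participant(-2.0, False),  # discontinued early
]

print(treatment_policy_mean(arm))  # -11.0
print(completers_mean(arm))        # -16.0
```

Including the two participants who discontinued pulls the mean from -16.0 to -11.0, which is the direction of difference the paragraph above describes.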
Assessment Windows, Dose Titration Schedules, and Durability Evidence
Most Phase III GLP-1 obesity trials are powered for primary endpoint assessment at 52–72 weeks, reflecting FDA guidance that chronic weight management drugs must demonstrate durable benefit beyond initial treatment exposure. These windows also accommodate mandatory dose escalation schedules designed to minimize gastrointestinal tolerability burden during early treatment:
- Semaglutide 2.4 mg: escalation from 0.25 mg through 0.5, 1.0, 1.7, and 2.4 mg over 16 weeks
- Tirzepatide 15 mg: escalation from 2.5 mg through 5, 7.5, 10, 12.5, and 15 mg over 20 weeks
- Liraglutide 3.0 mg: escalation from 0.6 mg through 1.2, 1.8, 2.4, and 3.0 mg over 4 weeks (a comparatively compressed schedule)
The consequence is that primary endpoint data reflects the drug at therapeutic maintenance dose for roughly 48–52 weeks rather than the full nominal trial duration. This compression affects how weight loss trajectories should be interpreted: plateau analysis and rate-of-loss slopes observed in the early maintenance phase may not capture the full pharmacological ceiling.
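The maintenance-dose exposure can be worked out by subtracting each escalation period from the primary assessment week. In this sketch the escalation lengths come from the list above, and the assessment weeks for semaglutide (68) and tirzepatide (72) are stated elsewhere in this article. The 56-week figure for liraglutide is an assumption supplied here for illustration.

```python
from typing import Dict, NamedTuple


class Schedule(NamedTuple):
    trial_weeks: int       # week of the primary assessment
    escalation_weeks: int  # weeks spent reaching the maintenance dose


SCHEDULES: Dict[str, Schedule] = {
    "semaglutide 2.4 mg": Schedule(68, 16),
    "tirzepatide 15 mg": Schedule(72, 20),
    # ASSUMPTION: 56-week assessment for liraglutide is not stated in the text
    "liraglutide 3.0 mg": Schedule(56, 4),
}


def maintenance_weeks(s: Schedule) -> int:
    """Weeks at full maintenance dose before the primary assessment."""
    return s.trial_weeks - s.escalation_weeks


for drug, schedule in SCHEDULES.items():
    print(f"{drug}: {maintenance_weeks(schedule)} weeks at maintenance dose")
```

Under these inputs all three programs land on the same maintenance exposure despite very different nominal durations and escalation speeds.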
Durability data has become essential context for interpreting Phase III results. STEP 4 (NCT03548987) gave all participants a 20-week semaglutide 2.4 mg run-in, then randomized them at week 20 to either continue or switch to placebo. Those switched to placebo regained a substantial share of their lost weight by week 68 (a mean gain of 6.9% from week 20, versus a further 7.9% loss with continued semaglutide). The design effectively generated discontinuation evidence without extending the primary Phase III protocol to multi-year duration, and the findings reframed GLP-1 obesity therapies as requiring chronic use, analogous to antihypertensive pharmacotherapy, rather than as finite, time-limited interventions. That framing now appears in FDA labeling language and in major clinical practice guideline publications.
Subgroup Analyses — Population Signals That Inform Subsequent Trial Design
Phase III GLP-1 obesity trials are powered for overall treatment effects, not subgroup differences. Pre-specified subgroup analyses are hypothesis-generating by design and should not be interpreted as definitive evidence of differential response. Nevertheless, consistent signals across programs have materially shaped subsequent protocol development and indication-specific trial design.
Diabetes status is the most consistently documented moderator of GLP-1-mediated weight loss magnitude. STEP 2 (NCT03552757) enrolled adults with type 2 diabetes and BMI ≥27 kg/m² and observed 9.6% mean weight reduction with semaglutide 2.4 mg versus 3.4% with placebo (p<0.001), approximately two-thirds of the 14.9% observed in the non-diabetic STEP 1 population. Proposed mechanisms include differences in baseline insulin resistance, differences in GIP receptor sensitivity between insulin-resistant and insulin-sensitive phenotypes, and the weight-promoting effect of some background glucose-lowering therapies (STEP 2 permitted stable oral agents, including sulfonylureas, but not insulin).
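The "two-thirds" comparison can be checked directly from the figures quoted above. The snippet also shows the placebo-adjusted STEP 2 effect, which is the more conservative way to read the result.

```python
# Mean percent weight reduction, as quoted in the text
step1_mean_loss = 14.9     # STEP 1, semaglutide arm
step2_mean_loss = 9.6      # STEP 2, semaglutide arm
step2_placebo_loss = 3.4   # STEP 2, placebo arm

ratio = step2_mean_loss / step1_mean_loss
placebo_adjusted = round(step2_mean_loss - step2_placebo_loss, 1)

print(f"STEP 2 / STEP 1 ratio: {ratio:.2f}")                      # 0.64
print(f"STEP 2 placebo-adjusted loss: {placebo_adjusted} points")  # 6.2
```

A ratio of 0.64 is close to two-thirds. This is a cross-trial comparison of arm-level means, so it describes the pattern rather than estimating a causal effect of diabetes status.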
Ethnicity-specific enrollment criteria drove the dedicated design of STEP 6 (NCT03811574), which enrolled East Asian adults in Japan and South Korea at BMI ≥27 kg/m² with at least two weight-related comorbidities, or ≥35 kg/m² with at least one, reflecting regional obesity-disease criteria rather than standard Western thresholds. Mean weight reduction with semaglutide 2.4 mg reached 13.2% at 68 weeks, broadly consistent with the non-diabetic STEP 1 result despite lower absolute baseline BMI, supporting the metabolic-risk-based approach to enrollment rather than BMI alone.
Indication-specific subgroup signals from the core Phase III programs have generated dedicated trials now in various stages of completion: SURMOUNT-OSA (NCT05412004), using Apnea-Hypopnea Index reduction as the primary endpoint, with body weight change as a key secondary endpoint, in adults with obstructive sleep apnea and obesity; FLOW (NCT03819153), evaluating semaglutide in type 2 diabetes with chronic kidney disease, with renal composite outcomes as the primary endpoint; and multiple Phase III programs targeting metabolic dysfunction-associated steatohepatitis (MASH) with GLP-1 and dual/triple agonist compounds, using liver histology resolution as the primary regulatory endpoint.
Gaps in Current Phase III GLP-1 Endpoint Science
Despite mature data from the STEP, SCALE, SURMOUNT, and SELECT programs, several endpoint design limitations remain consequential for clinical decision-making and future trial architecture.
Lean mass preservation is not captured by standard Phase III primary endpoints. Available DEXA substudy data suggests approximately 25–40% of weight lost on GLP-1 agonists may be lean mass rather than fat mass — a clinically significant concern in older adults and individuals with baseline sarcopenia or low muscle reserve. No pivotal obesity trial has included lean mass preservation as a co-primary endpoint, and the DEXA subgroup data available from completed trials is underpowered for definitive conclusions. This gap is actively driving body composition endpoints in emerging Phase III designs targeting MASH, age-related metabolic disease, and post-surgical weight regain.
Head-to-head comparative endpoints remain the exception rather than the norm. Most Phase III trials compare active drug against placebo, generating efficacy data that is not directly comparable across programs due to differences in enrollment criteria, baseline BMI, diabetes prevalence, and trial duration. SURMOUNT-5 (NCT05822830), comparing tirzepatide directly to semaglutide 2.4 mg in adults with obesity or overweight, reported approximately 20.2% mean weight reduction for tirzepatide versus 13.7% for semaglutide (p for superiority <0.001) at 72 weeks. It was the first powered active-controlled Phase III trial to generate head-to-head weight endpoint data for these two agents. That active-controlled design provides a model for comparative effectiveness trials as the GLP-1/dual/triple agonist field continues to expand.
Predictive biomarker endpoints have not been validated for clinical use or trial stratification. GLP-1 receptor expression, circulating GIP and GLP-1 fasting and postprandial levels, gut microbiome composition, and appetite-regulating hormone profiles (PYY, ghrelin) have been explored in substudies of Phase III programs but have not qualified as pre-specified stratification biomarkers for enrollment criteria or responder prediction. Without validated biomarkers, trial design defaults to empirical broad inclusion criteria and post-hoc subgroup exploration — limiting the ability to match patients to therapies based on pre-treatment biology.
Pediatric endpoint standards remain in early development. STEP TEENS (NCT04102189) adapted the adult framework for adolescents aged 12–17 years with obesity: the primary endpoint was percentage change in BMI rather than absolute weight, reflecting growth dynamics in the pediatric population, with the ≥5% weight-loss responder threshold retained as a supporting endpoint. Semaglutide 2.4 mg produced a 16.1% mean BMI reduction versus a 0.6% increase with placebo (p<0.001) at 68 weeks. Whether cardiometabolic risk thresholds derived from adult data translate equivalently to adolescent populations requires dedicated long-term prospective data not yet available from randomized trials.
Clinicians and researchers reviewing Phase III GLP-1 data are best positioned to interpret results by examining the full endpoint hierarchy — primary, secondary, and pre-specified subgroup — alongside the estimand strategy, attrition pattern, and dose escalation schedule, rather than anchoring to a single percentage figure from an abstract. The architecture behind that number determines its clinical meaning and its applicability to the specific patient population in question.
This article summarizes research and does not constitute medical advice. Consult a licensed clinician for diagnosis, treatment, or any decisions about medications or supplements.