




SubmissionAI
AI-powered data provenance and compliance automation for Phase 2→3 drug development transitions
Team
1 member
Zhenbang (Bruce) Wang, Owner
Overview
SubmissionAI is a five-agent AI system that automates regulatory compliance for pharmaceutical IND submissions, targeting the Phase 2→3 transition where 70% of programs currently fail at a cost of $100–300M per failure.
Built by a Sanofi biostatistician and a senior software engineer, the system deploys five specialized LangGraph agents: a Content Analyzer that parses eCTD documents, a Data Provenance Tracker that maps Phase 2 findings to Phase 3 design, a RAG-powered SAP Validator that checks pre-specification and estimand compliance against ICH E9/E9(R1), a RAG-powered FDA Conformance Checker grounded in 25 regulatory guidelines (ICH, FDA, 21 CFR, EMA), and a Report Generator that synthesizes all findings into an actionable compliance report with FDA approval probability.
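To make the agent hand-off concrete, here is a minimal plain-Python sketch of five stages passing a shared state along a chain. It is an illustration only: it does not use LangGraph, it assumes the agents run strictly in sequence (the description above does not say so), and every function name, state key, and placeholder value is hypothetical rather than taken from the SubmissionAI codebase.

```python
from typing import Callable, Dict, List

# Shared state passed from agent to agent (hypothetical shape).
State = Dict[str, object]
Agent = Callable[[State], State]


def content_analyzer(state: State) -> State:
    # Placeholder: a real agent would parse eCTD documents here.
    return {**state, "sections": []}


def provenance_tracker(state: State) -> State:
    # Placeholder: would map Phase 2 findings to Phase 3 design choices.
    return {**state, "provenance": []}


def sap_validator(state: State) -> State:
    # Placeholder: would check the SAP against ICH E9/E9(R1) via retrieval.
    return {**state, "sap_findings": []}


def conformance_checker(state: State) -> State:
    # Placeholder: would check conformance against the guideline knowledge base.
    return {**state, "conformance_findings": []}


def report_generator(state: State) -> State:
    # Collects the findings of the two validating agents into one report.
    findings = list(state["sap_findings"]) + list(state["conformance_findings"])
    return {**state, "report": {"n_findings": len(findings)}}


# Assumed ordering: each agent sees everything the previous agents produced.
PIPELINE: List[Agent] = [
    content_analyzer,
    provenance_tracker,
    sap_validator,
    conformance_checker,
    report_generator,
]


def run_pipeline(initial: State) -> State:
    state = dict(initial)
    for agent in PIPELINE:
        state = agent(state)
    return state


result = run_pipeline({"documents": []})
```

In a graph framework the same idea would be expressed as nodes and edges over a typed state, which also allows branches such as running the two validators in parallel.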
The RAG system is the core technical innovation. Two dedicated knowledge bases (25 FDA/ICH/EMA guidelines for conformance; 12 statistical methodology guidelines for SAP validation) power a 5-stage hybrid retrieval pipeline (including dense embedding retrieval, BM25 sparse retrieval, Reciprocal Rank Fusion, and cross-encoder re-ranking), so every finding cites the exact guideline section rather than relying on LLM memory. The SAP Validator adds cross-document consistency retrieval: it queries the governing guideline, the SAP, and the CSR together to detect deviations that no single-document check would catch.
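The Reciprocal Rank Fusion step named above can be sketched in a few lines. This is a generic version of the technique, not SubmissionAI's actual implementation: the chunk identifiers are made up, and the smoothing constant k = 60 is a commonly used default rather than a value stated in this project. Each document scores the sum of 1 / (k + rank) over every ranking in which it appears.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]], k: int = 60
) -> List[str]:
    """Fuse several ranked lists of document ids into one ranked list."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical retriever outputs for one query (ids are illustrative only).
dense_ranking = ["chunk_a", "chunk_b", "chunk_c"]
bm25_ranking = ["chunk_b", "chunk_d", "chunk_a"]

fused = reciprocal_rank_fusion([dense_ranking, bm25_ranking])
print(fused)  # ['chunk_b', 'chunk_a', 'chunk_d', 'chunk_c']
```

Because the fusion uses only rank positions, dense similarity scores and BM25 scores never need to be put on a common scale. A cross-encoder re-ranker would then typically rescore only the top few fused candidates.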
Demonstrated on a synthetic Phase 3 atopic dermatitis trial (XYZ-101) with five planted regulatory deviations, SubmissionAI reports a 94% overall compliance score and an estimated 85–90% FDA approval probability in under 60 seconds.