Conference Abstracts - Summit on Cancer Health Disparities (SCHD26)
Vol. 6, Issue Supplement 1, 2026 · S1-3
Leveraging Large Language Models for Intelligent Document Processing in Medical Journal Publishing
Pratik Hadkhale, BS; Bipul Luitel, PhD; Suyog Kadel, MSc
Submission received: 2025-09-18 / Accepted: 2026-01-07 / Published: 2026-01-25
Background
The academic publishing industry processes millions of research articles annually, with medical journals facing unique challenges due to complex formatting requirements, specialized terminology, and strict accuracy standards. Traditional document processing workflows in journal publishing involve multiple stages: manuscript submission, peer review, copyediting, typesetting, and final formatting for web publication. Each stage historically required significant human intervention, particularly the conversion from author-submitted formats (typically Microsoft Word or PDF) to structured, web-ready formats (HTML/XML).
Methods
Traditional approaches to converting submitted manuscripts from PDF and Word formats to web-ready HTML/XML involve significant manual effort, are error-prone, and incur substantial costs. Our system, deployed at Binaytara's journal platform, leverages Claude (Anthropic) to intelligently parse, structure, and format medical research articles while preserving semantic integrity and handling complex elements such as references, equations, and medical terminology.
Manuscripts submitted in Microsoft Word format through our journal portal undergo an automated processing pipeline. First, submissions are converted to PDF via the Convert API to facilitate inline annotations during peer review. After acceptance, the original Word file is processed with Markitdown to extract the text content. A large language model then reconstructs the article as structured JSON with defined sections (e.g., Abstract, Introduction, Methods, Results, Discussion, References); formats inline references, equations, and headings; and resolves edge cases that challenge heuristic or rule-based approaches, particularly in reference formatting. The structured JSON is then used to generate formatted HTML, PDF, and JATS XML versions of the article.
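The final stage of this pipeline, from structured JSON to HTML and JATS XML, might look roughly like the sketch below. The JSON layout (`title`, `sections` with `heading` and `text`, `references` as plain strings) and all function names are assumptions made for illustration; the abstract does not specify the actual schema, and the XML produced here is a heavily simplified JATS-style skeleton rather than a schema-complete document.

```python
import json
from html import escape
from xml.etree import ElementTree as ET

# Hypothetical schema: the abstract names the sections but not the JSON layout.
REQUIRED_KEYS = ("title", "sections", "references")


def parse_article(raw_json: str) -> dict:
    """Parse the model's JSON output and check the assumed top-level keys."""
    article = json.loads(raw_json)
    missing = [key for key in REQUIRED_KEYS if key not in article]
    if missing:
        raise ValueError(f"structured article is missing keys: {missing}")
    return article


def render_html(article: dict) -> str:
    """Render the structured article as a simple HTML fragment."""
    parts = [f"<h1>{escape(article['title'])}</h1>"]
    for section in article["sections"]:
        parts.append(f"<h2>{escape(section['heading'])}</h2>")
        parts.append(f"<p>{escape(section['text'])}</p>")
    parts.append("<ol>")
    for ref in article["references"]:
        parts.append(f"<li>{escape(ref)}</li>")
    parts.append("</ol>")
    return "\n".join(parts)


def render_jats(article: dict) -> str:
    """Render a simplified JATS-style XML skeleton (not schema-complete)."""
    root = ET.Element("article")
    front = ET.SubElement(root, "front")
    meta = ET.SubElement(front, "article-meta")
    title_group = ET.SubElement(meta, "title-group")
    ET.SubElement(title_group, "article-title").text = article["title"]
    body = ET.SubElement(root, "body")
    for section in article["sections"]:
        sec = ET.SubElement(body, "sec")
        ET.SubElement(sec, "title").text = section["heading"]
        ET.SubElement(sec, "p").text = section["text"]
    back = ET.SubElement(root, "back")
    ref_list = ET.SubElement(back, "ref-list")
    for index, ref in enumerate(article["references"], start=1):
        ref_el = ET.SubElement(ref_list, "ref", id=f"ref{index}")
        ET.SubElement(ref_el, "mixed-citation").text = ref
    return ET.tostring(root, encoding="unicode")
```

Validating the model output before rendering means a malformed response fails early instead of producing a partially formatted article, and rendering every output format from the same parsed structure keeps the HTML and XML versions consistent with each other.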
Results
The new platform achieved a 95% reduction in processing costs compared to traditional third-party services and lowered total management and processing costs by approximately 60%. Efficiency gains included immediate post-review formatting, elimination of external vendor dependencies, and reduced copyeditor workload, enabling greater focus on content quality. Key benefits comprised a cost-effective publishing workflow, faster publication turnaround, improved consistency and control over article structure, and enhanced editorial capacity.
Cost comparison between Binaytara Journal Portal and Third-party Platform
| | Binaytara Journal Portal | Third-party Platform |
|---|---|---|
| Processing cost (per article) | $1 | $30 |
| Total articles published (per year) | 100 | 100 |
| Journal management cost | $500 | $1,000 |
| Five year management cost | $30,000 | $60,000 |
| Five year processing cost | $500 | $15,000 |
| Total Cost | $30,500 | $75,000 |
| Cost Savings over 5 years | $44,500 (~60%) | |
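
The five-year figures in the table can be reproduced with the short calculation below. It assumes that the journal management cost is a monthly figure, which the table does not state explicitly but which is the reading that yields the reported five-year totals; the function and constant names are illustrative only.

```python
YEARS = 5
ARTICLES_PER_YEAR = 100
MONTHS_PER_YEAR = 12  # assumption: the management cost row is a monthly figure


def five_year_costs(processing_per_article, management_per_month):
    """Return (processing, management, total) costs in dollars over five years."""
    processing = processing_per_article * ARTICLES_PER_YEAR * YEARS
    management = management_per_month * MONTHS_PER_YEAR * YEARS
    return processing, management, processing + management


portal = five_year_costs(1, 500)          # (500, 30000, 30500)
third_party = five_year_costs(30, 1000)   # (15000, 60000, 75000)

savings = third_party[2] - portal[2]            # 44500
savings_pct = 100 * savings / third_party[2]    # about 59.3

print(f"Five year savings: ${savings:,} ({savings_pct:.1f}%)")
```

Under these assumptions the savings come to $44,500, or about 59.3% of the third-party total, which matches the rounded figure of roughly 60% in the table.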
Conclusion
The integration of LLMs into our journal platform has proven to be a transformational upgrade to our publication workflow. By automating typesetting and reference formatting, we have drastically cut costs, improved efficiency, and reduced editorial overhead. This demonstrates the practical value of AI in scholarly publishing and opens avenues for further innovations in digital content processing.
