PhD Thesis Defense, Sumaira Saeed

Title: A Hybrid Framework for Semantic Textual Similarity and Explanation Generation

Examiners:
Dr. Seema Latif, HOD and Associate Professor, AI & DS-DOC, School of Electrical Engineering and Computer Science (SEECS), National University of Sciences and Technology (NUST)
Dr. Kamran Malik, Professor, Department of Data Science, Punjab University College of Information Technology (PUCIT), University of the Punjab

Date: 13th April 2026 at 2:00 PM
Venue: CS Conference Room, Tabba Building

Abstract
Semantic Textual Similarity (STS) is a foundational task in Natural Language Processing, with wide-ranging applications that include information retrieval, semantic search, question answering, plagiarism detection, and clustering related content to enhance user interaction and knowledge discovery. However, most existing systems operate as black-box models, providing similarity scores without explaining how the decisions were made. This limitation is particularly challenging in domains where transparency and justification are essential for trust, such as healthcare, legal studies, and the interpretation of religious texts. This dissertation presents SUMEX (Semantic textUal siMilarity and EXplanation generation), a hybrid framework that computes semantic similarity and produces natural language explanations within a single unified design. SUMEX combines embedding-based representations with structured ontological knowledge to preserve important taxonomical, hierarchical, and associative relationships that are often lost in purely statistical approaches. The framework presents a structured approach for generating document-level explanations by analyzing conceptual mappings, contextual associations, and domain-specific semantics. To evaluate explanation quality, a human assessment protocol is developed using completeness and correctness criteria, supported by inter-annotator agreement. The framework is applied and tested in two different domains: clinical notes in the healthcare domain and English translations of the Holy Quran. Results show that the hybrid ontology and embedding approach improves precision and reduces false positives when compared to embedding-only models. Expert evaluation further confirms that the explanations capture both surface-level and deeper conceptual alignments.
A preliminary comparison with large language models reveals that SUMEX generates more consistent and domain-reliable outputs, while also avoiding common issues such as hallucination. Overall, this dissertation presents an interpretable approach to semantic similarity that strikes a balance between accuracy and transparency. The SUMEX framework offers a foundation for future research on explainable semantic similarity, hybrid knowledge-driven NLP systems, and domain-adaptable semantic reasoning.