Publications | Sudipto Ghosh

2026

PREPRINT
Disentangling Causal Importance from Emergent Structure in Multi-Expert Orchestration

Sudipto Ghosh, Sujoy Nath, Sunny Manchanda, and Tanmoy Chakraborty

arXiv preprint arXiv:2602.04291, 2026

Multi-expert systems, where multiple Large Language Models (LLMs) collaborate to solve complex tasks, are increasingly adopted for high-performance reasoning and generation. However, the orchestration policies governing expert interaction and sequencing remain largely opaque. We introduce INFORM, an interpretability analysis that treats orchestration as an explicit, analyzable computation, enabling the decoupling of expert interaction structure, execution order, and causal attribution. We use INFORM to evaluate an orchestrator on GSM8K, HumanEval, and MMLU using a homogeneous consortium of ten instruction-tuned experts drawn from LLaMA-3.1 8B, Qwen-3 8B, and DeepSeek-R1 8B, with controlled decoding-temperature variation, and a secondary heterogeneous consortium spanning 1B-7B parameter models. Across tasks, routing dominance is a poor proxy for functional necessity. We reveal a divergence between relational importance, captured by routing mass and interaction topology, and intrinsic importance, measured via gradient-based causal attribution: frequently selected experts often act as interaction hubs with limited causal influence, while sparsely routed experts can be structurally critical. Orchestration behaviors emerge asynchronously, with expert centralization preceding stable routing confidence and expert ordering remaining non-deterministic. Targeted ablations show that masking intrinsically important experts induces disproportionate collapse in interaction structure compared to masking frequent peers, confirming that INFORM exposes causal and structural dependencies beyond accuracy metrics alone.
@article{ghosh2026disentanglingcausalimportanceemergent, title = {Disentangling Causal Importance from Emergent Structure in Multi-Expert Orchestration}, author = {Ghosh, Sudipto and Nath, Sujoy and Manchanda, Sunny and Chakraborty, Tanmoy}, journal = {arXiv preprint arXiv:2602.04291}, year = {2026}, url = {https://arxiv.org/abs/2602.04291} }

2024

WORKSHOP
InLegalLLaMA: Indian Legal Knowledge Enhanced Large Language Model

Sudipto Ghosh, Devanshu Verma, Balaji Ganesan, Purnima Bindal, Vikas Kumar, and Vasudha Bhatnagar

In Proceedings of the LKM Workshop at IJCAI, 2024

Large Language Models (LLMs) are being increasingly used in many domains, including legal and justice. General-purpose models trained on web data are not performant enough on legal text analytics (LTA) tasks, while fine-tuning task-specific models is expensive because of the annotation and compute costs. Pre-training domain- or application-specific models is increasingly popular. However, pre-training LLMs on small domain corpora like Indian legal documents and judgements is challenging. We introduce our InLegalLLaMA model, along with the related training corpus, adapted for the Indian legal domain, that shows promise of improved performance on LTA tasks.
@inproceedings{ghosh2024inlegalllama, title = {InLegalLLaMA: Indian Legal Knowledge Enhanced Large Language Model}, author = {Ghosh, Sudipto and Verma, Devanshu and Ganesan, Balaji and Bindal, Purnima and Kumar, Vikas and Bhatnagar, Vasudha}, booktitle = {Proceedings of the LKM Workshop at IJCAI}, year = {2024}, issn = {16130073}, volume = {3818}, url = {https://ceur-ws.org/Vol-3818/paper3.pdf}, }
PREPRINT
Human Centered AI for Indian Legal Text Analytics

Sudipto Ghosh, Devanshu Verma, Balaji Ganesan, Purnima Bindal, Vikas Kumar, and Vasudha Bhatnagar

arXiv preprint arXiv:2403.10944, 2024

Legal research is a crucial task in the practice of law. It requires intense human effort and intellectual prudence to research a legal case and prepare arguments. The recent boom in generative AI has not translated to a proportionate rise in impactful legal applications, because of low trustworthiness and the scarcity of specialized datasets for training Large Language Models (LLMs). This position paper explores the potential of LLMs within Legal Text Analytics (LTA), highlighting specific areas where the integration of human expertise can significantly enhance their performance to match that of experts. We introduce a novel dataset and describe a human-centered, compound AI system that principally incorporates human inputs for performing LTA tasks with LLMs.
@article{ghosh2024human, title = {Human Centered AI for Indian Legal Text Analytics}, author = {Ghosh, Sudipto and Verma, Devanshu and Ganesan, Balaji and Bindal, Purnima and Kumar, Vikas and Bhatnagar, Vasudha}, journal = {arXiv preprint arXiv:2403.10944}, year = {2024}, url = {https://arxiv.org/abs/2403.10944} }

THESIS
Indian Legal Knowledge Enhanced LLMs for LTA Tasks

Sudipto Ghosh

Department of Computer Science, University of Delhi, 2024

@mastersthesis{ghosh2022thesis,
  title = {Indian Legal Knowledge Enhanced LLMs for LTA Tasks},
  author = {Ghosh, Sudipto},
  school = {Department of Computer Science, University of Delhi},
  year = {2024},
}

2022

WORKSHOP
Constructing a Knowledge Graph from Indian Legal Domain Corpus

Sarika Jain, Pooja Harde, Nandana Mihindukulasooriya, Sudipto Ghosh, Abhinav Dubey, and Ankush Bisht

In Proceedings of the TEXT2KG Workshop at ESWC, 2022

While being an important pillar of human society, the legal domain consists of large corpora of complex documents about different aspects such as laws or court judgements. In recent years, knowledge graphs have become a prominent solution to represent such complex information in a semantically rich, machine-readable manner, allowing access to other AI-powered downstream applications. In this work, we aim to construct a reliable knowledge graph from a legal domain corpus that may be utilized by researchers and application developers working in the legal domain. The source dataset chosen is the Indian Legal Court Judgements, and NyOn (Nyaya Ontology) has been utilized for conceptualization. A framework that consists of entity extraction, relation extraction, and triple construction is used to convert the legal text into RDF triples. The knowledge graph thus built has been quantitatively evaluated over a small random sample with reasonable results.
@inproceedings{Jain2022, author = {Jain, Sarika and Harde, Pooja and Mihindukulasooriya, Nandana and Ghosh, Sudipto and Dubey, Abhinav and Bisht, Ankush}, issn = {16130073}, booktitle = {Proceedings of the TEXT2KG Workshop at ESWC}, title = {Constructing a Knowledge Graph from Indian Legal Domain Corpus}, volume = {3184}, year = {2022}, url = {https://ceur-ws.org/Vol-3184/TEXT2KG_Paper_6.pdf}, }