Publications | Sudipto Ghosh

2024

WORKSHOP
InLegalLLaMA: Indian Legal Knowledge Enhanced Large Language Model

Sudipto Ghosh, Devanshu Verma, Balaji Ganesan, Purnima Bindal, Vikas Kumar, and Vasudha Bhatnagar

In Proceedings of the LKM Workshop at IJCAI, 2024


Large Language Models (LLMs) are increasingly used in many domains, including law and justice. General-purpose models trained on web data are not performant enough on legal text analytics (LTA) tasks, while fine-tuning task-specific models is expensive because of annotation and compute costs. Pre-training domain- or application-specific models is increasingly popular. However, pre-training LLMs on small domain corpora such as Indian legal documents and judgements is challenging. We introduce our InLegalLLaMA model, along with the related training corpus, adapted for the Indian legal domain, which shows promise of improved performance on LTA tasks.
@inproceedings{ghosh2024inlegalllama,
  title = {InLegalLLaMA: Indian Legal Knowledge Enhanced Large Language Model},
  author = {Ghosh, Sudipto and Verma, Devanshu and Ganesan, Balaji and Bindal, Purnima and Kumar, Vikas and Bhatnagar, Vasudha},
  booktitle = {Proceedings of the LKM Workshop at IJCAI},
  year = {2024},
}
PREPRINT
Human Centered AI for Indian Legal Text Analytics

Sudipto Ghosh, Devanshu Verma, Balaji Ganesan, Purnima Bindal, Vikas Kumar, and Vasudha Bhatnagar

arXiv preprint arXiv:2403.10944, 2024


Legal research is a crucial task in the practice of law. It requires intense human effort and intellectual prudence to research a legal case and prepare arguments. The recent boom in generative AI has not translated into a proportionate rise in impactful legal applications, because of low trustworthiness and the scarcity of specialized datasets for training Large Language Models (LLMs). This position paper explores the potential of LLMs within Legal Text Analytics (LTA), highlighting specific areas where the integration of human expertise can significantly enhance their performance to match that of experts. We introduce a novel dataset and describe a human-centered, compound AI system that principally incorporates human inputs for performing LTA tasks with LLMs.
@article{ghosh2024human,
  title = {Human Centered AI for Indian Legal Text Analytics},
  author = {Ghosh, Sudipto and Verma, Devanshu and Ganesan, Balaji and Bindal, Purnima and Kumar, Vikas and Bhatnagar, Vasudha},
  journal = {arXiv preprint arXiv:2403.10944},
  year = {2024},
  url = {https://arxiv.org/abs/2403.10944},
}

THESIS

Indian Legal Knowledge Enhanced LLMs for LTA Tasks

Sudipto Ghosh

Department of Computer Science, University of Delhi, 2024

@mastersthesis{ghosh2022thesis,
  title = {Indian Legal Knowledge Enhanced LLMs for LTA Tasks},
  author = {Ghosh, Sudipto},
  school = {Department of Computer Science, University of Delhi},
  year = {2024},
}

2022

WORKSHOP
Constructing a Knowledge Graph from Indian Legal Domain Corpus

Sarika Jain, Pooja Harde, Nandana Mihindukulasooriya, Sudipto Ghosh, Abhinav Dubey, and Ankush Bisht

In Proceedings of the TEXT2KG Workshop at ESWC, 2022


While being an important pillar of human society, the legal domain consists of large corpora of complex documents about different aspects such as laws or court judgements. In recent years, knowledge graphs have become a prominent solution to represent such complex information in a semantically rich, machine-readable manner, allowing access to other AI-powered downstream applications. In this work, we aim to construct a reliable knowledge graph from a legal domain corpus that may be utilized by researchers and application developers working in the legal domain. The source dataset chosen is the Indian Legal Court Judgements, and NyOn (Nyaya Ontology) has been utilized for conceptualization. A framework that consists of entity extraction, relation extraction, and triple construction is used to convert the legal text into RDF triples. The knowledge graph thus built has been quantitatively evaluated over a small random sample with reasonable results.
@inproceedings{Jain2022,
  author = {Jain, Sarika and Harde, Pooja and Mihindukulasooriya, Nandana and Ghosh, Sudipto and Dubey, Abhinav and Bisht, Ankush},
  issn = {16130073},
  booktitle = {Proceedings of the TEXT2KG Workshop at ESWC},
  title = {Constructing a Knowledge Graph from Indian Legal Domain Corpus},
  volume = {3184},
  year = {2022},
  url = {https://ceur-ws.org/Vol-3184/TEXT2KG_Paper_6.pdf},
}