Large Language Models (LLM) are being increasingly used in many domains including legal and justice. General purpose models trained on web data are not performant enough on legal text analytics (LTA) tasks while fine tuning task specific models is expensive because of the annotation and compute costs. Pre-training domain or application specific models is increasingly popular. However pre-training LLMs in small domain corpora like Indian legal documents and judgements is challenging. We introduce our InLegalLLaMA model, along with the related training corpus, adapted for the Indian legal domain, that shows promise of improved performance on LTA tasks.
Legal research is a crucial task in the practice of law. It requires intense human effort and intellectual prudence to research a legal case and prepare arguments. Recent boom in generative AI has not translated to proportionate rise in impactful legal applications, because of low trustworthiness and and the scarcity of specialized datasets for training Large Language Models (LLMs). This position paper explores the potential of LLMs within Legal Text Analytics (LTA), highlighting specific areas where the integration of human expertise can significantly enhance their performance to match that of experts. We introduce a novel dataset and describe a human centered, compound AI system that principally incorporates human inputs for performing LTA tasks with LLMs.
THESIS
Indian Legal Knowledge Enhanced LLMs for LTA Tasks
Sudipto Ghosh
Department of Computer Sciece, University of Delhi, 2024
While being an important pillar of human society, legal domain consists of large corpora of complex documents about different aspects such as laws or court judgements. In recent years, knowledge graphs have become a prominent solution to represent such complex information in semantically rich machine readable manner allowing access to other AI powered downstream applications. In this work, we aim to construct a reliable knowledge graph from Legal domain corpus that may be utilized by researchers and the application developers working in legal domain.The source dataset chosen is the Indian Legal Court Judgements and NyOn1 (Nyaya Ontology) has been utilized for conceptualization. A framework that consists of entity extraction, relation extraction, triple construction is used to convert the legal text into RDF triples. The knowledge graph thus built has been quantitatively evaluated over a small random sample with reasonable results.