Submitted by: Big Data Expo

Big Data Expo 2019 | Studio Data | Julio interviews Julien Rossi, University of Amsterdam

Julien Rossi, a researcher in legal text analytics at the University of Amsterdam, studies how analytics can be applied to legal texts. Among other things, Rossi works on case law retrieval – finding which article of the law is most relevant to your case. In this episode, Julien and Julio talk about the importance and the challenges of legal text analytics.

Rossi explains why legal text is a problem of its own: “At first glance, it looks like the same language as in literature, but on closer inspection you see that English legal language is very different. For instance, it uses very long sentences with complicated structures. Also, words in a legal sentence sometimes have a different meaning than in normal English, and words that nobody uses in written English appear in legal texts. Lastly, legal texts are really dense. Most of the machines we have now can only deal with ‘normal’ English, with short and easy sentences and shorter documents.”

How can AI be used for text analysis in legal firms? According to Rossi, the answer is quite clear: “Legal firms have to deal with large amounts of written documents. The first thing that comes to mind is reducing that amount by using AI. But there is a lot more that we can do. The observation we make is that there is a tech-push with AI that provides solutions.”

However, Rossi also observes that most of those solutions do not fix the firms’ pain points, which means that the status quo for legal firms is preserved. “We, as scientists, want to improve that, but it also has to come from business needs and not only from a scientific perspective. We need to create business pull instead of tech-push.”

According to Rossi, what a language model knows is determined by the data used to train it. “If we are talking about language models, what the system knows is what kind of text you have given the algorithm to study. Most of today’s leading algorithms are trained on Wikipedia articles and books. If they see a legal text, they get confused. What we as scientists want to know is whether we can feed the algorithm enough legal language that it will eventually understand it, without losing the meaning of words – synonyms, for example.” How Rossi’s team approaches this challenge is discussed in this episode. So, do watch the video!

