FuzzyTM: a Python package for state-of-the-art Topic Modeling

WED 15:30 - 16:00

This is a 2022 presentation

We present FuzzyTM, a Python package for state-of-the-art topic modeling. Topic modeling is a popular task in natural language processing that aims to find the hidden topics in a corpus.

Recently, my group and I developed new algorithms, of which FLSA-W outperforms other algorithms (e.g. LDA, NMF and ProdLDA) on various open datasets in terms of coherence, diversity and interpretability scores. In addition to the three algorithms (FLSA-W, FLSA-V and FLSA), FuzzyTM offers further functionality, such as the calculation of performance metrics and the creation of document embeddings.
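
To give a feel for what a diversity score measures, here is a minimal sketch in plain Python. It assumes the common definition of topic diversity: the fraction of unique words among the top words of all topics. The function name `topic_diversity` and the toy topics are hypothetical and written for illustration only; this is not FuzzyTM's API and not necessarily the exact metric used in the experiments.

```python
def topic_diversity(topics, top_n=10):
    """Return the fraction of unique words among the top_n words of all topics.

    Assumed definition (not taken from FuzzyTM): 1.0 means no word is shared
    between topics; values near 0 mean the topics largely repeat each other.
    """
    # Flatten the top_n words of every topic into a single list.
    top_words = [word for topic in topics for word in topic[:top_n]]
    if not top_words:
        raise ValueError("topics must contain at least one word")
    return len(set(top_words)) / len(top_words)


# Toy topics, invented for illustration; "doctor" appears in two of them.
example_topics = [
    ["patient", "doctor", "hospital"],
    ["doctor", "nurse", "clinic"],
    ["market", "stock", "price"],
]

diversity = topic_diversity(example_topics, top_n=3)
print(f"diversity = {diversity:.3f}")
```

In this toy example one word is shared between two topics, so the score falls slightly below 1. A model whose topics all list the same words would score much lower.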

In addition, user-friendly pipelines with default values allow practitioners to train a topic model with minimal effort, while the modular design allows researchers to modify each software element and makes it possible to add future methods.

During the presentation, I will give an intuitive explanation of how these algorithms work, share experimental results and demonstrate how quick and easy FuzzyTM is to use.