
Who has a voice in the media?
About this project
- Python, Javascript, HTML.
- Statistical data analysis, supervised and unsupervised learning.
- Packages/Libraries: Pandas, skLearn, plotly, TensorFlow, ReactJS, Flask.
This project was done in the course CS-401 Applied Data Analysis course at EPFL by Pr. West.
Made in collaboration with Eliott Zemour, Thomas Benchetrit, and Benjamin Hansson.
Isn’t it a pleasure to be listened to? The ability to make your voice heard is a privilege that few have. Sometimes you can have the feeling that only the loudest are listened to. Using the Quotebank dataset from 2015 to 2020 and information about the speakers exctracted from Wikidata, we were able to dissect :
- WHO you need to be to be quoted (age, gender, occupation)
- WHAT you need to say (which subject to talk about)
- HOW you need to talk about it (which emotion to use)
Once a primary analysis was done on the speakers on themselves, a K-means clustering algorithm was run on the data to cluster the speakers into sub-groups to be further and deeper analyzed. The lexical database Wordnet was used to reduce the number of occupations to facilitate clustering. Then, in order to extract the topic and the emotion for each quote, we used a zero-shot classificatino approach based on the DistilBERT base model uncased. This model is fine-tuned on Multi-Genre Natural Language Inference (MNLI) dataset for the zero-shot classification task. In order to present our results, a website was developped using Jekyll. All visualisations were done through plotly. To make the content more interactive and more appealing to users, a webapp was also developped.
Developped using ReactJS and Material-UI, this app asks you who you are, what you want to talk about and with which emotion in order to predict a quotation score which is computed using a Deep Learning model made with TensorFlow and trained on the newly-labelled QuoteBank dataset. An API was developped with the Flask framework in order to host the predictive model and answer the requests made in the webapp.
Links
- Website.
- Webapp.
- Repositories.
- You can find the QuoteBank dataset here.