TALEP - Traitement Automatique du Langage Écrit et Parlé

We organise seminars and discussions on themes related to NLP research, alternating between invited and local presentations. Historically, these seminars were dedicated to the team’s young researchers, thus the acronym JTT which stands for Jeunes Talents TALEP.

For the time being, presentations are hybrid, on Zoom and on site in Luminy. If you would like to attend our seminars, get in touch. The seminar dates and times are also listed on TALEP’s Google agenda (ask for the link).

Upcoming

Past

Massively multilingual morpho-syntactic analysis using typological resources, universal annotations and multilingual word embeddings
Manon Scholivet

Abstract: Data annotation is a major problem in all machine learning tasks. In Natural Language Processing (NLP), this problem is multiplied by the number of existing languages. Many languages end up without annotations and are therefore left out of NLP systems. One possible way to integrate these languages is to exploit languages that have abundant annotations, learn information from these well-resourced languages, and transfer that knowledge to low-resource languages. To do so, one can rely on initiatives such as Universal Dependencies, which offer a universal annotation scheme across languages. Multilingual word embeddings and typological features from resources such as the World Atlas of Language Structures (WALS) are solutions that enable knowledge sharing between languages. This thesis studies these directions through the prediction of syntactic analysis, morphology, and parts of speech on 41 languages in total. We show that the impact of WALS can be positive in a multilingual setting, but that its usefulness is not systematic in a zero-shot learning configuration. Other language representations can be learned from the data and give better results than WALS, but have the drawback of not working in a zero-shot setting. We also highlight the importance of having a closely related language present when training the models, as well as the problems associated with using a character model for isolated languages.

When: Oct 05, 2021 at 13:00 | Where: Zoom and Luminy | Language: French | Slides

Multi-SimLex: A Large-Scale Evaluation of Multilingual and Crosslingual Lexical Semantic Similarity
Thierry Poibeau

Abstract: We introduce Multi-SimLex, a large-scale lexical resource and evaluation benchmark covering data sets for 12 typologically diverse languages, including major languages (e.g., Mandarin Chinese, Spanish, Russian) as well as less-resourced ones (e.g., Welsh, Kiswahili). Each language data set is annotated for the lexical relation of semantic similarity and contains 1,888 semantically aligned concept pairs, providing a representative coverage of word classes (nouns, verbs, adjectives, adverbs), frequency ranks, similarity intervals, lexical fields, and concreteness levels. Additionally, owing to the alignment of concepts across languages, we provide a suite of 66 crosslingual semantic similarity data sets. Because of its extensive size and language coverage, Multi-SimLex provides entirely novel opportunities for experimental evaluation and analysis. On its monolingual and crosslingual benchmarks, we evaluate and analyze a wide array of recent state-of-the-art monolingual and crosslingual representation models, including static and contextualized word embeddings (such as fastText, monolingual and multilingual BERT, XLM), externally informed lexical representations, as well as fully unsupervised and (weakly) supervised crosslingual word embeddings. We also present a step-by-step data set creation protocol for creating consistent, Multi-Simlex–style resources for additional languages. We make these contributions—the public release of Multi-SimLex data sets, their creation protocol, strong baseline results, and in-depth analyses which can be helpful in guiding future developments in multilingual lexical semantics and representation learning—available via a Web site that will encourage community effort in further expansion of Multi-Simlex to many more languages. Such a large-scale semantic resource could inspire significant further advances in NLP across languages. 
Joint work with Ivan Vulić, Simon Baker, Edoardo Maria Ponti, Ulla Petti, Ira Leviant, Kelly Wing, Olga Majewska, Eden Bar, Matt Malone, Thierry Poibeau, Roi Reichart, Anna Korhonen.

When: Apr 29, 2021 at 13:00 | Where: Zoom and Luminy | Language: French | Slides

March 14, 2024 | Jules Cauzinille | Investigating self-supervised speech models' ability to classify animal vocalizations: The case of gibbon's vocal identity

March 07, 2024 | Marion Ristorcelli | Impact of the Nonverbal Behavior of Virtual Audience on Users’ Perception of Social Attitudes

February 22, 2024 | Linnea Evanson | Language acquisition: do children and language models follow similar learning stages?

February 15, 2024 | Eliot Maës | Information transfers in conversation, automatic detection using language models

January 25, 2024 | Alice Delbosc | Conflict management training through simulation with conversational agent - Progress and perspectives

January 11, 2024 | Elie Antoine | Complexity factor on the CALOR-QA corpus

December 07, 2023 | Hossam Boudraa | A study of information transfer in natural conversations

November 02, 2023 | Elodie Etienne | The Power of Machine Learning, NLP, and Virtual Reality for Public Speaking

October 26, 2023 | Jeremy Auguste | In real-time risk analysis on open sources

October 19, 2023 | Abdellah Fourtassi | Presentation of the ChiCA corpus.

October 12, 2023 | Alexis Nasr | ANR COMPO & ANR HEBBIAN

September 28, 2023 | Benoit Favre | LIS@DEFT'23 Can LLMs respond to MCQs? (a) yes; (b) no; (c) I don't know.

June 29, 2023 | Marion Ristorcelli | Learning and Assessment of public speaking in virtual reality. An overview of my thesis topic.

June 22, 2023 | Alice Delbosc | Automatic generation of facial behaviors: from data to evaluation

May 25, 2023 | Carol Figueroa | Classifying feedback communicative functions.

May 11, 2023 | Emmanuelle Salin | Towards a better understanding of vision-language transformer models.

April 20, 2023 | Elie Antoine | Exploring Social Sciences Archives with Explainable Document Linkage through Question Generation

April 13, 2023 | Dhia Elhak Goumri | Automatic detection of children's communicative signals in video calls.

April 06, 2023 | Susana Campillo | A formal linguistic approach to hate speech detection.

March 30, 2023 | Marjorie Armando | Improving children's math performance with a virtual role model against Stereotype Threat effects.

March 23, 2023 | Hee-Soo Choi | Corpus-based analysis of Greenberg's universals on Universal Dependencies

March 02, 2023 | Abdellah Fourtassi | Understanding Children's Multimodal Conversational Development: Challenges and Opportunities.

February 09, 2023 | Géraldine Damnati | First evaluations and new use cases: what limitations and opportunities for Large Language Models in operational settings?

February 02, 2023 | Santiago Cuervo | Variable-rate hierarchical representation learning

January 12, 2023 | Léo Jacqmin | LIS and Orange at the Dialog Systems Technology Challenge (DSTC11)

January 05, 2023 | Maria Boritchev | Compositionality and logic in language

December 01, 2022 | Alex Warstadt | Artificial neural networks as models of human language learning

November 24, 2022 | Laurianne Sitbon | Exploring accessible modalities for accessible conversational interactions

November 17, 2022 | Mariya Toneva | Why do large language models align with human brains: insights, opportunities, and challenges

November 10, 2022 | Thomas Hueber | Modeling speech acquisition using self-supervised machine learning, a focus on the acoustic-to-articulatory mapping

November 03, 2022 | Rahma Chaabouni | Emerging linguistic universals in communicating neural network agents

October 20, 2022 | Lukas Galke | Structure in language acquisition models

October 13, 2022 | Partha Pakray | Quantum Machine Learning, Cybersecurity, Gender Equality and Gender Bias

October 06, 2022 | Jules Cauzinille | Self-supervised representation learning of primate vocalisations

September 29, 2022 | Salima Mdhaffar | End-to-end model for named entity recognition from speech without paired training data

September 22, 2022 | Francesco Cabiddu | Testing the Developmental Plausibility of BERT by Capturing the Role of Verb-Event Structure in Early Word Sense Disambiguation

September 08, 2022 | Dhia Elhak Goumri | Brain basis of turn-taking in natural conversation.

July 07, 2022 | Denis Paperno | To the limits of distributional semantics and beyond

June 16, 2022 | Eunice Akani | Abstraction or hallucination? State of the art and risk assessment for sequence-to-sequence automatic summarization models

Sequence labeling or sequence generation for natural language understanding in interactive contexts?
Rim Abrougui

Auxiliary tasks for dependency graph parsing
Marie Candito

Multimodal representation of conversations for abusive message detection
Richard Dufour

Pôle SD seminar: A quick tour: Neural Network Interpretability
Hanwei Zhang

Pôle SD seminar: The Many Flavours of CAM
Felipe Torres Figueroa

Pôle SD seminar: Interpretable RNNs
Hamed Benazha

Presentation and brainstorming on the Furhat robot
Magalie Ochs

Multiword expressions and language acquisition
Leonardo Pinto-Arata

Automatic analysis of errors in automatic speech recognition systems from end-users reception
Thibault Bañeras Roux

Assessing the ability of neural language models to abstract syntactic representation: an analysis based on French long-distance agreement
Bingzhi Li

A Priori Interpretability and A Posteriori Explainability in Natural Language Processing
Tom Bourgeade

From CALOR-QUEST to CALOR-DIAL
Frédéric Béchet

Learning to give up: learning to backtrack in a greedy parsing system
Alexis Nasr

A Multimodal Corpus for the Study of Child Conversation
Abdellah Fourtassi

Speech @ BigScience - Syntactic parsing of speech
Benoit Favre, Franck Dary

ANR project SELEXINI: Semantic Lexicon Induction for Interpretability and Diversity in Text Processing
Carlos Ramisch

ANR project REVITALISE: viRtual bEhaVioral skIlls TrAining for pubLIc SpEaking
Magalie Ochs

Dialogue state tracking: past, present and future
Léo Jacqmin

Summarizing scientific papers given user-desired queries in zero-shot context
Amir Soleimani

Social media data in public health research, two cases of study
Raquel Urena

Hate speech target identification and characterization
Anaïs Ollagnier

CoCoDev project
Abdellah Fourtassi

Giving Out or Happy Out? Processing Multiword Expressions in Irish
Abigail Walsh

Zero-shot and Few-shot documents classification in biomedical domain.
Simon Lupart

Probing joint vision-and-language representations
Badreddine Farah

Massively multilingual morpho-syntactic analysis using typological resources, universal annotations and multilingual word embeddings
Manon Scholivet

Models and Resources for Attention-based Unsupervised Word Segmentation
Marcely Zanon Boito

Learning and Processing Language from Wearables: Opportunities and Challenges (dry run of ACL keynote)
Alejandrina Cristia

Why are GPUs faster than CPUs for the matrix calculations of deep learning libraries?
Laércio Pilla

A Fuzzy Sociolinguistic Model for Gender Prediction in Spanish Social Network Texts
Damián Morales

Automatic question generation and the generalization ability of machine reading comprehension models
Elie Antoine

Automatic generation of critical summaries of articles for medical literature monitoring
Loïc Neyrat

An empirical study of domain adaptation for named entity recognition on historical documents
Baptiste Blouin

Multiword Expression Features for Automatic Hate Speech Detection
Nicolas Zampieri