Data Scientist & ML Researcher | Innovating with LLMs, RAG, GenAI and Agent-based AI Systems.
I am a passionate and results-driven Data Scientist and Machine Learning Engineer with a solid academic background and hands-on experience in AI, Data Science, and Cybersecurity. I am currently a **PhD Candidate at UPEC (LACL Lab)**, focusing on the formalization of reliable agentic AI.
Holding a Master's degree in Machine Learning and Artificial Intelligence, I have developed expertise in designing and deploying AI-driven solutions, from deep learning and generative models to natural language processing (NLP) and reinforcement learning.
I am particularly interested in using large language models (LLMs), transformers, and retrieval-augmented generation (RAG) to build intelligent, scalable solutions. As a researcher in agent-based AI, I explore frameworks like AutoGen and CrewAI and formal methods like **TLA+** to build adaptive, reliable agent systems and "vibe coding" environments.
From core machine learning to advanced generative AI systems.
From research labs to industrial data science projects.
Designed an NLP-powered pipeline to extract, normalize, and cross-reference multi-source alert data for CMDB construction. Stored and visualized processed outputs in Elasticsearch and Padaone. Created optimized SQL views for Data Warehouse integration and modeled data structures in Power BI for operational dashboards. Participated in data quality workshops, defining KPIs and monitoring indicators.
Exploratory study on LLM-based agents and their coordination in multi-agent systems. Formalizing agent behaviors using TLA+ and finite state machines. Developing experimental prototypes for multi-agent simulation to evaluate consistency and reliability. Authoring comprehensive technical reports and scientific articles on LLM agent architectures.
Led analytics and automation projects aimed at improving regulatory reporting. Developed ML models to detect data quality patterns and outliers. Built end-to-end ETL workflows with Talend, integrating complex datasets from JDE and AS400. Designed interactive dashboards in QlikSense and RStudio.
Designed an unsupervised text classification method for industrial documents. Built an end-to-end pipeline combining NLP preprocessing with clustering algorithms (K-Means, DBSCAN). Explored text representations (TF-IDF, Word2Vec) and dimensionality reduction (PCA, t-SNE).
Optimized neural machine translation (NMT) training data selection using co-clustering. Trained and fine-tuned JoeyNMT Transformer models. Developed an interactive Dash dashboard for corpus exploration.
Analysis, design, and development of a business management software package (ERP), website, and e-commerce application.
PhD Candidate | Thesis: "Formalization of an Evolutionary and Reliable Agentic Artificial Intelligence". Focus on LLM-based agents, formal specification (TLA+), and multi-agent coordination within the LACL research lab.
Advanced Training | Organized by U. Luxembourg, U. Liège, Inria, and Max-Planck-Institut. Topics: Distributed systems modeling (TLA+), formal specification, model checking, and reactive systems synthesis. Hands-on tools: **Ultimate, TLA+, Issy, and Vampire**.
Research & Publication | Devoted to the interplay between AI and verification. Focus on making "vibe coding" efficient and reliable. Currently preparing articles for **Springer Lecture Notes in Computer Science (LNCS)**. Topics include certification of AI-produced software and hallucination management.
Master's Degree | Specialized in LLMs, Transformers, Deep Learning, NLP, and Reinforcement Learning. Graduated with Honors (15.5/20).
Master's Degree | Strong foundations in cryptography, network security, penetration testing, and secure software development.
Bachelor's Degree | Software engineering, distributed databases, and mathematical optimization. Ranked top of the class.
Bachelor's Degree | Theoretical foundations, algorithm complexity, and language theory.
Deep Learning algorithm to classify medical entities from natural language using TensorFlow and Keras. Predicted ICD-10 codes from pathologies with high accuracy.
Micro-services architecture based on Spring Boot and REST API for comprehensive library management.
Comparative study of matrix factorization techniques vs neural network-based embeddings for similarity tasks.
Participated in the first VERIFAI workshop (CNRS). Currently preparing an article for **Springer Lecture Notes in Computer Science (LNCS)** on efficient and reliable "vibe coding" through verification techniques.
"Modélisation et Analyse des Exécutions Non-Déterministes des Agents basés sur les LLM via les Automates Finis" (Modeling and Analysis of Non-Deterministic Executions of LLM-Based Agents via Finite Automata), A. F. Sanou, Y. Badr, F. Mourlin, 2025.
"Veille technologique sur les Systèmes Multi-Agents (MAS) et leur intégration avec les LLM" (Technology Watch on Multi-Agent Systems (MAS) and Their Integration with LLMs), Technical Report, February 2025.
Open to Data Science and Machine Learning opportunities. Based in Paris, France.