Samuel Oyerinde
AI Research Engineer · Low-Resource Language Technologies
I build data infrastructure and intelligent systems for underrepresented languages — spanning multilingual ASR/TTS, named-entity recognition, OCR for African scripts, and AI-powered education and fintech platforms.

About Me

I am an AI research engineer working at the intersection of Natural Language Processing, Speech Technology, and Systems Engineering. My primary focus is building robust data infrastructure and AI systems for African and low-resource languages — languages that remain critically underserved by mainstream NLP research.
Beyond research, I architect production-grade platforms — from an AI-driven education intelligence system (edurepoAI) to a multi-tenant fintech SaaS (Quomoni) — demonstrating that rigorous research and real-world engineering are not mutually exclusive.
Research & Contributions
Multilingual Speech Corpus
ASR · TTS · Low-Resource
Led end-to-end curation, processing and uploading of 5,000+ hours of speech data across Luhya, Kamba, Gusii, and Somali, enabling the first production-grade low-resource ASR/TTS systems for these languages. Designed scalable audio-to-transcription pipelines and standardised QA frameworks.
MasakhaNER — Nigerian Pidgin NER
NER · Masakhane Ecosystem
Annotated and curated 20,000+ Nigerian Pidgin samples for Named Entity Recognition as part of the Masakhane community effort. Maintained 96% inter-annotator agreement, contributing to the first high-quality NER benchmark for Nigerian Pidgin.
Diacritics-Aware Yorùbá OCR
Computer Vision · Low-Resource OCR
Designed an end-to-end OCR pipeline for Yorùbá text recognition using PaddleOCR with human-in-the-loop correction workflows. Addressed systemic failures in tone-marked orthography recognition, a critical gap for Yorùbá digital preservation.
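One way to make tone-mark failures measurable is to separate OCR errors that differ only in diacritics from errors in the base letters. The sketch below is illustrative and is not taken from the pipeline itself: the function names are hypothetical, and it assumes OCR output and reference text are plain Unicode strings. It uses only Python's standard `unicodedata` module.

```python
import unicodedata


def normalize_yoruba(text: str) -> str:
    """Return text in NFC so composed and decomposed spellings compare equal."""
    return unicodedata.normalize("NFC", text)


def strip_diacritics(text: str) -> str:
    """Drop combining marks (tone marks and under-dots) after decomposing."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def diacritic_only_error(reference: str, hypothesis: str) -> bool:
    """True when the hypothesis differs from the reference only in diacritics."""
    ref = normalize_yoruba(reference)
    hyp = normalize_yoruba(hypothesis)
    return ref != hyp and strip_diacritics(ref) == strip_diacritics(hyp)


# Hypothetical OCR output that lost its tone marks.
print(diacritic_only_error("Yorùbá", "Yoruba"))
```

Counting these cases separately from base-letter errors shows how much of the overall error rate comes from tone marks and under-dots alone.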
Featured Projects
edurepoAI
AI-Powered Education Intelligence Platform
Integrates 1,100+ Nigerian institutions and 3,000+ academic programmes. Built with PGVector for semantic university/course retrieval, RAG-powered admission recommendations, predictive scoring with confidence estimation, and adaptive JAMB/WAEC CBT practice.
Quomoni (NextGen)
Multi-Tenant Financial SaaS Platform
Enterprise payroll engine with configurable pay groups, tax rules, pension, and deduction structures. Automated payslip generation, Paystack & Flutterwave payment integrations, full audit trail, and multi-tenant merchant dashboard.
Yorùbá OCR Pipeline
Diacritics-Aware Optical Character Recognition
End-to-end OCR system for Yorùbá documents with PaddleOCR, human-in-the-loop correction, and structured dataset export. Addresses tone-mark recognition failures across printed and digitised Yorùbá texts.
Experience & Projects
Speech & Language Infrastructure
AI / Data Infrastructure Engineer
- Led end-to-end curation, processing and uploading of 5,000+ hours of multilingual speech data across Luhya, Kamba, Gusii, and Somali for low-resource ASR and TTS systems.
- Designed scalable audio-to-transcription pipelines and NLP preprocessing workflows covering malformed inputs and forced alignment.
- Established standardised annotation protocols and QA frameworks, achieving consistent data quality across cross-lingual datasets.
Masakhane NER — Nigerian Pidgin
NLP Researcher / Annotator
- Annotated and curated 20,000+ Nigerian Pidgin samples for NER within the Masakhane community-driven research ecosystem.
- Maintained 96% inter-annotator agreement (IAA) through rigorous guideline development and adjudication processes.
- Advanced the first production-grade NER benchmark for Nigerian Pidgin via the MasakhaNER dataset release.
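The agreement figure above does not name its metric, so the following is only a sketch of two common choices for token-level NER labels: raw percent agreement and Cohen's kappa, which corrects for chance agreement. The label sequences and function names are hypothetical and are not drawn from the MasakhaNER annotation process.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Fraction of items on which two annotators chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    observed = percent_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    all_labels = counts_a.keys() | counts_b.keys()
    # Chance agreement from each annotator's own label distribution.
    expected = sum(counts_a[l] * counts_b[l] for l in all_labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical token labels from two annotators for one sentence.
annotator_1 = ["PER", "O", "O", "LOC", "O"]
annotator_2 = ["PER", "O", "O", "O", "O"]
print(percent_agreement(annotator_1, annotator_2))
print(cohen_kappa(annotator_1, annotator_2))
```

Because the `O` label dominates NER data, raw agreement tends to look high even when annotators disagree on entities. This is why a chance-corrected score is often reported alongside it.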
edurepoAI
Platform Architect
- Architecting an AI education platform integrating 1,100+ institutions and 3,000+ academic programmes, with semantic retrieval using PGVector and RAG.
- Designed predictive admission scoring models with confidence estimation and explainability for student guidance.
- Built adaptive JAMB/WAEC CBT practice systems with timed assessments, real-time analytics, and personalised learning paths.
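The semantic retrieval step can be pictured as a nearest-neighbour search over programme embeddings. The sketch below reproduces that ranking in plain Python with toy three-dimensional vectors. The programme names, vectors, and the `programmes` table in the comment are hypothetical, and a real deployment would get its embeddings from a model rather than by hand.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    if norm == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def top_k(query, rows, k=3):
    """Return the names of the k rows whose embeddings are closest to query."""
    ranked = sorted(rows, key=lambda row: cosine_distance(query, row[1]))
    return [name for name, _ in ranked[:k]]


# Toy (name, embedding) rows standing in for a vector-store table.
rows = [
    ("Computer Science", [0.9, 0.1, 0.0]),
    ("Agricultural Economics", [0.1, 0.9, 0.2]),
    ("Software Engineering", [0.8, 0.2, 0.1]),
]
print(top_k([1.0, 0.0, 0.0], rows, k=2))

# With pgvector, the same ordering is expressed in SQL using the cosine
# distance operator, for example (hypothetical table and column names):
#   SELECT name FROM programmes ORDER BY embedding <=> %s LIMIT 2;
```

In a RAG setup, the retrieved rows would then be passed to the language model as context for the recommendation.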
Quomoni (NextGen)
Software Engineer
- Engineered a multi-tenant financial SaaS platform with merchant operations, payroll management, and enterprise reporting.
- Designed configurable payroll engines covering tax rules, benefits, and deduction structures, with automated execution.
- Integrated Paystack and Flutterwave payment gateways with reconciliation logic and a complete audit trail.
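A configurable payroll engine of this kind can be pictured as pay groups that carry their own deduction rules, applied to gross pay with exact decimal arithmetic. The sketch below is a simplified illustration, not Quomoni's implementation: the class names are hypothetical, the rates are placeholders rather than real tax or pension rules, and every deduction is assumed to be a flat percentage of gross pay.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP


@dataclass(frozen=True)
class Deduction:
    name: str
    rate: Decimal  # fraction of gross pay, e.g. Decimal("0.08")


@dataclass(frozen=True)
class PayGroup:
    name: str
    deductions: tuple  # tuple of Deduction rules configured for the group


def run_payslip(gross: Decimal, group: PayGroup) -> dict:
    """Apply each configured deduction to gross pay and return the breakdown."""
    cents = Decimal("0.01")
    lines = {
        d.name: (gross * d.rate).quantize(cents, rounding=ROUND_HALF_UP)
        for d in group.deductions
    }
    net = gross - sum(lines.values(), Decimal("0"))
    return {"gross": gross, "deductions": lines, "net": net}


# Placeholder rates for illustration only.
staff = PayGroup(
    name="staff",
    deductions=(
        Deduction("pension", Decimal("0.08")),
        Deduction("tax", Decimal("0.10")),
    ),
)
slip = run_payslip(Decimal("250000.00"), staff)
print(slip["net"])
```

Using `Decimal` instead of floats keeps each payslip line exact to the smallest currency unit, so the recorded deductions always sum to the difference between gross and net pay.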
Technical Skills
AI / NLP / Speech
PyTorch · HuggingFace · Kaldi · ESPnet · PaddleOCR
Language model training, ASR/TTS pipelines, NER, OCR for African scripts, and low-resource dataset engineering.
Data Infrastructure
PostgreSQL · PGVector · MongoDB · RAG · Airflow
Designing large-scale annotation pipelines, vector stores, semantic retrieval, and data QA frameworks.
Software Engineering
Python · TypeScript · Node.js · REST · gRPC
Clean, production-ready code for backends, APIs, and ML systems with strong architectural discipline.
Frontend & Full-Stack
React · Next.js · Tailwind CSS · Framer Motion
End-to-end product engineering — from research prototype to polished production web application.
Systems Architecture
SaaS · Microservices · Multi-tenant · Fintech APIs
Designing secure, scalable multi-tenant platforms with payment integrations and enterprise-grade reliability.
MLOps & DevOps
Git · Docker · CI/CD · Model Serving
Automating ML deployments, monitoring model drift, and maintaining engineering velocity in research projects.
Education
B.Sc. Computer Science
Federal University of Agriculture, Abeokuta (FUNAAB)
Get in Touch
Let's Connect
Whether you're looking for a research collaborator, an ML engineer, or a full-stack product builder, feel free to reach out.