AI · NLP · Speech · Systems Engineering

Samuel Oyerinde

AI Research Engineer · Low-Resource Language Technologies

I build data infrastructure and intelligent systems for underrepresented languages — spanning multilingual ASR/TTS, named-entity recognition, OCR for African scripts, and AI-powered education and fintech platforms.

Open to collaborations

About Me

I am an AI research engineer working at the intersection of Natural Language Processing, Speech Technology, and Systems Engineering. My primary focus is building robust data infrastructure and AI systems for African and low-resource languages — languages that remain critically underserved by mainstream NLP research.

Beyond research, I architect production-grade platforms — from an AI-driven education intelligence system (edurepoAI) to a multi-tenant fintech SaaS (Quomoni) — demonstrating that rigorous research and real-world engineering are not mutually exclusive.

Domain: Low-Resource Language AI
Languages: Yorùbá, Igbo, Luhya, Kamba, Gusii, Somali, Nigerian Pidgin
Tasks: ASR · TTS · NER · OCR · Semantic Retrieval
Stack: Python · PyTorch · HuggingFace · PGVector · Next.js
Interests: AI for Africa · Education Intelligence · Fintech Systems

5,000+ hours of speech data curated
20K+ NER annotation samples
96% inter-annotator agreement
1,100+ institutions on edurepoAI

Research & Contributions

Multilingual Speech Corpus

ASR · TTS · Low-Resource

Led end-to-end curation, processing, and uploading of 5,000+ hours of speech data across Luhya, Kamba, Gusii, and Somali, enabling the first production-grade low-resource ASR/TTS systems for these languages. Designed scalable audio-to-transcription pipelines and standardised QA frameworks.

5,000+ hrs · 4 Languages · ASR/TTS · Kenya

MasakhaNER — Nigerian Pidgin NER

NER · Masakhane Ecosystem

Annotated and curated 20,000+ Nigerian Pidgin samples for Named Entity Recognition as part of the Masakhane community effort. Maintained 96% inter-annotator agreement, contributing to the first high-quality NER benchmark for Nigerian Pidgin.

20K+ samples · 96% IAA · Masakhane · Pidgin

Diacritics-Aware Yorùbá OCR

Computer Vision · Low-Resource OCR

Designed an end-to-end OCR pipeline for Yorùbá text recognition using PaddleOCR with human-in-the-loop correction workflows. Addressed systemic failures in tone-marked orthography recognition — a critical gap for Yorùbá digital preservation.

PaddleOCR · Human-in-Loop · Yorùbá · Tonal Scripts

Featured Projects

edurepoAI

AI-Powered Education Intelligence Platform

Integrates 1,100+ Nigerian institutions and 3,000+ academic programmes. Built with PGVector for semantic university/course retrieval, RAG-powered admission recommendations, predictive scoring with confidence estimation, and adaptive JAMB/WAEC CBT practice.

1,100+ institutions · 3,000+ programmes · Predictive scoring
RAG · PGVector · Semantic Search · Next.js · Postgres
Live

Quomoni (NextGen)

Multi-Tenant Financial SaaS Platform

Enterprise payroll engine with configurable pay groups, tax rules, pension, and deduction structures. Automated payslip generation, Paystack & Flutterwave payment integrations, full audit trail, and multi-tenant merchant dashboard.

Multi-tenant · Automated payroll · Payment integrations
Fintech · SaaS · Paystack · Flutterwave · Node.js
Live

Yorùbá OCR Pipeline

Diacritics-Aware Optical Character Recognition

End-to-end OCR system for Yorùbá documents with PaddleOCR, human-in-the-loop correction, and structured dataset export. Addresses tone-mark recognition failures across printed and digitised Yorùbá texts.

Tone-mark aware · Human-in-the-loop · Structured export
PaddleOCR · Computer Vision · Python · Human-in-Loop
Live

Experience & Projects

Speech & Language Infrastructure

AI / Data Infrastructure Engineer

Research · Data Engineering · Recent
  • Led end-to-end curation, processing and uploading of 5,000+ hours of multilingual speech data across Luhya, Kamba, Gusii, and Somali for low-resource ASR and TTS systems.
  • Designed scalable audio-to-transcription pipelines and NLP preprocessing workflows that handle malformed inputs and forced alignment.
  • Established standardised annotation protocols and QA frameworks, achieving consistent data quality across cross-lingual datasets.

Masakhane NER — Nigerian Pidgin

NLP Researcher / Annotator

NLP Research · Recent
  • Annotated and curated 20,000+ Nigerian Pidgin samples for NER within the Masakhane community-driven research ecosystem.
  • Maintained 96% inter-annotator agreement (IAA) through rigorous guideline development and adjudication processes.
  • Advanced the first production-grade NER benchmark for Nigerian Pidgin via the MasakhaNER dataset release.
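
As an illustration of how an agreement figure of this kind can be computed, the sketch below calculates raw percent agreement and Cohen's kappa over two annotators' token labels. The label sequences and both function names are hypothetical, and nothing here states which statistic underlies the 96% figure.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Fraction of positions where both annotators chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    observed = percent_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both annotators pick the same label
    # independently, given each annotator's own label distribution.
    expected = sum(
        counts_a[label] * counts_b[label]
        for label in set(counts_a) | set(counts_b)
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical token-level NER labels from two annotators.
annotator_a = ["B-PER", "O", "O", "B-LOC", "O", "B-ORG", "O", "O"]
annotator_b = ["B-PER", "O", "O", "B-LOC", "O", "O", "O", "O"]

raw = percent_agreement(annotator_a, annotator_b)
kappa = cohen_kappa(annotator_a, annotator_b)
```

Kappa is usually lower than raw agreement on NER data because the frequent `O` label inflates chance agreement, which is why the two numbers are often reported together.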

edurepoAI

Platform Architect

AI Platform · Education · Present
  • Architecting an AI education platform integrating 1,100+ institutions and 3,000+ academic programmes, with semantic retrieval using PGVector and RAG.
  • Designed predictive admission scoring models with confidence estimation and explainability for student guidance.
  • Built adaptive JAMB/WAEC CBT practice systems with timed assessments, real-time analytics, and personalised learning paths.
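
The semantic retrieval mentioned above amounts to ranking stored embeddings by similarity to a query embedding. A minimal sketch in plain Python, assuming toy three-dimensional vectors and hypothetical programme names (in Postgres, pgvector's `<=>` operator returns the corresponding cosine distance, so this ranking would normally run inside the database):

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_u * norm_v)


def top_k(query, catalogue, k=2):
    """Return the k (name, score) pairs most similar to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in catalogue.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical programme embeddings; real embeddings come from a model
# and have hundreds of dimensions.
catalogue = {
    "Computer Science": [0.9, 0.1, 0.0],
    "Software Engineering": [0.8, 0.2, 0.1],
    "Agricultural Economics": [0.1, 0.9, 0.3],
}
query_embedding = [1.0, 0.0, 0.0]

results = top_k(query_embedding, catalogue, k=2)
result_names = [name for name, _ in results]
```

In a RAG setup, the top-ranked records would then be passed to a language model as context for the recommendation.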

Quomoni (NextGen)

Software Engineer

Fintech · SaaS Engineering · Completed
  • Engineered a multi-tenant financial SaaS platform with merchant operations, payroll management, and enterprise reporting.
  • Designed configurable payroll engines covering tax rules, benefits, and deduction structures, with automated execution.
  • Integrated Paystack and Flutterwave payment gateways with reconciliation logic and a complete audit trail.
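
One way to picture a configurable payroll engine is as pay groups that carry their own deduction rules, applied to a gross amount to produce a payslip. The sketch below is illustrative only: the class names, the `compute_payslip` function, and the 8% and 10% rates are hypothetical and are not real statutory rates or Quomoni's actual design.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP


@dataclass(frozen=True)
class Deduction:
    name: str
    rate: Decimal  # fraction of gross pay, e.g. Decimal("0.08")


@dataclass(frozen=True)
class PayGroup:
    name: str
    deductions: tuple  # tuple of Deduction rules configured for this group


def compute_payslip(gross, group):
    """Apply every deduction in the pay group to the gross amount."""
    cents = Decimal("0.01")
    lines = {
        d.name: (gross * d.rate).quantize(cents, rounding=ROUND_HALF_UP)
        for d in group.deductions
    }
    net = gross - sum(lines.values(), Decimal("0"))
    return {"gross": gross, "deductions": lines, "net": net}


# Hypothetical pay group with illustrative rates.
staff = PayGroup(
    name="staff",
    deductions=(
        Deduction("pension", Decimal("0.08")),
        Deduction("tax", Decimal("0.10")),
    ),
)
payslip = compute_payslip(Decimal("250000.00"), staff)
```

`Decimal` is used instead of floats so that currency amounts round predictably, which matters when payslips must reconcile against gateway settlements.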

Technical Skills

AI / NLP / Speech

PyTorch · HuggingFace · Kaldi · ESPnet · PaddleOCR

Language model training, ASR/TTS pipelines, NER, OCR for African scripts, and low-resource dataset engineering.

Data Infrastructure

PostgreSQL · PGVector · MongoDB · RAG · Airflow

Designing large-scale annotation pipelines, vector stores, semantic retrieval, and data QA frameworks.

Software Engineering

Python · TypeScript · Node.js · REST · gRPC

Clean, production-ready code for backends, APIs, and ML systems with strong architectural discipline.

Frontend & Full-Stack

React · Next.js · Tailwind CSS · Framer Motion

End-to-end product engineering — from research prototype to polished production web application.

Systems Architecture

SaaS · Microservices · Multi-tenant · Fintech APIs

Designing secure, scalable multi-tenant platforms with payment integrations and enterprise-grade reliability.

MLOps & DevOps

Git · Docker · CI/CD · Model Serving

Automating ML deployments, monitoring model drift, and maintaining engineering velocity in research projects.

Education

B.Sc. Computer Science

Federal University of Agriculture, Abeokuta (FUNAAB)

2018 – 2022
Upper Division Honours
Specialised in Artificial Intelligence, NLP, and Scalable Systems
AI thesis: Yorùbá news classification using Transformers vs. Traditional ML
Led technical study groups on ML and backend architecture
Coursework in Python, Java and Algorithms

Get in Touch

Let's Connect

Whether you're looking for a research collaborator, an ML engineer, or a full-stack product builder, feel free to reach out.