Data Scientist & Analytics Leader
Founder, YÉNI & Omoyeni.io · ex-Amazon
Houston, TX
Results-driven Data Scientist and Analytics Leader with 6+ years of experience delivering end-to-end data solutions across machine learning, business intelligence, and data engineering. Expert at transforming complex data into measurable outcomes across finance, marketing, healthcare, e-commerce, and enterprise technology — from predictive modeling and experimentation to scalable ETL pipelines and executive-facing BI.
Built a production-grade PySpark machine learning pipeline to predict user re-engagement for a consumer product serving hundreds of millions of users. The model ingested over 150 predictor variables spanning behavioral signals (session frequency, feature interaction depth, content engagement patterns, notification response rates, cross-device activity, and dormancy duration), combined with user lifecycle attributes, historical retention markers, and product usage cadence indicators.
Engineering the feature set at this scale required building distributed Spark transformations across a massive data lake, including custom aggregation windows across multiple time horizons, lag features capturing behavioral decay, and robust imputation strategies for sparse signals across infrequent users. The complexity was significant: any naive join or aggregation at this data volume would collapse; every transformation had to be written with partitioning and execution plan efficiency in mind. Multiple classification algorithms were benchmarked — Gradient Boosted Trees, Random Forest, Logistic Regression, and XGBoost — with the final tuned ensemble achieving 93% accuracy with strong precision-recall balance to avoid over-notifying users with low re-engagement probability. Model outputs fed directly into the churn reduction strategy, informing targeting logic for re-engagement campaigns and contributing to a 14% reduction in user churn.
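As a rough illustration of the multi-horizon windows and lag features described above, here is a minimal sketch in plain Python rather than PySpark, so it runs without a cluster. The function name `engagement_features`, the horizons, and the feature names are hypothetical and not taken from the original pipeline; in Spark the same logic would typically be expressed with window functions over a partitioned table.

```python
from datetime import date, timedelta


def engagement_features(session_dates, as_of, horizons=(7, 30, 90)):
    """Build trailing-window and lag features for one user (illustrative only)."""
    features = {}

    # Trailing windows: count sessions in (as_of - N days, as_of] per horizon.
    for days in horizons:
        window_start = as_of - timedelta(days=days)
        features[f"sessions_{days}d"] = sum(
            1 for d in session_dates if window_start < d <= as_of
        )

    # Lag feature: sessions in the 7 days before the most recent 7-day window.
    # Comparing it with sessions_7d gives a simple view of behavioral decay.
    lag_start = as_of - timedelta(days=14)
    lag_end = as_of - timedelta(days=7)
    features["sessions_prev_7d"] = sum(
        1 for d in session_dates if lag_start < d <= lag_end
    )

    # Dormancy: days since the last session. None marks users with no history,
    # which a real pipeline would impute explicitly.
    past = [d for d in session_dates if d <= as_of]
    features["days_since_last_session"] = (as_of - max(past)).days if past else None
    return features


sessions = [date(2024, 1, 1), date(2024, 1, 20), date(2024, 1, 28), date(2024, 1, 30)]
print(engagement_features(sessions, as_of=date(2024, 1, 31)))
```

A user with no sessions gets zero counts and a missing dormancy value, which is where an imputation strategy for sparse signals would apply.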
Built the organization's single source of truth for product metrics: a mega-dashboard serving daily, weekly, and monthly performance tracking for the entire product org and leadership. The challenge wasn't just visualization; it was the serious analytical engineering underneath it. The data was spread across dozens of disparate tables in multiple pipelines, requiring complex multi-table joins, careful deduplication, incremental aggregation logic, and layered data transformations to produce a unified, consistent dataset.
Built the entire data layer from scratch — designing intermediate staging tables, writing optimized SQL transformations that could run efficiently at org scale without query timeouts or data freshness lag, and engineering dbt models to handle the dependency chain reliably. The final dashboard consolidated active users, engagement depth, retention curves, feature adoption, and content metrics across daily, weekly, and monthly time grains — each with consistent metric definitions so that numbers never conflicted between views. Leadership used this as the primary lens for product reviews, roadmap decisions, and organizational health assessment. It became the trusted, go-to view that replaced ad hoc reporting across the org.
Designed and built a Tableau dashboard that became the single source of truth for product health and A/B experimentation across a large consumer product. Before this, evaluating feature performance required ad hoc SQL queries routed through analysts, creating multi-day delays and inconsistent metric definitions that led to conflicting interpretations of the same experiment across teams.
The dashboard centralized KPI scorecards (DAU, MAU, engagement, retention), variant-level experiment comparisons with statistical significance indicators, funnel analysis across key user journeys, and cohort views for post-launch behavioral tracking. On the backend, built optimized SQL pipelines aggregating data from multiple sources into clean, analysis-ready tables — and implemented standardized experiment metric logic so that teams were always comparing apples to apples. Product managers could evaluate launches in near real-time and decide to scale, iterate, or roll back within minutes instead of days. The experimentation views became a fixture in weekly product reviews, and self-serve adoption across the org increased significantly.
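The statistical significance indicators mentioned above could be computed in several ways, and the source does not say which test was used. As one common option, the sketch below applies a pooled two-proportion z-test to a control and a variant conversion rate. The function name and the sample counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; rates are all 0% or all 100%")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: control converts 100/1000, variant converts 150/1000.
z, p = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Standardizing on a single function like this is one way to keep every team's experiment readout comparable.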
Built a first-touch and last-touch attribution framework in SQL to map how users across a digital product with over 10 million users discovered and adopted key features. First-touch attribution identified the original acquisition channel or entry point that introduced a user to the product — critical for understanding which awareness channels drove the highest-quality, highest-retention users. Last-touch captured the final interaction before a meaningful conversion event, revealing which touchpoints were actually closing adoption decisions.
The model was built on top of a full user event stream, requiring careful session reconstruction, deduplication logic, and handling of multi-device journeys where a single user might interact across web, mobile, and embedded surfaces. The output enabled marketing and product teams to accurately assess channel efficiency, reallocate acquisition spend toward highest-performing sources, and identify friction points in the adoption funnel where users were dropping off before converting — directly informing campaign strategy and onboarding redesigns.
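The original framework was written in SQL. The sketch below restates the first-touch and last-touch logic in plain Python purely for illustration. The event tuple layout, the `"conversion"` event type, and the function name are assumptions, and user IDs are assumed to be already resolved across devices.

```python
from collections import Counter


def attribute(events, conversion_event="conversion"):
    """Credit first and last touch channels for each user's first conversion.

    events: iterable of (user_id, timestamp, channel, event_type) tuples.
    """
    by_user = {}
    # set() drops exact duplicate events; sorting orders each journey by time.
    for user_id, ts, channel, event_type in sorted(
        set(events), key=lambda e: (e[0], e[1])
    ):
        by_user.setdefault(user_id, []).append((channel, event_type))

    first_touch, last_touch = Counter(), Counter()
    for rows in by_user.values():
        touches = []
        for channel, event_type in rows:
            if event_type == conversion_event:
                if touches:
                    first_touch[touches[0]] += 1   # channel that introduced the user
                    last_touch[touches[-1]] += 1   # final channel before converting
                break  # only the first conversion is attributed
            touches.append(channel)
    return first_touch, last_touch


events = [
    ("u1", 1, "search", "touch"),
    ("u1", 2, "email", "touch"),
    ("u1", 2, "email", "touch"),   # duplicate event, removed by set()
    ("u1", 3, "push", "touch"),
    ("u1", 4, None, "conversion"),
    ("u2", 1, "social", "touch"),
    ("u2", 2, None, "conversion"),
    ("u3", 1, "search", "touch"),  # never converts, so receives no credit
]
first, last = attribute(events)
print(first, last)
```

Users who never convert contribute to neither count, which keeps channel efficiency tied to actual adoption.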
Built a robust, production-grade pricing optimization model for a global enterprise — one that was not only technically rigorous but also designed around legally acceptable, equitable pricing principles. The model was built on behavioral data that met strict fairness and compliance requirements, augmented with external market data, competitor pricing signals, inflation indices, and macroeconomic indicators to create a multi-dimensional view of pricing conditions. The goal was dual: maximize company profitability while simultaneously preserving customer satisfaction and long-term retention.
Experimented extensively with tree-based and ensemble approaches (Decision Trees, Random Forest, Gradient Boosted Trees, XGBoost, and LightGBM), evaluating each on both predictive accuracy and interpretability to ensure the pricing logic could be explained and audited. The final model achieved 94% accuracy in predicting the price at which individual customers were willing to purchase, enabling truly personalized pricing at scale. The model was then operationalized by scaling it and integrating it directly into a Power BI environment, with a dashboard that allowed commercial and finance teams to run pricing scenarios, simulate revenue outcomes, and deploy pricing decisions in real time. Following A/B testing of the new pricing strategy against a control group, results showed a 15% increase in revenue, a significant increase in total orders and order rate, and measurable improvement in customer satisfaction scores related to perceived pricing fairness.
Built the core AI engine powering Omoyeni.io — an end-to-end generative AI system that takes a photo of a user's physical space as input and produces a fully redesigned or newly designed version of that space based on their style preferences and interior design specifications. The pipeline begins with image understanding: a computer vision model analyzes the uploaded photo to detect spatial structure, existing furniture placement, lighting conditions, room dimensions, and architectural elements — forming a semantic map of the space.
This spatial understanding is combined with the user's stated design preferences — style (minimalist, maximalist, Scandinavian, industrial, etc.), color palette, material preferences, and budget tier — to condition a fine-tuned diffusion model that generates photorealistic redesign outputs. The generative backbone leverages Stable Diffusion with ControlNet conditioning to preserve the room's structural geometry while replacing and redesigning surfaces, furniture, lighting, and décor. Style transfer and inpainting techniques allow targeted redesign of specific zones without disrupting the overall spatial layout. The output image is matched against a product catalog using visual similarity search (CLIP embeddings + vector database) to surface shoppable furniture and décor items that match the generated design, which users can add directly to their cart. A Tasker service integration completes the loop — users can book a professional to physically implement the design, making the platform a true end-to-end interior design experience from inspiration to installation.
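The catalog-matching step can be pictured as a nearest-neighbor search over embedding vectors. The sketch below is a simplified stand-in: it uses tiny hand-written vectors instead of real CLIP embeddings and a linear scan instead of a vector database. The item names and function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_matches(query, catalog, k=2):
    """Return the k catalog item IDs whose vectors are most similar to the query."""
    scored = [
        (cosine_similarity(query, vector), item_id)
        for item_id, vector in catalog.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item_id for _, item_id in scored[:k]]


# Toy 3-dimensional vectors standing in for image embeddings.
catalog = {
    "oak_table": [1.0, 0.0, 0.0],
    "linen_sofa": [0.0, 1.0, 0.0],
    "walnut_desk": [0.9, 0.1, 0.0],
}
generated_item = [1.0, 0.05, 0.0]
print(top_k_matches(generated_item, catalog))
```

At catalog scale, an approximate nearest-neighbor index would replace the linear scan, but the ranking criterion stays the same.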
Developed a Power BI application for Customer Pulse Analytics that used NLP to automatically classify customer reviews and feedback as positive or negative in real time. The dashboard included sentiment analytics, chatbot interaction analytics, and review trend tracking — all refreshing daily via an automated pipeline so the team could monitor customer perception continuously rather than reactively.
The system enabled the team to see analyses of customer reviews daily, implement data-driven decisions in real time, and respond to emerging sentiment shifts before they escalated. Proactive monitoring of customer perception ensured product and support decisions were grounded in live feedback rather than lagging reports — directly contributing to a 10% improvement in platform daily active users.
Built a collaborative filtering recommendation system from scratch in Python for a platform with a growing but not yet large customer base, where data sparsity made standard collaborative filtering challenging. Solved the cold-start problem by constructing a user-item interaction matrix representing all historical user-booking relationships, then computing Pearson correlation coefficients to identify similarity between users based on behavioral patterns. Neighborhood selection identified subsets of the most similar users, which powered personalized recommendations for both existing and new users.
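A minimal sketch of user-based collaborative filtering with Pearson similarity and neighborhood selection is shown below. It is not the production system: the ratings, user names, and function names are invented for illustration, and it covers only users who already have some interaction history.

```python
import math


def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items both users have interacted with."""
    common = set(ratings_a) & set(ratings_b)
    if len(common) < 2:
        return 0.0  # not enough overlap to measure similarity
    mean_a = sum(ratings_a[i] for i in common) / len(common)
    mean_b = sum(ratings_b[i] for i in common) / len(common)
    cov = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    var_a = sum((ratings_a[i] - mean_a) ** 2 for i in common)
    var_b = sum((ratings_b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def recommend(target, ratings, k=2, n=3):
    """Recommend up to n unseen items using the k most similar users."""
    sims = [
        (pearson(ratings[target], other), user)
        for user, other in ratings.items()
        if user != target
    ]
    # Neighborhood selection: keep only positively correlated users, best first.
    neighbors = sorted((s for s in sims if s[0] > 0), reverse=True)[:k]
    scores = {}
    for sim, user in neighbors:
        for item, rating in ratings[user].items():
            if item not in ratings[target]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical user-item interaction matrix stored as nested dicts.
ratings = {
    "ana": {"spa": 5, "hike": 1, "tour": 4},
    "ben": {"spa": 4, "hike": 2, "tour": 5, "kayak": 5},
    "cy": {"spa": 1, "hike": 5, "tour": 2, "climb": 5},
}
print(recommend("ana", ratings))
```

Here "cy" is negatively correlated with "ana" and is excluded from the neighborhood, so only items from "ben" are suggested.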
After deployment, ran a rigorous A/B test using simple random sampling and stratified cookies to ensure clean group separation: the control group saw the previous system, while the test group experienced the new recommendation engine. Measured performance across engagement metrics including time on site, click-through rate, and purchase rate. Resulted in a 30% increase in user engagement and a 15% improvement in conversion rates.
Applied multiple ML algorithms to predict stock price movements. Complete pipeline from data acquisition through feature engineering to model evaluation and comparison.
Predicted customer churn for a music streaming platform (Sparkify) using distributed PySpark infrastructure. Scalable feature engineering handles massive datasets for production-grade retention insights.
Multi-class text classifier using NLTK and Scikit-learn that categorizes emergency messages and routes them to the correct response team in real time.
Random Forest and XGBoost regression models for Seattle Airbnb pricing. Deep EDA uncovered seasonal trends and neighborhood dynamics; feature engineering drove a 10% MAE improvement.
Deep learning classification model for insurance fraud. Achieved an 18% improvement in detection precision through architecture tuning and advanced feature construction.
In-depth statistical analysis examining how rising interest rates impact fixed income instruments and portfolios. Includes quantitative modeling, scenario analysis, and research presentation.
Investigates the integration of ML algorithms — Random Forests, XGBoost, and Deep Neural Networks — for real-time risk assessment in high-frequency financial trading. Improves identification and mitigation of risks associated with financial market fluctuations by analyzing historical price trends, trading volume anomalies, and market volatility indices.
↗ Read on ResearchGate
Proposes a novel approach to quantify information transfer between financial systems using Transfer Entropy and Kullback-Leibler divergence measures, enabling deeper understanding of inter-system data flow dynamics. Cited in subsequent research on statistical inference applications in both business and medical contexts.
↗ Read on ResearchGate
Examines how ML, NLP, and predictive analytics address financial security challenges in emerging markets (including vulnerability to fraud, regulatory compliance, and financial instability) through comprehensive case-study review, highlighting benefits, limitations, and future directions for AI-driven financial security.
↗ Read on ResearchGate
I write on Medium about data science, machine learning, analytics engineering, and the intersection of data with real-world business decisions, sharing practical perspectives from 6+ years in the field.
→ Read on Medium
Supervised & unsupervised learning, software engineering, deep learning, data engineering, A/B testing, experimental design, and recommendation systems. Built end-to-end ETL, NLP, and ML pipelines on real-world datasets.
Strong quantitative and analytical foundation in financial systems, accounting principles, and business economics — directly informing work at the intersection of data science and financial modeling.
Open to data science collaborations, consulting engagements, research partnerships, and conversations about how analytics can drive meaningful impact.
Founder & Entrepreneur
Alongside a career in data science, Omoyeni has founded and co-founded multiple companies — from multi-chain brick-and-mortar businesses to AI-powered software platforms. Each venture is rooted in the same belief: that good data and bold execution create lasting value.
From healthy food chains and logistics businesses to AI-powered immigration tools and luxury consumer brands, Omoyeni has built across industries with a common thread — data-informed strategy and product thinking.
A luxury consumer brand founded and led by Omoyeni. YÉNI represents a bold intersection of identity, design, and quality — built for a discerning audience and powered by data-driven marketing and product strategy.
↗ shopyeni.com
An innovative AI interior design platform where users upload photos of their space and receive AI-generated design and redesign concepts tailored to their style. What makes Omoyeni.io truly unique is its end-to-end experience: users can browse the design, add furniture and décor items directly to their cart from within the platform, and book a Tasker service to physically bring the design to life. A genuine one-stop shop from inspiration to installation.
↗ omoyeni.io
Co-founded an AI-powered immigration platform designed to simplify and guide users through complex visa and immigration processes. Visa Companion leverages intelligent automation and natural language interfaces to make immigration more accessible and less overwhelming.
↗ visacompanion.ai
Founded AI Pathfinder to help individuals and organizations navigate the rapidly evolving AI and data science landscape. The platform provides education, guidance, and resources to empower people to build meaningful careers and capabilities in artificial intelligence.
↗ aipathfinder.io
Founded and scaled a multi-chain enterprise across three distinct consumer verticals, all operating profitably before a successful sale in 2021. Stanwick Enterprise demonstrated Omoyeni's ability to build, operate, and exit businesses at the intersection of consumer demand and operational excellence.