Data Scientist & Analytics Leader

Omoyeni Ogundipe

Founder, YÉNI & Omoyeni.io · ex-Amazon

Houston, TX

Results-driven Data Scientist and Analytics Leader with 6+ years of experience delivering end-to-end data solutions across machine learning, business intelligence, and data engineering. Expert at transforming complex data into measurable outcomes across finance, marketing, healthcare, e-commerce, and enterprise technology — from predictive modeling and experimentation to scalable ETL pipelines and executive-facing BI.

6+

Years Experience

93%

ML Model Accuracy

4.0

M.S. GPA

35%

ETL Time Reduction

→ github.com/OmoyeniO
→ omoyeni-ogundipe.medium.com
→ linkedin.com/in/omoyeni-ogundipe
→ info@omoyeniogundipe.com
→ Houston, TX USA

Technical Skills

Analytics

Python · SQL · R · PySpark
NumPy · Pandas · Scikit-learn
TensorFlow · PyTorch
Statistical Inference & Modeling
A/B Testing & Experimentation
Time-Series Forecasting

Data Engineering

Snowflake · Redshift · AWS S3
Apache Airflow · dbt
Hadoop · Spark
ETL / ELT Pipeline Design
Data Warehousing & Modeling
KPI Frameworks

ML, BI & Cloud

Predictive Modeling · NLP · Deep Learning
Feature Engineering · Model Deployment
Tableau · Power BI · QuickSight
Matplotlib · Plotly
AWS (S3, Redshift, QuickSight)
Recommendation Systems

Experience

Current

YÉNI + Omoyeni.io

Houston, TX

Founder · Software Engineer & Data Professional

Founded and lead development of technology-driven consumer and digital brands spanning luxury products, software platforms, and analytics solutions.
Architect and build analytics infrastructure and dashboards tracking customer behavior, acquisition channels, and product performance.
Design and develop full-stack web platforms and backend integrations supporting e-commerce launches.
Lead end-to-end product development including requirements, UI/UX, software implementation, and digital marketing strategy.

Full-Stack · Analytics Infrastructure · Product Strategy · E-Commerce

Amazon

Seattle, WA

Business Intelligence Engineer

Engineered analytics and experimentation frameworks for Amazon Photos, driving a 12% increase in DAU and 14% growth in MAU.
Designed and executed A/B tests and cohort analyses, resulting in a 14% reduction in user churn.
Built distributed ML models in PySpark to forecast user re-engagement with 93% accuracy.
Automated ETL pipelines using SQL, dbt, and Airflow, reducing data processing time by 35% and improving data quality.
Developed self-serve Tableau and QuickSight dashboards adopted by cross-functional leadership.

PySpark · dbt · Airflow · A/B Testing · Tableau · QuickSight

Dream Chase Technologies

New York, NY

Business Intelligence Engineer

Led analytics and deep learning initiatives that improved platform DAU by 10% through targeted engagement strategies.
Designed and deployed TensorFlow-based recommendation models, increasing content satisfaction scores by 18%.
Built interactive Power BI dashboards and automated reporting pipelines.

TensorFlow · Deep Learning · Power BI · Recommendations

Carrier

Georgia, USA

Data Scientist

Engineered neural-network fraud detection models improving anomaly detection accuracy by 20%.
Developed pricing optimization models using Python and R, contributing to a 15% revenue increase.
Applied K-Means and hierarchical clustering for customer segmentation, enabling personalized pricing strategies.
Automated pricing analytics pipelines with dbt and SQL, significantly reducing report turnaround times.

Fraud Detection · Clustering · Pricing Optimization · dbt

Camp House

Remote

Data Scientist

Developed collaborative-filtering recommendation systems in Python, increasing user engagement by 30%.
Designed and analyzed A/B tests on user behavior, improving conversion rates by 15%.
Delivered Power BI dashboards for sentiment analytics, boosting campaign effectiveness by 20% and driving a 25% expansion in market share.

Recommendations · A/B Testing · ETL · Sentiment Analysis

Projects

Industry Work

I–01 · Big Tech · Consumer Product

Re-Engagement Prediction Engine — 150-Variable ML Pipeline

93% Accuracy · 150+ Features · Production PySpark

Built a production-grade PySpark machine learning pipeline to predict user re-engagement for a consumer product operating at hundreds of millions of users. The model ingested over 150 predictor variables spanning behavioral signals — session frequency, feature interaction depth, content engagement patterns, notification response rates, cross-device activity, and dormancy duration — combined with user lifecycle attributes, historical retention markers, and product usage cadence indicators.

Engineering the feature set at this scale required building distributed Spark transformations across a massive data lake, including custom aggregation windows across multiple time horizons, lag features capturing behavioral decay, and robust imputation strategies for sparse signals from infrequent users. At this data volume a naive join or aggregation would fail, so every transformation had to be written with partitioning and execution-plan efficiency in mind.

Multiple classification algorithms were benchmarked (Gradient Boosted Trees, Random Forest, Logistic Regression, and XGBoost), with the final tuned ensemble achieving 93% accuracy and a strong precision-recall balance to avoid over-notifying users with low re-engagement probability. Model outputs fed directly into the churn reduction strategy, informing targeting logic for re-engagement campaigns and contributing to a 14% reduction in user churn.
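The lag and rolling-window features described above can be illustrated in miniature. This is a plain-Python sketch, not the production PySpark code: the `daily_sessions` data, the function name `lag_and_rolling_features`, the 3-observation window, and the zero imputation are all hypothetical choices made for illustration. In Spark, the same logic would typically be expressed with window functions partitioned by user.

```python
from collections import defaultdict

def lag_and_rolling_features(rows, window=3):
    """rows: iterable of (user_id, day, sessions), in any order.

    Returns one dict per input row with a lag over the previous
    observation and a trailing mean over the last `window` observations
    (current one included). A missing lag is imputed with 0.0, which is
    one simple choice for sparse users, not the only one.
    """
    by_user = defaultdict(list)
    for user_id, day, sessions in rows:
        by_user[user_id].append((day, sessions))

    features = []
    for user_id, series in by_user.items():
        series.sort()  # order each user's history by day
        values = [s for _, s in series]
        for i, (day, sessions) in enumerate(series):
            lag_1 = values[i - 1] if i > 0 else 0.0
            recent = values[max(0, i - window + 1): i + 1]
            features.append({
                "user_id": user_id,
                "day": day,
                "sessions": sessions,
                "lag_1": lag_1,
                "rolling_mean": sum(recent) / len(recent),
            })
    return features

# Hypothetical toy data: (user_id, day index, session count).
daily_sessions = [
    ("u1", 1, 4), ("u1", 2, 2), ("u1", 3, 0), ("u1", 4, 0),
    ("u2", 1, 1), ("u2", 3, 5),
]
for row in lag_and_rolling_features(daily_sessions):
    print(row)
```

Grouping by user before sorting mirrors the partition-then-order pattern that keeps this kind of computation local to one user's history.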

PySpark · Feature Engineering · GBT · XGBoost · Random Forest · Classification · Big Data · AWS

I–02 · Big Tech · Consumer Product

Org-Wide Product Metrics Source of Truth Dashboard

Daily · Weekly · Monthly · Analytics Engineering at Scale

Built the organization's single source of truth for product metrics — a mega-dashboard serving daily, weekly, and monthly performance tracking for the entire product org and leadership. The challenge wasn't just visualization; it was the serious analytical engineering underneath it. The data was spread across dozens of disparate tables across multiple pipelines, requiring complex multi-table joins, careful deduplication, incremental aggregation logic, and layered data transformations to produce a unified, consistent dataset.

Built the entire data layer from scratch — designing intermediate staging tables, writing optimized SQL transformations that could run efficiently at org scale without query timeouts or data freshness lag, and engineering dbt models to handle the dependency chain reliably. The final dashboard consolidated active users, engagement depth, retention curves, feature adoption, and content metrics across daily, weekly, and monthly time grains — each with consistent metric definitions so that numbers never conflicted between views. Leadership used this as the primary lens for product reviews, roadmap decisions, and organizational health assessment. It became the trusted, go-to view that replaced ad hoc reporting across the org.

Analytics Engineering · dbt · SQL · Tableau · QuickSight · Data Modeling · ETL · KPI Design

I–03 · Big Tech · Consumer Product

Product Performance & Experimentation Dashboard

Self-Serve Analytics · Experiment Evaluation: Days → Hours

Designed and built a Tableau dashboard that became the single source of truth for product health and A/B experimentation across a large consumer product. Before this, evaluating feature performance required ad hoc SQL queries routed through analysts, creating multi-day delays and inconsistent metric definitions that led to conflicting interpretations of the same experiment across teams.

The dashboard centralized KPI scorecards (DAU, MAU, engagement, retention), variant-level experiment comparisons with statistical significance indicators, funnel analysis across key user journeys, and cohort views for post-launch behavioral tracking. On the backend, optimized SQL pipelines aggregated data from multiple sources into clean, analysis-ready tables, with standardized experiment metric logic so that teams always compared like with like. Product managers could evaluate launches in near real time and decide to scale, iterate, or roll back within hours instead of days. The experimentation views became a fixture in weekly product reviews, and self-serve adoption across the org increased significantly.
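A statistical significance indicator for a variant-level comparison can be as simple as a two-proportion z-test. The sketch below is illustrative only and does not reproduce the dashboard's actual logic: the function name, the conversion counts, and the 5% threshold are assumptions.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates.

    Returns (z, p_value) using the pooled-proportion standard error.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: control (a) vs. variant (b), 10,000 users each.
z, p = two_proportion_z_test(conv_a=480, n_a=10_000, conv_b=560, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}, significant at 5%: {p < 0.05}")
```

Computing the test once in shared logic, rather than per team, is one way to keep every view reporting the same verdict for the same experiment.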

Tableau · SQL · A/B Testing · Funnel Analysis · Cohort Analysis · KPI Design · Self-Serve BI

I–04 · Big Tech · Consumer Product

First-Touch & Last-Touch Attribution Model

10M+ Users · Multi-Channel Attribution · SQL

Built a first-touch and last-touch attribution framework in SQL to map how users across a digital product with over 10 million users discovered and adopted key features. First-touch attribution identified the original acquisition channel or entry point that introduced a user to the product — critical for understanding which awareness channels drove the highest-quality, highest-retention users. Last-touch captured the final interaction before a meaningful conversion event, revealing which touchpoints were actually closing adoption decisions.

The model was built on top of a full user event stream, requiring careful session reconstruction, deduplication logic, and handling of multi-device journeys where a single user might interact across web, mobile, and embedded surfaces. The output enabled marketing and product teams to accurately assess channel efficiency, reallocate acquisition spend toward highest-performing sources, and identify friction points in the adoption funnel where users were dropping off before converting — directly informing campaign strategy and onboarding redesigns.
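The core of first-touch and last-touch attribution can be sketched in a few lines. The original framework was built in SQL; this Python version is only an illustration of the logic, and the event tuples, channel names, and the `attribute` function are hypothetical. It skips session reconstruction and multi-device identity resolution entirely.

```python
from collections import Counter

def attribute(events, conversions):
    """events: list of (user_id, timestamp, channel).
    conversions: dict of user_id -> conversion timestamp.

    Returns (first_touch, last_touch) Counters of channel credit,
    counting only touches at or before each user's conversion.
    """
    touches = {}
    for user_id, ts, channel in sorted(events, key=lambda e: (e[0], e[1])):
        if user_id in conversions and ts <= conversions[user_id]:
            touches.setdefault(user_id, []).append(channel)

    first_touch = Counter(chs[0] for chs in touches.values())
    last_touch = Counter(chs[-1] for chs in touches.values())
    return first_touch, last_touch

# Hypothetical event stream with integer timestamps.
events = [
    ("u1", 1, "email"), ("u1", 5, "push"), ("u1", 9, "in_app"),
    ("u2", 2, "search"), ("u2", 3, "email"),
    ("u3", 4, "push"),  # u3 never converts, so earns no credit
]
conversions = {"u1": 6, "u2": 10}

first, last = attribute(events, conversions)
print("first-touch:", dict(first))
print("last-touch:", dict(last))
```

Comparing the two counters side by side shows which channels introduce users and which ones close the conversion.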

SQL · Attribution Modeling · Funnel Analysis · Event Streams · User Acquisition · Multi-Channel

I–05 · Global Manufacturing · Enterprise

Equitable Pricing Optimization Model — 94% Accuracy

94% Accuracy · 15% Revenue Increase · Tree & Ensemble Models

Built a robust, production-grade pricing optimization model for a global enterprise — one that was not only technically rigorous but also designed around legally acceptable, equitable pricing principles. The model was built on behavioral data that met strict fairness and compliance requirements, augmented with external market data, competitor pricing signals, inflation indices, and macroeconomic indicators to create a multi-dimensional view of pricing conditions. The goal was dual: maximize company profitability while simultaneously preserving customer satisfaction and long-term retention.

Experimented extensively with tree-based and ensemble approaches (Decision Trees, Random Forest, Gradient Boosted Trees, XGBoost, and LightGBM), evaluating each on both predictive accuracy and interpretability to ensure the pricing logic could be explained and audited. The final model achieved 94% accuracy in predicting the price at which individual customers were willing to purchase, enabling personalized pricing at scale. The model was then operationalized by scaling it and feeding its outputs into a Power BI environment, with a dashboard that allowed commercial and finance teams to run pricing scenarios, simulate revenue outcomes, and deploy pricing decisions in real time. Following A/B testing of the new pricing strategy against a control group, results showed a 15% increase in revenue, a significant increase in total orders and order rate, and measurable improvement in customer satisfaction scores related to perceived pricing fairness.

XGBoost · LightGBM · Random Forest · GBT · Pricing Optimization · Power BI · A/B Testing · Market Data · dbt

I–06 · AI Interior Design · Founder Project

AI Interior Design Generation Engine

Image-In · Design-Out · End-to-End Generative AI Pipeline

Built the core AI engine powering Omoyeni.io — an end-to-end generative AI system that takes a photo of a user's physical space as input and produces a fully redesigned or newly designed version of that space based on their style preferences and interior design specifications. The pipeline begins with image understanding: a computer vision model analyzes the uploaded photo to detect spatial structure, existing furniture placement, lighting conditions, room dimensions, and architectural elements — forming a semantic map of the space.

This spatial understanding is combined with the user's stated design preferences — style (minimalist, maximalist, Scandinavian, industrial, etc.), color palette, material preferences, and budget tier — to condition a fine-tuned diffusion model that generates photorealistic redesign outputs. The generative backbone leverages Stable Diffusion with ControlNet conditioning to preserve the room's structural geometry while replacing and redesigning surfaces, furniture, lighting, and décor. Style transfer and inpainting techniques allow targeted redesign of specific zones without disrupting the overall spatial layout. The output image is matched against a product catalog using visual similarity search (CLIP embeddings + vector database) to surface shoppable furniture and décor items that match the generated design, which users can add directly to their cart. A Tasker service integration completes the loop — users can book a professional to physically implement the design, making the platform a true end-to-end interior design experience from inspiration to installation.
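The catalog-matching step reduces to nearest-neighbor search over embedding vectors. The sketch below stands in for it with hand-written 3-dimensional vectors and a brute-force scan: no CLIP model or vector database is involved, and the catalog items, vectors, and function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_matches(query_vec, catalog, k=2):
    """Brute-force nearest neighbors by cosine similarity."""
    scored = [(cosine_similarity(query_vec, vec), name)
              for name, vec in catalog.items()]
    scored.sort(reverse=True)
    return [(name, round(score, 3)) for score, name in scored[:k]]

# Toy "embeddings"; real image embeddings have hundreds of dimensions.
catalog = {
    "oak_coffee_table": [0.9, 0.1, 0.0],
    "velvet_sofa": [0.1, 0.9, 0.2],
    "brass_floor_lamp": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # stand-in for an embedding of a generated design region
print(top_k_matches(query, catalog))
```

A vector database replaces the linear scan with an approximate index so that lookups stay fast as the catalog grows.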

Stable Diffusion · ControlNet · Computer Vision · CLIP · Vector Search · Inpainting · Style Transfer · Generative AI · Python

I–07 · Tech Platform · Consumer

Customer Pulse — NLP Sentiment & Review Analytics

Real-Time Power BI · NLP Sentiment · Daily Refresh

Developed a Power BI application for Customer Pulse Analytics that used NLP to automatically classify customer reviews and feedback as positive or negative in real time. The dashboard included sentiment analytics, chatbot interaction analytics, and review trend tracking — all refreshing daily via an automated pipeline so the team could monitor customer perception continuously rather than reactively.

The system enabled the team to see analyses of customer reviews daily, implement data-driven decisions in real time, and respond to emerging sentiment shifts before they escalated. Proactive monitoring of customer perception ensured product and support decisions were grounded in live feedback rather than lagging reports — directly contributing to a 10% improvement in platform daily active users.

NLP · Sentiment Analysis · Power BI · TensorFlow · Real-Time Pipeline · Text Classification

I–08 · Digital Platform · Consumer

Collaborative Filtering Recommendation System

30% Engagement Increase · Cold-Start Solved · A/B Tested

Built a collaborative filtering recommendation system from scratch in Python for a platform with a growing but not yet large customer base, where data sparsity made standard collaborative filtering challenging. Solved the cold-start problem by constructing a user-item interaction matrix representing all historical user-booking relationships, then computing Pearson correlation coefficients to identify similarity between users based on behavioral patterns. Neighborhood selection identified subsets of the most similar users, which powered personalized recommendations for both existing and new users.
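The user-based approach described above (interaction matrix, Pearson similarity, nearest-neighbor selection) can be sketched as follows. This is a minimal illustration, not the deployed system: the users, items, rating values, and the `recommend` scoring rule are hypothetical.

```python
import math

def pearson(ratings_a, ratings_b):
    """Pearson correlation over items both users have interacted with."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0
    xs = [ratings_a[i] for i in common]
    ys = [ratings_b[i] for i in common]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def recommend(target, matrix, k=2, n=3):
    """Rank unseen items by similarity-weighted ratings of the k nearest users."""
    sims = [(pearson(matrix[target], matrix[u]), u) for u in matrix if u != target]
    neighbors = [(s, u) for s, u in sorted(sims, reverse=True)[:k] if s > 0]
    scores = {}
    for sim, user in neighbors:
        for item, rating in matrix[user].items():
            if item not in matrix[target]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Hypothetical user-item matrix stored sparsely as nested dicts.
matrix = {
    "ana":  {"spa": 5, "yoga": 4, "hike": 1},
    "ben":  {"spa": 4, "yoga": 5, "hike": 2, "sauna": 5},
    "cara": {"spa": 1, "yoga": 2, "hike": 5, "climb": 4},
    "dev":  {"spa": 5, "yoga": 5, "hike": 1, "pilates": 3},
}
print(recommend("ana", matrix))
```

Storing only observed interactions keeps the matrix compact when most user-item pairs are empty, which is the usual situation on a small platform.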

After deployment, ran a rigorous A/B test using simple random sampling and stratified cookies to ensure clean group separation — control group saw the previous system, test group experienced the new recommendation engine. Measured performance across engagement metrics including time on site, click-through rate, and purchase rate. Resulted in a 30% increase in user engagement and a 15% improvement in conversion rates.

Collaborative Filtering · Python · Pearson Correlation · A/B Testing · User-Item Matrix · Cold Start

Open Source & Academic

OS–01 · Open Source

Stock Price Prediction Using ML Algorithms

Udacity Data Science Nanodegree Capstone

Applied multiple ML algorithms to predict stock price movements. Complete pipeline from data acquisition through feature engineering to model evaluation and comparison.

Python · Scikit-learn · Time Series · Regression

OS–02 · Open Source

Churn Prediction — PySpark at Scale

93% Accuracy · Distributed ML Pipeline

Predicted customer churn for a music streaming platform (Sparkify) using distributed PySpark infrastructure. Scalable feature engineering handles massive datasets for production-grade retention insights.

PySpark · Big Data · Feature Engineering · Classification

OS–03 · Open Source

NLP Disaster Response Classifier

Real-Time Flask Inference API

Multi-class text classifier using NLTK and Scikit-learn that categorizes emergency messages and routes them to the correct response team in real time.

NLP · NLTK · Flask · Multi-class

OS–04 · Open Source

Airbnb Price Prediction — Seattle

10% Reduction in Mean Absolute Error

Random Forest and XGBoost regression models for Seattle Airbnb pricing. Deep EDA uncovered seasonal trends and neighborhood dynamics; feature engineering drove a 10% MAE improvement.

XGBoost · Random Forest · EDA · Regression

OS–05 · Open Source

Insurance Fraud Detection

18% Improvement in Detection Precision

Deep learning classification model for insurance fraud. Achieved an 18% improvement in detection precision through architecture tuning and advanced feature construction.

Deep Learning · TensorFlow · Classification · Anomaly Detection

OS–06 · Open Source

Rising Interest Rates on Fixed Income Portfolios

Statistical Analysis & Research

In-depth statistical analysis examining how rising interest rates impact fixed income instruments and portfolios. Includes quantitative modeling, scenario analysis, and research presentation.

R · Statistical Analysis · Financial Modeling

Research

ResearchGate · Sep 2024

Integration of Machine Learning Algorithms for Real-Time Risk Assessment in Financial Trading Systems

Investigates the integration of ML algorithms — Random Forests, XGBoost, and Deep Neural Networks — for real-time risk assessment in high-frequency financial trading. Improves identification and mitigation of risks associated with financial market fluctuations by analyzing historical price trends, trading volume anomalies, and market volatility indices.

↗ Read on ResearchGate

Int. Journal of Business & Economics Research · Vol. 12, No. 2 · 2023

Application of Statistical Inference Using Entropy to Characterize the Transfer of Data Across Financial Systems

Proposes a novel approach to quantify information transfer between financial systems using Transfer Entropy and Kullback-Leibler divergence measures, enabling deeper understanding of inter-system data flow dynamics. Cited in subsequent research on statistical inference applications in both business and medical contexts.
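For readers unfamiliar with the two measures named above, here is a textbook-style sketch of each for discrete data. It is not the paper's method: the plug-in estimator, the history length of 1, and the toy binary series are assumptions, and plug-in estimates are known to be biased on short series.

```python
import math
from collections import Counter

def kl_divergence(p, q):
    """D_KL(P || Q) in bits for discrete distributions on the same support."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def transfer_entropy(source, target):
    """Plug-in transfer entropy source -> target (history length 1), in bits."""
    triples = Counter(zip(target[1:], target[:-1], source[:-1]))  # (x_next, x_prev, y_prev)
    pairs_xy = Counter(zip(target[:-1], source[:-1]))             # (x_prev, y_prev)
    pairs_xx = Counter(zip(target[1:], target[:-1]))              # (x_next, x_prev)
    singles = Counter(target[:-1])                                # x_prev
    n = len(target) - 1
    te = 0.0
    for (x_next, x_prev, y_prev), count in triples.items():
        p_joint = count / n
        p_given_both = count / pairs_xy[(x_prev, y_prev)]
        p_given_own = pairs_xx[(x_next, x_prev)] / singles[x_prev]
        te += p_joint * math.log2(p_given_both / p_given_own)
    return te

# Toy series: `follower` copies `leader` with a one-step delay.
leader = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
follower = [0] + leader[:-1]

print("KL:", kl_divergence([0.5, 0.5], [0.9, 0.1]))
print("TE leader -> follower:", transfer_entropy(leader, follower))
print("TE follower -> leader:", transfer_entropy(follower, leader))
```

Transfer entropy is directional, so information flowing from the leader to the follower shows up as a larger value in that direction than in the reverse one.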

↗ Read on ResearchGate

ResearchGate · Sep 2024

Leveraging AI for Financial Security in Emerging Markets

Examines how ML, NLP, and predictive analytics address financial security challenges in emerging markets — including vulnerability to fraud, regulatory compliance, and financial instability — through comprehensive case-study review, highlighting benefits, limitations, and future directions for AI-driven financial security.

↗ Read on ResearchGate

Writing

I write on Medium about data science, machine learning, analytics engineering, and the intersection of data with real-world business decisions — practical perspectives from 6+ years in the field.

→ Read on Medium

Education

M.S. Computer Science & Quantitative Methods

Austin Peay State University

Mathematical Finance

GPA: 4.0 / 4.0

Supervised & unsupervised learning, software engineering, deep learning, data engineering, A/B testing, experimental design, and recommendation systems. Built end-to-end ETL, NLP, and ML pipelines on real-world datasets.

B.S. Accounting

Ajayi Crowther University

Undergraduate

GPA: 4.58 / 5.0

Strong quantitative and analytical foundation in financial systems, accounting principles, and business economics — directly informing work at the intersection of data science and financial modeling.

Let's Connect

Open to data science collaborations, consulting engagements, research partnerships, and conversations about how analytics can drive meaningful impact.

Email: info@omoyeniogundipe.com
LinkedIn: omoyeni-ogundipe
GitHub: github.com/OmoyeniO
Medium: omoyeni-ogundipe.medium.com
