Retail Inventory Optimization & Demand Analysis

SQLPostgreSQLData AnalysisInventoryForecasting

A SQL-based exploration into how inventory, demand forecasting, and sales interact, with a focus on identifying inefficiencies and turning data into actionable business insights.

View Source

Dataset Overview

Going Beyond the Query

I didn't start this project just to practice SQL. After spending time around enterprise data systems, I kept noticing the same pattern: the numbers look fine on the surface, but something doesn't quite add up when you start connecting them to actual business decisions.

I picked retail because it's a domain where the data is close to the money. Every inventory decision affects cash flow, waste, and fulfillment. I used a high-fidelity Kaggle dataset tracking daily operations across 73,000+ records. To keep this case study performant in the browser, I ran a Python Pandas pipeline to pre-aggregate the raw data before feeding it into these charts.

retail_store_inventory.csv — Sample

Pay attention to the ratio between Inventory Level, Units Sold, and Demand Forecast.

73.1k Rows

11 Fields

Date	Store ID	Product ID	Category	Region	Inventory Level	Units Sold	Units Ordered	Demand Forecast	Price	Discount	Weather Condition	Holiday/Promotion	Competitor Pricing	Seasonality
2022-01-01	S001	P0001	Groceries	North	231	127	55	135.47	33.5	20	Rainy	0	29.69	Autumn
2022-01-01	S001	P0002	Toys	South	204	150	66	144.04	63.01	20	Sunny	0	66.16	Autumn
2022-01-01	S001	P0003	Toys	West	102	65	51	74.02	27.99	10	Sunny	1	31.32	Summer
2022-01-01	S001	P0004	Toys	North	469	61	164	62.18	32.72	10	Cloudy	1	34.74	Autumn
2022-01-01	S001	P0005	Electronics	East	166	14	135	9.26	73.64	0	Sunny	0	68.95	Summer
2022-01-01	S001	P0006	Groceries	South	138	128	102	139.82	76.83	10	Sunny	1	79.35	Winter
2022-01-01	S001	P0007	Furniture	East	359	97	167	108.92	34.16	10	Rainy	1	36.55	Winter
2022-01-01	S001	P0008	Clothing	North	380	312	54	329.73	97.99	5	Cloudy	0	100.09	Spring
2022-01-01	S001	P0009	Electronics	West	183	175	135	174.15	20.74	10	Cloudy	0	17.66	Autumn
2022-01-01	S001	P0010	Toys	South	108	28	196	24.47	59.99	0	Rainy	1	61.21	Winter
2022-01-01	S001	P0011	Furniture	South	258	150	153	152.74	58.53	10	Sunny	1	61.42	Spring
2022-01-01	S001	P0012	Clothing	West	66	24	70	26.75	58.25	20	Snowy	0	62.21	Spring
2022-01-01	S001	P0013	Toys	South	96	42	85	41.46	43.6	0	Cloudy	0	46.31	Spring
2022-01-01	S001	P0014	Clothing	West	193	12	187	6.8	78.11	0	Sunny	0	80.06	Spring
2022-01-01	S001	P0015	Clothing	North	379	369	154	363.46	92.99	15	Snowy	0	95.8	Winter

Problem Framing

The Business Problem

When I started exploring the data, one thing stood out right away. Inventory levels were consistently sitting well above actual sales volumes. That gap was too wide and too stable to be coincidence.

Questions I wanted to answer

1Was the business ordering too much to begin with?
2Were the demand forecasts consistently too high?
3Was this a planning model problem, not just an operations one?

Why Groceries Are Different

Overstocking electronics means capital sits idle. Overstocking groceries means product spoils. The stakes here are higher because the loss is total, not just delayed.

Data Narrative

The Data Story

From first exploration to a clear diagnosis of what was actually wrong.

Step 1: Exploration

Understanding the Landscape

Before looking for anything specific, I wanted to see how sales volume distributed across categories and regions. I was checking whether the problem had an obvious concentration point.

Sales Distribution by Region and Category

What I expected: I thought I'd find obvious imbalances, maybe one region lagging behind or one category dominating the rest.

What I found: Sales volume was almost perfectly uniform across all regions and categories. No outlier stood out.

Why it matters: That uniformity told me the issue wasn't localized. Whatever was driving the inefficiency was happening everywhere at once, which pointed to the planning model, not the operations.

Step 2: Pattern Finding

Tracking the Forecast Over Time

With no obvious concentration, I shifted my focus to time. I wanted to see how the demand forecast tracked against actual sold units month by month. A healthy model should sometimes overshoot, sometimes undershoot.

Monthly Forecast vs. Actual Sold

What I expected: A reasonable forecast should cross the actual line in both directions over time.

What I found: The forecast stayed above actual sold almost every single month. It never corrected downward in any meaningful way.

Why it matters: When a model is consistently biased in one direction, it stops being a forecast and starts being a standing instruction to over-order. Purchasing teams following this model had no reason to ever pull back.

Step 3: Deeper Diagnosis

Where the Error Was Worst

Knowing the bias was systemic, I wanted to find out which category-season combinations had the highest forecast error. Not all overestimates carry the same risk.

Average Forecast Error by Category and Season (%)

What I expected: I expected Electronics or Clothing to be the hardest to forecast, since trends in those categories shift faster.

What I found: Groceries had one of the highest and most consistent error rates, around 16%, across all seasons. It didn't spike; it was just reliably wrong.

Why it matters: A 16% overestimate on electronics means you bought too many units. A 16% overestimate on produce means it rots. The financial impact is in a different category altogether.

Step 4: Conclusion

Seeing the Full Picture

For the final check, I compared resting inventory levels directly against actual sales volume by region. I wanted to see the size of the buffer the business was maintaining.

Resting Inventory vs. Actual Sales by Region

What I expected: Inventory should sit reasonably close to sales volume, with just enough buffer for delivery delays and demand spikes.

What I found: The inventory buffer was significantly larger than needed in every single region. The gap wasn't slight; it was structural.

Why it matters: When I put this next to the forecast bias finding, the picture became clear. Because the forecasts were always high, procurement was always told to order more. The business was compensating for bad predictions with excess stock instead of fixing the predictions.

Execution

Structured SQL Analysis

These are the queries I used to work through the problem. Each one was written with a specific question in mind.

Part A

Descriptive Queries

query_01__sales_by_category.sql

SELECT
    "Category",
    SUM("Units Sold")     AS total_units_sold,
    SUM("Units Ordered")  AS total_units_ordered
FROM  retail_store_inventory
GROUP BY 1
ORDER BY total_units_sold DESC;

Output preview

Category	total_units_sold	total_units_ordered
Furniture	2,025,017	1,617,699
Groceries	2,000,482	1,606,188
Clothing	1,999,166	1,614,885
... (2 more rows)

Why I wrote this

I wanted a quick baseline to check whether any category was dramatically different from the others before digging deeper.

What I found

All five categories came in close to each other. No single one was carrying the business or dragging it down. That ruled out a category-level root cause early.

query_02__seasonal_rankings.sql

SELECT
    "Seasonality",
    "Category",
    SUM("Units Sold") AS total_units_sold,
    RANK() OVER (
        PARTITION BY "Seasonality"
        ORDER BY SUM("Units Sold") DESC
    ) AS sales_rank
FROM  retail_store_inventory
GROUP BY 1, 2;

Output preview

Seasonality	Category	total_units_sold	sales_rank
Winter	Furniture	508,730	1
Winter	Groceries	505,027	2
Winter	Clothing	504,972	3

Why I wrote this

I wanted to see whether seasonal shifts changed the category rankings, since inventory planning often follows those patterns.

What I found

Furniture topped Winter sales, but Groceries and Clothing were right behind — all tightly clustered. That uniformity confirmed this wasn't a category-level anomaly; it was a planning-level one.

Part B

Diagnostic Queries

Once I had the baseline, I wanted to quantify the gap. These queries were designed to measure the actual distance between what was forecasted, what was stocked, and what was actually sold.

query_03__monthly_inventory_pressure.sql

SELECT
    date_trunc('month', "Date"::date) AS month,
    SUM("Inventory Level")  AS total_inventory,
    SUM("Units Sold")      AS total_sold,
    SUM("Demand Forecast") AS total_forecast,
    SUM("Inventory Level") - SUM("Units Sold") AS inventory_gap,
    SUM("Demand Forecast") - SUM("Units Sold") AS forecast_error,
    ROUND(
        SUM("Inventory Level")::numeric
         / NULLIF(SUM("Units Sold"), 0),
        2
    ) AS inv_to_sales_ratio
FROM  retail_store_inventory
GROUP BY 1
ORDER BY 1;

Output preview

month	total_inv	total_sold	inv_gap	fc_error	ratio
2022-01	843,236	419,938	423,298	+15,652	2.01
2022-02	772,235	391,052	381,183	+14,434	1.97
2022-03	851,921	426,073	425,848	+15,992	2.00

Why I wrote this

I wanted to check whether the inventory pressure was a consistent monthly pattern or just a seasonal peak. The ratio column was the key metric.

What I found

The ratio hovered close to 2.0 every single month — meaning for every unit sold, roughly two units were sitting in stock. That wasn't a seasonal spike; that was structural.

query_04__category_season_risk_view.sql

SELECT
    "Category",
    "Seasonality",
    AVG("Inventory Level")                  AS avg_inventory,
    AVG("Units Sold")                      AS avg_sold,
    AVG("Inventory Level" - "Units Sold") AS avg_gap,
    ROUND(
        AVG("Inventory Level")::numeric
         / NULLIF(AVG("Units Sold"), 0),
        2
    ) AS inv_to_sales_ratio
FROM  retail_store_inventory
GROUP BY 1, 2
ORDER BY inv_to_sales_ratio DESC;

Output preview

Category	Seasonality	avg_inv	avg_sold	avg_gap	ratio
Groceries	Spring	275.2	134.4	140.8	2.05
Groceries	Summer	274.8	134.6	140.2	2.04
Electronics	Spring	272.4	134.2	138.2	2.03

Why I wrote this

I needed to find which specific category-season pairs had the worst ratios, to help prioritize where to focus improvement efforts.

What I found

Groceries came out on top in Spring and Summer, but the margins were tight across all categories. The ratios were close to 2.0 everywhere — this wasn't a category outlier, it was a universal planning problem with Groceries slightly worse.

Synthesized Findings

What the Data Pointed To

Four patterns that showed up consistently across every angle of analysis.

Systemic, Not Isolated

The inefficiency showed up in every region and every month. There was no single warehouse to blame. The pattern was everywhere.

Forecast Always Too High

The demand forecast almost never came in below actual sales. It was a one-directional bias, which meant purchasing teams were consistently told to order more than needed.

Seasonal Shifts Overlooked

Category rankings shifted with the seasons, but the inventory buffers didn't adjust accordingly. The planning model didn't respond to seasonality the way it should have.

Groceries Carry Real Waste Risk

Overstocking perishables isn't just inefficient; it creates direct write-off losses. Groceries had both a high ratio and a high error rate, making them the most urgent segment to address.

Recommendations

What I Would Do Next

Based on what the data showed, here are the changes I'd prioritize.

Recalibrate the Forecast Model

The model has a consistent upward bias. The first step is adding a historical correction factor that pulls estimates back toward realized demand before they reach purchasing teams.

Differentiate Safety Stock by Category

Not every product needs the same buffer. Staple items with predictable demand can run leaner. Groceries, given the spoilage risk, need tighter weekly review rather than a bigger cushion.

Treat Perishables as a Priority Fix

Groceries showed up as the highest-ratio, highest-risk segment. Any improvement plan should start there, since the cost of getting it wrong is a write-off, not just a delay.

Track the Gap Weekly, Not Monthly

The forecast error was visible in monthly data, but catching it weekly would give teams the ability to course-correct before the numbers compound into a problem.

Reflection

The hardest shift wasn't learning SQL. It was learning when to stop describing and start diagnosing.

When I started this project, I assumed the interesting part would be writing complex queries. It turned out the more valuable skill was knowing what question each query was actually trying to answer. Running the seasonal ranking felt straightforward until I realized the rankings alone didn't explain the risk. I had to layer forecast error on top before the picture made sense. That's the part that surprised me most: useful analysis isn't just technically correct, it has to be pointed at the right problem.

Explore More Projects

Continue exploring selected data, analytics, and machine learning projects.

Data Science (Thesis)

Chili Quality Classification

Built a Two-Stage XGBoost classification pipeline to automate agricultural quality control. Explored thorough Exploratory Data Analysis (EDA) and compared decision trees, RF, SVM, and MLP models before isolating XGBoost due to a superior ~83% accuracy and ~82% macro F1 score. Emphasized end-to-end data science workflows over simple model APIs.

PythonXGBoostScikit-learnPandas+1 more

View Project

Internship / Data Analytics

Telkom Indonesia – Enterprise Dashboard Visualization System

Developed enterprise-grade operational dashboards for the Regional Enterprise and Government Service (REGS) division during an internship at Telkom Indonesia. Replaced manual spreadsheet workflows with a Looker Studio LOP dashboard and an interactive web-based LOB dashboard integrated with Google Sheets API, Firebase, and Google Cloud — significantly improving data visibility, collaboration, and monitoring efficiency.

Looker StudioGoogle Sheets APIFirebaseNode.js+4 more

View Project

Exploratory / Concept

AI Assistant for Data Issue Troubleshooting

Mapped historical issue occurrences spanning SAP BW/BPC environments directly to logical root causes. This exploratory project explores how AI logic can improve analyst efficiency and institutional knowledge reuse within complex ETL layers.

ConceptAI LogicSAP BW/BPCRoot Cause Analysis

View Project