AI Assistant for Data Issue Troubleshooting
Mapped historical issue occurrences spanning SAP BW/BPC environments directly to logical root causes. This exploratory project explores how AI logic can improve analyst efficiency and institutional knowledge reuse within complex ETL layers.
A Problem I Lived Before I Designed For It
When I worked as an SAP BPC/BW analyst, the hardest part of the job wasn't the technical complexity, it was the knowledge fragmentation. Every issue had a solution somewhere: in someone's email, in a call someone remembered, in a ticket that was closed and forgotten. But when an issue resurfaced at 2am before a reporting deadline and the person who solved it six months ago was on leave, we were starting from scratch.
This project is my attempt to model and prototype a system that addresses that gap. Not a full product, not a deployed assistant — but a structured concept design and working prototype that demonstrates how a rule-based AI reasoning layer could connect issue symptoms to structured knowledge and guide analysts through troubleshooting faster.
Knowledge Locked in People
Solutions existed, but only in the heads of specific team members. Unavailability meant starting from zero.
Recurring Issues, No Memory
The same root cause reappeared months later. Without structured retrieval, hours were spent re-diagnosing.
High-Pressure Blind Spots
During audits and budget cycles, low-frequency but high-complexity issues hit hardest, and were least documented.
Why the Data Pipeline Makes This Hard
Enterprise reporting systems have a layered architecture. By the time a problem appears on a report, it has usually passed through four or five layers — and the root cause is long upstream.
Source Systems
Transactions generated
Ingestion
Data loaded to warehouse
Transformation
Mapping, calculations
Aggregation
Roll-up logic applied
Reporting
Issue noticed here
Issues are detected at the right, but originate anywhere to the left.
Core challenges
- 1Knowledge is scattered and not retrievable under pressure
- 2Troubleshooting is repetitive — the same root causes reappear
- 3Root cause is always upstream from where the symptom is visible
- 4No structured way to map symptoms to likely causes
The Design Insight
The solution isn't to document everything. It's to build a system that maps the pattern of an issue (its symptoms) to the most likely upstream causes, and surfaces the right questions to ask at each layer.
From Symptom to Diagnosis
How the assistant is designed to reason, layer by layer.
Step 1: Input — Natural Language Description
The Analyst Describes What They See
The entry point is intentionally simple. Analysts don't need to know the root cause to use the system — they describe the symptom as they observe it. The assistant handles classification.
What I expected: I assumed analysts would need to formulate a structured query to get useful output.
What I found: The most natural input is just how the analyst would describe the problem to a colleague — which is already enough signal to classify and route.
Why it matters: This lowers the barrier to adoption significantly. The system meets analysts where they are, not where a database schema expects them to be.
Step 2: Classification — Pattern Matching Engine
Mapping Symptoms to Issue Categories
The rule engine scans the input for known symptom keywords and maps them to structured issue categories. Each category carries a pre-built set of likely causes and troubleshooting steps derived from past cases.
ISSUE_PATTERNS = [
{
"id": "balance_mismatch",
"keywords": ["not balanced", "mismatch", "balance sheet"],
"causes": [
"Incomplete data ingestion",
"Transformation inconsistency",
"Aggregation mismatch",
],
"steps": [
"Compare source vs. loaded totals",
"Check ingestion job logs for failures",
"Validate transformation mapping rules",
],
},
{
"id": "missing_data",
"keywords": ["zero", "missing", "not showing", "empty"],
"causes": [
"Data load failure",
"Incorrect filtering or selection",
"Mapping issue in source assignment",
],
"steps": [
"Verify data availability in source system",
"Check ingestion process for the period",
"Validate report parameters and member filters",
],
},
]
def classify_issue(description: str) -> dict | None:
# Lowercase input for case-insensitive matching
desc_lower = description.lower()
for pattern in ISSUE_PATTERNS:
if any(kw in desc_lower for kw in pattern["keywords"]):
return pattern
return None # No pattern matched
Why rule-based first
A rule engine is transparent, auditable, and fast. For structured enterprise environments where issues follow known patterns, it provides immediate value without model complexity.
Where LLMs extend it
Language models would handle inputs that don't match known keywords — paraphrases, novel symptoms, or multilingual descriptions — without requiring the pattern library to grow indefinitely.
What I expected: I started with the assumption that LLMs would be the core of the system.
What I found: A well-structured rule engine handles the 80% case accurately, with far less overhead and better explainability. LLMs become an enhancement layer, not the foundation.
Why it matters: In enterprise environments, explainability often matters as much as capability. Being able to show exactly why the system suggested a particular root cause builds trust with analysts.
Step 3: Retrieval — Second Use Case Example
Missing Data Pattern
The same architecture handles a different symptom class. Input is parsed against a separate pattern, and the output routes accordingly with a different set of steps.
What I expected: Each issue type would require a completely separate response structure.
What I found: The same output format works across all issue types — only the content differs. This means the interface stays consistent while the knowledge layer scales independently.
Why it matters: Decoupling the presentation layer from the knowledge layer makes the system much easier to maintain and extend as new issue types are added.
Step 4: Architecture — System Design
How the Layers Connect
The system is intentionally modular. Each layer has a single responsibility, making it possible to swap, upgrade, or extend any component without affecting the others.
Input Interface
Accepts free-text issue descriptions from analysts. No structured form required.
Classification Engine
Rule-based pattern matcher. Scans input for known symptom keywords and maps to issue categories.
Knowledge Base
Structured store of issue patterns, each with associated causes and resolution steps.
Response Engine
Formats the retrieved knowledge into structured output — causes ranked by likelihood, steps ordered by layer.
LLM Enhancement Layer
Handles inputs that don't match known patterns. Falls through from the rule engine when no keyword match is found.
What This Changes in Practice
Four concrete improvements over the current state.
Reduced Troubleshooting Time
Instead of retracing steps from memory, analysts get a structured starting point within seconds of describing the issue.
More Consistent Problem-Solving
The same issue described by two different analysts routes to the same structured guidance, reducing variance in approach.
Reduced Individual Dependency
Knowledge is captured in the system, not in a person. Team availability no longer determines response quality.
Structured Institutional Memory
Each resolved case is a potential pattern addition. The system grows as real issues are handled, not as documentation campaigns are organized.
Try the Assistant
Pick one of the sample issues below or describe your own. The assistant will classify it and return structured root causes, troubleshooting steps, and next data to check — exactly as the system is designed to work.
Try It: Diagnostic Assistant
Describe an issue — get structured root causes and steps
Pick a sample issue or describe one. The assistant will classify it and return structured guidance.
Rule-based demo only. All examples are synthetic and generalized — no proprietary or confidential data is used.
Disclaimer
This project is inspired by general patterns observed in enterprise data environments. It does not include any proprietary, confidential, or organization-specific data, documentation, or internal systems. All issue examples and patterns are synthetic and generalized.
The hardest part wasn't building the classifier. It was admitting that most of the value comes from structuring the knowledge, not from the AI.
I came into this project thinking the interesting design problem was the language model integration. What I found is that the real work is upstream: deciding how to represent an issue, what counts as a distinct pattern, and how to turn a raw experience into something retrievable. The rule engine only works because the knowledge base was designed well. That's the lesson I'd carry into a production version of this system.
Explore More Projects
Continue exploring selected data, analytics, and machine learning projects.

Chili Quality Classification
Built a Two-Stage XGBoost classification pipeline to automate agricultural quality control. Explored thorough Exploratory Data Analysis (EDA) and compared decision trees, RF, SVM, and MLP models before isolating XGBoost due to a superior ~83% accuracy and ~82% macro F1 score. Emphasized end-to-end data science workflows over simple model APIs.

Retail Inventory Optimization & Demand Analysis
A SQL-based exploration into how inventory, demand forecasting, and sales interact, with a focus on identifying inefficiencies and turning data into actionable business insights.

Telkom Indonesia – Enterprise Dashboard Visualization System
Developed enterprise-grade operational dashboards for the Regional Enterprise and Government Service (REGS) division during an internship at Telkom Indonesia. Replaced manual spreadsheet workflows with a Looker Studio LOP dashboard and an interactive web-based LOB dashboard integrated with Google Sheets API, Firebase, and Google Cloud — significantly improving data visibility, collaboration, and monitoring efficiency.

