data-engineering
15 articles on data-engineering.
Data Lake Cost Optimization: 3 Levers
Data lake cost optimization comes down to three levers: partition pruning, file compaction, and lifecycle tiering. Here's how to tune each one in production.
Insurance Claims Classification with LLMs
Triaging insurance claims across 61 labels needs more than a model — it needs frozen eval sets, per-label thresholds, and a lakehouse built for documents.
Multi-3PL Logistics Data Foundation
Running shipments across multiple 3PLs and dozens of carriers? Real network visibility starts at the data layer — the multi-3PL foundation pattern.
Manufacturing Data Foundation: A Blueprint
Plant floor, ERP, quality, and maintenance data live in four silos that never join. Here's the manufacturing data foundation architecture that fixes it.
Watermark Bugs in Fabric Incremental Loads
A watermark incremental load in Microsoft Fabric silently duplicated 3 months of Gold-layer data. The fix: idempotent MERGE plus a row-count assertion.
Beat NetSuite API Limits with SuiteQL
Our NetSuite pipeline hit API rate limits and ran 28 hours per ingestion. Moving from the REST record API to SuiteQL cut it to under 6. Here's exactly how.
Retail Personalization Beyond the Carousel
Most retail personalization stops at the recommendation carousel. The real lift lives in the inventory join and identity layer underneath.
Medallion Architecture: 5 Failure Modes
Most bronze/silver/gold lakehouse builds repeat the same five mistakes. A practitioner's breakdown of medallion architecture failure modes — and the fixes.
Iceberg vs Delta vs Hudi in 2026
After years of open table format wars, the 2026 picture is clear: Iceberg has won, but the catalog choice is now where vendor lock-in lives.
Supply Chain Visibility Beyond Dashboards
Most supply chain visibility tools paint a dashboard over broken data. Real visibility lives in the WMS-TMS-carrier integration layer underneath.
Hybrid Row-Level Security: AWS + Power BI
How we wired Azure AD identities to AWS Lake Formation to Power BI, with row-level security that keeps field, regional, and exec reports distinct.
Why Predictive Maintenance Pilots Stall
Most enterprise predictive maintenance pilots stall before payback. The fix isn't more sensors — it's the data foundation underneath. Here's the pattern.
Fabric OneLake Shortcuts vs ADLS Mounts
When OneLake shortcuts beat ADLS Gen2 mounts in Microsoft Fabric, when they silently break, and the decision matrix we use on every migration.
Synapse to Fabric: 4 Silent Breakages
Four Synapse-to-Fabric migration gotchas that pass code review but break production: identity columns, distribution DDL, OPENROWSET, F-SKU throttling.
Why Your AI Pilot Stalls at 80%
Most enterprise AI pilots hit 80% accuracy in a demo and never reach production. Here's the data-stage failure pattern behind it — and a concrete path to ship.