Algoscale

Synapse to Fabric: 4 Silent Breakages

Four Synapse-to-Fabric migration gotchas that pass code review but break production: identity columns, distribution DDL, OPENROWSET, F-SKU throttling.

Neeraj Agarwal

Founder & CEO, Algoscale

Most of what’s written about Synapse → Fabric migration treats it like a pipeline lift-and-shift. Microsoft’s own migration assistant categorizes your artifacts as Ready / Needs Review / Unsupported, the assessment-first flow is genuinely good, and it gets you 70% of the way there.

The other 30% is where the pain lives. After a handful of these migrations for enterprises on Microsoft Fabric consulting engagements, we’ve collected a running list of things that pass the automated assessment, pass code review, deploy cleanly into Fabric — and then silently corrupt data, blow up cost, or halve query performance in prod.

This post covers the four that show up most often. None of them are flagged by the assessment tool, because each one looks fine on paper. The DDL parses. The pipeline runs. The bill arrives.

1. IDENTITY columns silently drop off your schema

In Synapse Dedicated SQL Pool, IDENTITY(1,1) on a surrogate key is a cheap reflex — you add it to the dimension table, the MPP engine hands out monotonic-within-partition IDs, and your ETL never thinks about it again.

Fabric Warehouse doesn’t support identity columns. At all. And the failure mode is subtle.

Risa Nia’s breakdown of why identity columns work in Synapse but not Fabric has the architectural reason — Fabric’s delta-native engine leans toward schema-on-read patterns where server-assigned sequences don’t compose cleanly with concurrent writers across OneLake. But the operational reality is the thing migration teams miss:

  • Automated schema converters strip the IDENTITY property silently. Your CREATE TABLE ships with the column retyped as a plain BIGINT, nullable. The migration tool marks the table Ready.
  • Every INSERT that depended on the identity column now writes NULL into your surrogate key. Downstream joins start returning duplicates or missing rows. Dashboards show ±10% variance and nobody can reproduce the bug.
  • Foreign-key-style joins against unmaterialized FKs (which Dedicated Pool tolerated) still work — they just return wrong answers.

The failure is silent because the schema compiles, the pipeline runs green, and the data looks populated. You only notice when a finance report disagrees with itself.

The fix — and it’s a real refactor, not a one-liner:

  1. For dimensions where the surrogate key is purely technical, switch to hash keys computed from the natural key (HASHBYTES('SHA2_256', CONCAT_WS('|', col1, col2)) cast to a BIGINT-sized prefix). Deterministic, idempotent, re-runnable.
  2. For dimensions where you need monotonic IDs, use sequences (CREATE SEQUENCE is supported in Fabric Warehouse) combined with NEXT VALUE FOR in the insert.
  3. For Type 2 SCDs, generate the key in the stage as ROW_NUMBER() OVER (ORDER BY business_key) + (SELECT MAX(sk) FROM dim). Ugly but works.
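Minimal sketches of options 1 and 2, assuming HASHBYTES, CONCAT_WS, and CREATE SEQUENCE are available in your Fabric Warehouse as described above. Table and column names (dim_customer, stg_customer, customer_code, region) are illustrative, not from any real schema:

```sql
-- Option 1: deterministic hash key from the natural key.
-- The first 8 bytes of the SHA-256 digest cast to BIGINT give a
-- BIGINT-sized key; collisions are astronomically unlikely at
-- dimension scale, and the key is idempotent across re-runs.
INSERT INTO dim_customer (customer_sk, customer_code, region)
SELECT
    CAST(SUBSTRING(HASHBYTES('SHA2_256',
         CONCAT_WS('|', s.customer_code, s.region)), 1, 8) AS BIGINT),
    s.customer_code,
    s.region
FROM stg_customer AS s;

-- Option 2: monotonic IDs via a sequence instead of IDENTITY.
CREATE SEQUENCE dbo.seq_customer_sk AS BIGINT START WITH 1 INCREMENT BY 1;

INSERT INTO dim_customer (customer_sk, customer_code, region)
SELECT NEXT VALUE FOR dbo.seq_customer_sk, s.customer_code, s.region
FROM stg_customer AS s;
```

The hash variant is the one to prefer when re-runnability matters: re-loading the same natural keys produces the same surrogate keys, so a replayed pipeline can't fork the dimension.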

If you have 50+ dimensions using identity columns, budget a sprint. This is the single refactor that most teams underestimate.

2. Distribution and indexing DDL gets silently ignored

Synapse Dedicated Pool is an MPP engine. WITH (DISTRIBUTION = HASH(customer_id), CLUSTERED COLUMNSTORE INDEX) was load-bearing — it told the engine how to co-locate joins and how to store rows for columnar scans. A poorly-distributed fact table could take 10x as long to query as a well-distributed one.

Fabric Warehouse is not an MPP engine. It’s a delta-native engine over OneLake, and it makes its own storage and layout decisions. The DDL you wrote for Synapse is still syntactically legal in many cases — but it’s a no-op. The Microsoft mapping guide is honest about this: there is no direct equivalent of a Dedicated Pool in Fabric; the Fabric Warehouse compute model is closer to Synapse Serverless than Dedicated.

What actually happens when you migrate your DDL:

Synapse clause → behavior in Fabric Warehouse:

  • DISTRIBUTION = HASH(col) → accepted, ignored; the engine picks its own layout.
  • DISTRIBUTION = REPLICATE → accepted, ignored; no replicated small-table optimization.
  • DISTRIBUTION = ROUND_ROBIN → accepted, ignored.
  • CLUSTERED COLUMNSTORE INDEX → accepted, ignored (storage is columnar-by-default via Delta).
  • PARTITION (col RANGE RIGHT ...) → not supported; use Delta partitioning on the Lakehouse side.
  • Statistics (CREATE STATISTICS) → auto-managed; manual stats are a no-op.

Your DDL migrates clean. Your query plans don’t.
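To make the before/after concrete, a sketch (fact_sales and its columns are illustrative, not from the original schemas):

```sql
-- Synapse Dedicated Pool: the WITH clause was load-bearing.
-- CREATE TABLE fact_sales ( ... )
-- WITH (DISTRIBUTION = HASH(customer_id), CLUSTERED COLUMNSTORE INDEX);

-- Fabric Warehouse: the same table with no storage hints at all.
-- The engine decides layout; storage is Delta (columnar) by default.
CREATE TABLE fact_sales (
    sale_id     BIGINT        NOT NULL,
    customer_id BIGINT        NOT NULL,
    amount      DECIMAL(18,2) NOT NULL,
    sale_date   DATE          NOT NULL
);
```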

The specific regression we see most: a star-schema join that was hash-distributed on the grain key was executing in Dedicated Pool without a shuffle (co-located join). In Fabric, the same join produces a broadcast or redistribution step in the plan, and for a large fact table the query goes from 4 seconds to 45 seconds. Multiply by a dashboard with 20 visuals, and your Power BI page load doubles or triples.

The diagnostic:

  • Run the top 10 longest-running queries from your Dedicated Pool QPI on the Fabric Warehouse with the same data volume.
  • Compare query plans using Fabric's equivalent of SET STATISTICS PROFILE ON, looking for shuffle/broadcast steps that weren't there before.
  • If the engine is shuffling gigabytes on joins that used to be co-located, the fix is upstream: re-shape the semantic model, pre-aggregate into materialized views (lakehouse side), or denormalize the join key into the fact table.
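A sketch of the denormalization option from the last bullet, assuming CTAS is available in your Fabric Warehouse (table and column names are illustrative):

```sql
-- Copy the dimension attribute the dashboards actually filter on into
-- the fact table at load time, so the hot-path query no longer pays
-- for a shuffle-inducing join at read time.
CREATE TABLE fact_sales_denorm AS
SELECT
    f.sale_id,
    f.amount,
    f.sale_date,
    f.customer_sk,
    d.customer_segment      -- denormalized from dim_customer
FROM fact_sales AS f
JOIN dim_customer AS d
    ON f.customer_sk = d.customer_sk;
```

The trade is the usual one: wider fact rows and a rebuild obligation when the dimension attribute changes, in exchange for join-free scans on the queries that dominate your dashboard load.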

The broader lesson: Fabric Warehouse rewards queries that would perform well on any modern lakehouse engine — narrow predicates, selective reads, reasonable cardinalities. It punishes queries that were written to exploit Dedicated Pool’s specific distribution assumptions. A surprising number of “simple” Synapse queries are the second kind.

3. OPENROWSET and external tables are gone — rebuild the ad-hoc layer

This one mostly bites teams migrating from Synapse Serverless (not Dedicated). Serverless workloads typically build a “virtual warehouse” over ADLS with patterns like:

CREATE VIEW sales_2025 AS
SELECT *
FROM OPENROWSET(
  BULK 'https://adls.dfs.core.windows.net/raw/sales/2025/*.parquet',
  FORMAT = 'PARQUET'
) AS s;

Or CETAS (CREATE EXTERNAL TABLE AS SELECT) to write curated parquet back into the lake.

Fabric supports neither pattern. No OPENROWSET. No CREATE EXTERNAL TABLE. No CETAS. The T-SQL surface of Fabric Warehouse only queries managed tables in the Warehouse and discovered Delta tables in a Lakehouse attached to the same workspace. Sam Debruyn’s migration notes were the first place we saw this called out clearly, and it’s still the single biggest structural surprise for Serverless teams.

Why is this silent? Because the CI pipeline doesn't run your SQL views against Fabric before deploy; it deploys the DDL, which succeeds even for the CREATE VIEW statements that wrap OPENROWSET. The breakage only surfaces when a consumer queries the view and hits an "OPENROWSET is not supported in Microsoft Fabric" error.

The migration path is not “rewrite the view.” It’s “rebuild the ad-hoc layer as an ingestion layer”:

  1. Move the data into Delta format in a Lakehouse. Parquet files sitting in ADLS need to become Delta tables. A right-click → “Load to Tables” in Fabric works for single-file parquet, but it silently skips partitioned parquet datasets — those need a notebook that reads the parquet with Spark and writes back as Delta. This is the gotcha within the gotcha.
  2. Replace OPENROWSET views with regular T-SQL views over the SQL endpoint that Fabric auto-exposes on the Lakehouse.
  3. Replace CETAS with a Fabric Data Pipeline or notebook that writes Delta directly. There’s no lazy “materialize from SELECT” equivalent — you run an explicit job.
  4. Shortcuts work, but only if the target is already Delta on a compatible storage account. Bare parquet in ADLS can't be exposed as a queryable table through a shortcut without conversion.
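For the partitioned-parquet case in step 1, a notebook cell along these lines does the conversion. Note this is Spark SQL run in a Fabric notebook, not Warehouse T-SQL, and the path and table name are placeholders mirroring the earlier example:

```sql
-- Read the partitioned parquet dataset directly and materialize it as
-- a managed Delta table; the Lakehouse SQL endpoint can then expose it
-- to plain T-SQL views.
CREATE TABLE sales_2025
USING DELTA
AS SELECT *
FROM parquet.`abfss://raw@adls.dfs.core.windows.net/sales/2025/`;
```

Partition columns encoded in the directory layout (e.g. year=2025/month=01) come through as regular columns in the Delta table, which is exactly what the single-file "Load to Tables" path fails to do.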

The effort is proportional to how hard you leaned on Serverless as a schema-on-read surface. Teams who used it for occasional exploration: day of work. Teams who built a production virtual warehouse of 200 OPENROWSET views: multi-sprint ingest-layer buildout, and it’s the right time to also reshape the data engineering layer rather than porting anti-patterns forward.

4. F-SKU smoothing hides cost blowups until the overage meter fires

Dedicated Pool cost was predictable in a boring, beautiful way. You sized a DW500c, you paid a fixed hourly rate, and throttling showed up instantly as queued queries. You could eyeball the DWU% graph and know exactly what was happening.

Fabric F-SKUs behave differently, and the operational model is the part migration teams underestimate most. Two mechanisms interact:

Smoothing. When your workload exceeds 100% of the capacity, Fabric doesn’t throttle immediately — it spreads the “excess” CU-seconds forward in time. Interactive operations smooth over 5 minutes; background operations smooth over 24 hours. This means a 10-minute spike at 180% utilization shows up on the capacity graph as a flat hour at 115%.
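The arithmetic behind that example, assuming the remaining 50 minutes of the hour run at a 100% baseline (a rough check, not an exact billing formula):

```sql
-- 10 minutes at 180% plus 50 minutes at 100%, averaged over the hour:
-- (10*180 + 50*100) / 60 is roughly 113%, which smoothing renders as
-- a flat line in the low-to-mid 110s rather than a visible 180% spike.
SELECT (10.0 * 180 + 50.0 * 100) / 60 AS smoothed_hourly_pct;
```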

Overage protection and overage billing. Once you cross 100%, Fabric gives you 10 minutes of “overage protection” before throttling kicks in. If you’ve enabled the newer capacity overage feature — which lets workloads keep running past 100% indefinitely — Microsoft bills the excess CU-hours through a separate meter at 3x pay-as-you-go rates.

Why this bites migrating teams:

  • They size the F-SKU against the average CU draw of a trial workload, not the 95th-percentile smoothed draw. Smoothing hides the peaks.
  • They enable capacity overage on day one “just in case” because it ships in preview as a checkbox in the capacity admin page. The first heavy load day runs at 1.6x capacity for an hour, and the bill reflects 3x the PAYG rate on that overage.
  • If you enable overage in the middle of a throttling event, the cumulative carry-forward from that event is billed immediately as overage. Circyl’s walkthrough of smoothing-throttling-overage interactions is the clearest treatment we’ve seen of how these three mechanics compound.

The diagnostic protocol we use on migrations:

  1. Before sizing the production F-SKU, stand up a temporary capacity one size smaller and run a representative 7-day workload.
  2. Install the Capacity Metrics app and watch the smoothed CU graph, not just the raw CU draw.
  3. Size the production SKU against the P95 smoothed CU, with 20–30% headroom for background refreshes and ad-hoc query bursts.
  4. If the workload is bursty (Monday morning dashboard refreshes, end-of-quarter reporting), consider a dev/test F-SKU paused most of the time plus a smaller persistent SKU, rather than one always-on large SKU. F-SKUs bill zero while paused, which is a real lever most migrations don’t use.
  5. Enable capacity overage only after you’ve characterized the workload, and set the limit deliberately.

The point is not that Fabric is more expensive — it can be cheaper than a comparable Dedicated Pool, especially for workloads with real idle time. The point is that the cost shape is different, and the smoothing/overage mechanics make the old “eyeball the DWU% graph” intuition dangerously wrong.

The migration audit we run before cutover

If you’re about to start a Synapse → Fabric migration, four things worth auditing before you run the assessment tool:

  1. Grep your DDL for IDENTITY across every dimension and staging table. Decide per-table whether to move to sequences, hash keys, or ROW_NUMBER patterns. Don’t let the migration tool silently strip the property.
  2. Grep for distribution hints (DISTRIBUTION =, CLUSTERED COLUMNSTORE, PARTITION). Accept that they’re decorative in Fabric, and flag the top 10 most expensive queries for replanning.
  3. Grep for OPENROWSET, CREATE EXTERNAL TABLE, and CETAS. Every hit is a layer you have to replace with a Lakehouse-based ingestion and a standard view. Partitioned parquet needs a Spark-based conversion.
  4. Characterize workload CU draw on a throwaway capacity for a week before committing to an F-SKU. Size against the P95 smoothed value.
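For step 1, the grep doesn't have to be textual: the standard SQL Server catalog views, which Dedicated Pool exposes, give an authoritative inventory to drive the per-table decision:

```sql
-- Every identity column in the source pool, with its seed/increment,
-- so each table gets an explicit migration decision instead of a
-- silently stripped property. Run against Synapse, not Fabric.
SELECT
    OBJECT_SCHEMA_NAME(ic.object_id) AS schema_name,
    OBJECT_NAME(ic.object_id)        AS table_name,
    ic.name                          AS column_name,
    ic.seed_value,
    ic.increment_value
FROM sys.identity_columns AS ic
ORDER BY schema_name, table_name;
```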

This audit catches maybe 80% of the silent breakages we see in the field. The other 20% is workload-specific — security model changes (Entra-only now, no Managed Identity over Service Principals in the old-school way), Purview classification re-derivation, Power BI dataset reparenting when the workspace moves. We cover those in longer engagements.

If you’d rather have a second pair of eyes before cutover, the Fabric migration playbook walks the full sequence, and the shortest way to de-risk a live migration is a 2-week assessment with a team that’s done this before; reach out via Microsoft Azure and Fabric services. We’d rather you not discover the four things above in production.

Neeraj Agarwal

Founder & CEO, Algoscale

Neeraj has led AI and data engagements for Fortune 500 clients across finance, healthcare, and retail. He writes about what actually ships — not what looks good in a slide.
