Hybrid Row-Level Security: AWS + Power BI
How we wired Azure AD identities to AWS Lake Formation to Power BI - with row-level security that keeps field, regional, and exec reports distinct.
Most “hybrid cloud” case studies are really just migration stories. This one isn’t. A US-based physical security and fire systems integrator had 14+ subsidiary companies worth of data in AWS, thousands of field technicians in Azure Active Directory, and a board that wanted three distinct views of the same data - enforced by who the user was, not by which dashboard they clicked.
Here is the architecture we stood up, the access pattern underneath it, and the two gotchas that will cost you a week if you replicate the stack without knowing about them. The full engagement is written up as a case study on the data-warehouse modernization we delivered for this integrator; this post is the access-layer deep-dive for the practitioners.
The starting point: one data estate, thousands of field workers, three audiences
The customer is a security and fire systems integrator grown through acquisition - 14+ subsidiary operating companies spanning integration, security, and fire divisions. Each subsidiary arrived with its own ERP instance (Microsoft Business Central, mostly, with a long tail), its own Dynamics 365 CRM tenant or local variant, and its own definition of “customer” and “project.” The pre-consolidation data estate looked like fourteen independent companies pretending to be one.
At the user layer there were three distinct reporting personas:
- Executives - consolidated rollup across all 14+ subsidiaries. Division-level (integration / security / fire) comparisons, quarter-over-quarter, revenue and operational KPIs at the group level. No drill to individual techs or jobs.
- Regional / subsidiary managers (RMs) - their subsidiary and territory only. Branch-level detail, job pipeline, inspection compliance, same dashboard structure across every subsidiary so execs could compare like-for-like.
- Field technicians and inspectors - thousands of them, on-the-job, doing fire system inspections and security installs. Each sees only their own work: assigned jobs, their completion rates, their compliance scores. Emphatically not their peers’ numbers by name.
On the technology side:
- Data lived in AWS. S3 as the landing zone, Glue + PySpark for the 20+ ETL jobs, Step Functions orchestrating the daily pipeline, AWS DMS streaming CDC from the operational ERPs/CRMs, and Lake Formation governing the catalog.
- Identity lived in Azure AD - the HRMS (UKG) pushed new hires, role changes, and terminations into AD as the system of truth; everything else read from it.
- Reporting was Power BI. 20+ KPI dashboards across Finance, Operations, HR, and Sales. Licensing was already paid under the M365 agreement; the user population was already in AD; switching to QuickSight was never on the table.
Asking the customer to move data to Azure was off the table - multi-year AWS contract. Asking them to replicate identity into AWS IAM was off the table - two sources of truth for “who works here” is a governance disaster, especially with thousands of field workers rotating through. So we had to make the two clouds cooperate.
The architecture, top to bottom
Azure AD (identity)
│
│ SAML 2.0 / OIDC federation
▼
AWS IAM (role-based trust)
│
│ STS AssumeRoleWithSAML
▼
Lake Formation (authorization)
│
├── Row-level filters by user attribute
├── Column-level masking for PII
└── Tag-based access control
│
▼
Athena (query layer)
│
▼
Power BI (presentation)
│
├── Executive workspace
├── Regional workspace (RLS on region)
└── Field workspace (RLS on store)
Each arrow carries user identity forward - so that by the time a query hits a table, Lake Formation knows exactly who is asking and what they are allowed to see.
Identity: Azure AD as the root of trust
Nothing in this architecture works if the AD groups don’t reflect reality. UKG pushed Subsidiary, Territory, and TechnicianId as AD user attributes on every hire, transfer, and termination. Dynamic AD groups then materialized off those attributes:
rg-executives- static group, holdco leadershiprg-subsidiary-managers- dynamic rule matching ops-leader job titles across all 14+ subsidiariesrg-field-technicians- dynamic rule matching field-job titles (inspector, installer, service tech)
The Subsidiary, Territory, and TechnicianId attributes flowed through to the federation layer as SAML assertions. That is the single most important implementation detail in the whole stack. Without accurate AD attributes you have no row filters; without dynamic groups you spend every Monday reassigning permissions when field techs transfer between subsidiaries - and with thousands of them rotating through, that is a full-time job you do not want to create.
The federation bridge: Azure AD to AWS IAM
AWS has had SAML federation for years; the modern path is to register Azure AD as a SAML 2.0 identity provider in IAM and then create IAM roles that trust it. A condensed version of the role trust policy:
{
"Effect": "Allow",
"Principal": { "Federated": "arn:aws:iam::123456789012:saml-provider/AzureAD" },
"Action": "sts:AssumeRoleWithSAML",
"Condition": {
"StringEquals": { "SAML:aud": "https://signin.aws.amazon.com/saml" }
}
}
Three IAM roles were created, each mapped to one AD group via the Enterprise Applications claim rules in Azure:
LakeFormation-Executive(full catalog read across all 14+ subsidiaries)LakeFormation-SubsidiaryManager(Lake Formation enforces subsidiary + territory filter)LakeFormation-FieldTechnician(Lake Formation enforces per-technician filter)
When a user signs in to AWS via the Azure AD app tile, Azure issues a SAML assertion, AWS STS validates it, and the session credentials carry PrincipalTag/Subsidiary, PrincipalTag/Territory, and PrincipalTag/TechnicianId claims that Lake Formation reads later. Those claims are the whole key to the row-level filter.
Lake Formation: where permissions actually live
This is the hop that most “AWS + Power BI” tutorials skip. Giving a user an IAM role with athena:* is not the same as giving them data - Lake Formation sits between. LF permissions are set on the Glue Catalog (databases, tables, columns) and they override coarse IAM permissions for tables it governs.
The data warehouse we built had three layers in Glue Catalog:
raw_*- bronze landing tables, no user-facing grantsconformed_*- silver joins, accessible to analysts onlyreporting_*- gold, the only layer Power BI queries
Lake Formation permissions on reporting_* were set via tag-based access control (LF-TBAC) rather than explicit table grants. Every table in reporting_* was tagged with domain=sales, sensitivity=standard|pii, etc. The IAM roles got grants against those tags instead of individual tables - so when a new table shipped under the sales domain, the existing permissions flowed to it automatically. One less deployment friction.
The row-level filter
The piece that does the actual filtering work is a Lake Formation data filter. A data filter is a named expression stored on a table that LF applies to every query, transparently:
-- Filter name: subsidiary_scope
-- Target table: reporting_jobs_daily
-- Filter expression:
subsidiary_id = current_user_attribute('Subsidiary')
The current_user_attribute('Subsidiary') function reads the session tag AWS propagates from the SAML assertion. The result: a subsidiary manager running SELECT * FROM reporting_jobs_daily sees only their subsidiary’s rows. No WHERE clause in the query, no report-level filter to forget, no report author needs to remember anything. The filter runs in Lake Formation before Athena returns results.
Field technicians got a second data filter keyed on technician_id - each tech sees their own assigned jobs and their own compliance scores, but not their peers’. Executives had no filter. Data analysts got a third filter that masked PII columns (customer_email redacted, customer_phone redacted) while leaving other columns alone.
Connecting Power BI: the two-hop problem
Here is the first non-obvious thing. Power BI does not natively speak SAML. Its Athena connector authenticates with AWS IAM credentials (access key + secret), not with a user’s federated SAML session. So the natural instinct - “Power BI signs in as the user” - doesn’t work out of the box.
The pattern we landed on:
- Create a dedicated IAM role
PowerBI-AthenaServicethat can queryreporting_*tables - Power BI workspaces connect to Athena as that service role (credentials stored in the Power BI data gateway)
- Row-level security moves into the Power BI semantic model, not Lake Formation
Wait - what happened to the whole Lake Formation row filter setup? It still runs, but now it guards direct Athena access (analysts running ad-hoc SQL, scripts assuming the user’s SAML role). The reporting pipeline uses the service role; RLS for reports is enforced at the Power BI layer using the user’s UPN.
This sounds like a downgrade. It isn’t - it’s just a different layer for a different access path. The Power BI semantic model has a [Subsidiary] dimension and defines an RLS rule:
[Subsidiary] = LOOKUPVALUE(Users[Subsidiary], Users[UPN], USERPRINCIPALNAME())
USERPRINCIPALNAME() returns the Azure AD UPN of the signed-in Power BI user. The Users dimension is refreshed nightly from AD (same UKG-sourced attributes). When a subsidiary manager opens their dashboard, Power BI filters every visual to their subsidiary automatically. A field technician gets a narrower rule keyed on TechnicianId that ends up in the same dataset but only returns their own jobs.
Why both layers
A single RLS layer is a risk surface. Lake Formation alone means anyone with the Power BI service role can bypass RLS. Power BI semantic model alone means anyone with Athena access can bypass RLS. Running both means a query has to pass both checks to surface data - and every access path (SQL-direct via Athena, Power BI report) lands in the same answer.
For regulated data specifically (healthcare, finance), regulators often require defense in depth on access controls - not “the RLS is set up in Power BI, promise.” Two enforcement layers, both independently auditable, is the posture they want to see. Pair this with our data governance consulting practice if auditor-ready controls are part of the engagement.
Three Power BI workspaces, not three reports
A common mistake: building one workspace with three different reports for three audiences. That design puts all the RLS logic in the report filters, duplicates dataset refreshes, and creates cross-visibility risk - any user who can open the workspace can (usually) see all artifacts inside it.
We built three workspaces, each with its own Power BI app:
- Executive workspace - one dataset, rollup-only dashboards spanning all 14+ subsidiaries. Executives are added as workspace viewers. No RLS, because the rolled-up data has no row-level privacy.
- Subsidiary-managers workspace - a separate dataset with RLS on
[Subsidiary]. All SMs are viewers. Each sees only their subsidiary when they open the app. - Field-workforce workspace - a third dataset with RLS on
[TechnicianId]. Every inspector and installer is a viewer. Each sees their own assigned jobs and compliance scorecard.
Why three workspaces instead of three apps off one dataset? Because RLS in Power BI is per-dataset. If a subsidiary manager is accidentally added as a workspace admin (not just a viewer) on the executive workspace, RLS does nothing for them - admins bypass RLS. Separate workspaces isolate admin access and prevent cross-audience leakage through over-granted roles. When the customer’s internal audit showed up six months later, this separation is the architectural choice they asked about first.
What we shipped
Concrete outcomes from the engagement (the full write-up is on the case study page):
- 14+ subsidiary ERPs unified into a single governed data lake on AWS S3, with AWS DMS streaming CDC from every operational source
- 20+ real-time Power BI KPI dashboards across Finance, Operations, HR, and Sales - split across the three workspaces described above
- 70% reduction in reporting preparation time - finance stopped rebuilding the weekly rollup in Excel and started trusting the pipeline
- 100% automated daily pipeline - Step Functions orchestrating Glue + PySpark; zero manual reruns after the initial backfill
- Custom MDM ledger in RDS PostgreSQL - a 75+ brand dictionary resolving customer identities across the 14+ subsidiaries, without which the consolidated revenue rollup would have double-counted everything
- Thousands of field technicians authenticated via existing Azure AD credentials - no new passwords, no new portal
- 3 IAM roles + 1 Power BI service role - the entire access matrix fits on one whiteboard
Audit outcome: the customer’s annual security audit the following quarter cleared the data platform without findings. Two independent RLS enforcement layers, attribute flow traceable from HR system (UKG) to dashboard, and no shadow service accounts was the specific combination the audit team called out.
Two gotchas worth knowing about
The enterprise engineering team was sharp and still hit both of these.
Gotcha 1: Power BI’s “Azure AD” is not always the same tenant as your AWS federation’s “Azure AD.” Large enterprises often run multiple AD tenants - one for M365 workloads, one for developer tools, one left over from an acquisition. Verify that the tenant Power BI is licensed against is the same tenant issuing SAML assertions to AWS. If not, UPNs will not match across layers and your RLS lookups silently return nothing (not an error - just empty datasets). Pre-flight check: pick three users across personas and verify their UPN is identical in Power BI and in the AD attributes that feed AWS SAML claims.
Gotcha 2: Lake Formation data filters have per-table quotas (check the current AWS limit - at time of writing it is 1,000 data filters per table). If your filter design requires a different filter per regional manager, you hit that ceiling fast. The fix is what we did: one parameterized data filter that reads a session attribute, not N hard-coded filters. The temptation to create filter_region_northeast, filter_region_southeast, etc., is strong and leads to an unmaintainable mess.
Where the pattern has been validated at larger scale
This is not a novel architecture. BMW Group runs a structurally identical pattern across their data mesh - federated identity feeding Lake Formation’s fine-grained access control, attribute-based filters gating access to data product tables, and downstream analytics tools reading through that enforcement layer. Their public AWS case study on Lake Formation fine-grained access control walks through the same federated-identity-to-LF-filter design applied at considerably larger scale (thousands of data products across hundreds of teams, versus 14+ subsidiaries here).
The takeaway for enterprises evaluating the pattern: the plumbing is proven. The work is in the AD attributes flowing cleanly from your HR system and the discipline to keep enforcement in Lake Formation (not scattered across reports). BMW’s writeup and this engagement arrive at the same answer from opposite ends of the scale spectrum.
When this pattern fits
The hybrid AWS + Azure AD + Power BI pattern is right when:
- Data already lives in AWS and moving it is not an option
- Identity lives in Azure AD (standard for M365 shops)
- Row-level security matters at scale - typically regulated industries, multi-brand retail, or distributed field operations with thousands of workers
- Power BI is already the reporting tool (avoid a BI migration unless you have another reason)
If any of those constraints don’t apply, simpler patterns exist. If all of them do, this is the path of least regret - and the one that cleared the customer’s audit the first time.
For a deeper architectural read on multi-cloud identity flow, the hybrid cloud engagement playbook walks through the governance layer in detail. If the Power BI side is where you’re stuck, Microsoft Power BI consulting covers the semantic model and RLS patterns specifically. And if you arrived here because consolidating multiple acquired companies’ data is what’s actually driving the access-control complexity, the 180-day M&A data consolidation playbook is the companion piece to this one.
Founder & CEO, Algoscale
Neeraj has led AI and data engagements for Fortune 500 clients across finance, healthcare, and retail. He writes about what actually ships — not what looks good in a slide.