Field Notes from Building a BI Platform on AWS and Power BI

This article is a collection of lessons I’ve learned (mostly the hard way) while building a business intelligence platform using AWS and Power BI. It’s not a how-to guide. It’s more like a field notebook: sometimes messy, occasionally political, but hopefully useful.

This project began a few years ago, before Microsoft Fabric had entered the scene. The company I was working with already had a well-established AWS setup, so using AWS was the obvious choice because of the existing infrastructure and service credits.

At the time, it felt like I was doing something revolutionary. I had all the architectural diagrams, the cloud stack, the notebooks, and the pipelines. I had a grand vision. What I didn’t fully grasp was just how painful, political, and occasionally hilarious it could be to get one of these platforms working smoothly across an entire organisation.

Don’t solve every data problem downstream just because you can

Businesses are chaotic systems filled with very human actors: everyone has their own incentives, blind spots, and impressively creative workarounds. Working within such systems, it’s understandable to prefer tweaking things quietly in the background. Sometimes it’s even faster that way, which seems smart when you’re in the thick of delivery and the deadline was yesterday. It’s also tempting to avoid awkward conversations with the business, but you should always advocate strongly for correcting issues at their inception.

Complex transformations can clean things up quickly, but the technical debt builds up. You end up masking upstream problems that really need addressing. Those messy fields and missing values often point to broken processes or poor training. Getting leadership to focus on improving upstream data entry or systems design is critical.

Take sales data, for example. No one is docking the bonus of a salesperson who just closed a ten-million-dollar deal because they entered the wrong product code. Realistically, data quality will never be perfect, but it’s still worth making it a priority. Best practice is to apply clever business rules to catch and correct common issues, but remember to log them. Publish them. Give people the chance to fix things at the source. This last part is what my system was missing, and I learnt that the hard way.
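
To make “catch, correct, and log” concrete, here’s a minimal pandas sketch. The table, the product master, and the log file are all hypothetical; the point is that the offending rows get written somewhere visible instead of being silently patched downstream.

```python
import pandas as pd

# Hypothetical sales extract; column names are illustrative only.
sales = pd.DataFrame({
    "deal_id": [101, 102, 103],
    "product_code": ["PRD-A", "prd-a ", "XXX"],
    "amount": [250_000, 10_000_000, 40_000],
})

# Known-good codes, e.g. from a product master table.
valid_codes = {"PRD-A", "PRD-B"}

# Business rule: normalise casing and whitespace, then flag anything still unrecognised.
sales["product_code_clean"] = sales["product_code"].str.strip().str.upper()
sales["dq_issue"] = ~sales["product_code_clean"].isin(valid_codes)

# Log the offending rows rather than quietly fixing them downstream,
# so the source team gets the chance to correct the entry.
issues = sales.loc[sales["dq_issue"], ["deal_id", "product_code"]]
issues.to_csv("dq_issues_product_code.csv", index=False)
```

In practice that log would feed its own small report back to the business, which is exactly the publishing step my system was missing.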

Beware the shiny new thing

New tools and frameworks appear faster than you can rebuild a semantic model. It’s easy to get excited and be convinced that the next thing will solve all your problems; I certainly was.

At one point I joined the data lake party. I was convinced ELT into a lake was the future and that warehouses were old-fashioned. People warned me about data swamps. I laughed. Ultimate scalability, low-cost storage, and serverless management: it sounded like a dream come true. Selecting the right tool for the job, though, is always key.

Since then I’ve learnt that data lakes and lakehouses are great for unstructured data or enormous datasets. They can complement a hybrid architecture beautifully. For BI projects with manageable data volumes and a need for clean, aggregated reporting, warehouses are still the way to go.

I’m writing another article about the challenges of Spark for BI transformation work, so I’ll keep that part short. Let’s just say there were tears. There will be another piece coming soon comparing lakehouse, warehouse, and hybrid architectures for BI. Stay tuned.

The unsung heroes: Aggregation tables

Everyone loves a flexible model. Analysts want to slice and dice all day long. Users want to see every metric under the sun, preferably on one page, in a matrix, with fifteen slicers. It works brilliantly until Monday morning rolls around and everyone opens their reports at once.

Suddenly, your platform starts wheezing. Reports time out. People start sending angry emails. No one can load anything. It’s chaos, and you’re bearing the brunt of it all.

Nine times out of ten, these reports are showing month or quarter-level metrics that could easily be sourced from a much leaner aggregated table. Save the detailed fact tables for drilling down. Trust me, your CPU will thank you.
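
For what it’s worth, an aggregation table is nothing fancy: just a pre-computed rollup at the grain the dashboards actually use. Here’s a minimal pandas sketch with made-up column names; in a real warehouse this would more likely be a scheduled query or materialised view, feeding something like a Power BI aggregation table that sits in front of the detailed fact table.

```python
import pandas as pd

# Illustrative detailed fact table: one row per transaction.
fact_sales = pd.DataFrame({
    "order_date": pd.to_datetime(["2024-01-03", "2024-01-17", "2024-02-02"]),
    "region": ["EMEA", "EMEA", "APAC"],
    "revenue": [1200.0, 850.0, 640.0],
})

# Roll up to month grain so the Monday-morning reports never
# have to scan the full fact table.
monthly_sales = (
    fact_sales
    .assign(month=fact_sales["order_date"].dt.to_period("M").dt.to_timestamp())
    .groupby(["month", "region"], as_index=False)
    .agg(revenue=("revenue", "sum"), orders=("revenue", "size"))
)
```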

Data quality builds trust

No analyst wants to write data quality tests. Nobody trains you to do it. They can seem boring, yet in software engineering, writing tests to prove something works is just part of the job. In data, it should be too.

It’s tempting to rush reports into the hands of users. There’s a thrill in it. You feel productive. You feel valuable, you’re praised for speed. Then the data turns out to be wrong, and your credibility starts leaking faster than your pipeline.

Write the tests and monitor the data. Even if something bad does get through, the fact that you spotted it early and warned users will build more trust than pretending nothing happened. Business users can be unforgiving. One data error and your dashboard becomes a running joke. Without trust, your hard work will never be used to its full potential, so please don’t skimp on testing.
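
If you’ve never written one, a data quality test doesn’t need to be elaborate. Here’s a rough Python sketch, with hypothetical table and column names, of the kind of cheap checks worth running after every load.

```python
import pandas as pd

def check_sales_fact(fact: pd.DataFrame, dim_product: pd.DataFrame) -> None:
    """A few cheap checks to run after every load (illustrative names)."""
    # The primary key is present and unique.
    assert fact["deal_id"].notna().all(), "null deal_id found"
    assert fact["deal_id"].is_unique, "duplicate deal_id found"

    # Amounts fall in a sane range.
    assert (fact["amount"] >= 0).all(), "negative amount found"

    # Every product code joins to the product dimension.
    orphans = set(fact["product_code"]) - set(dim_product["product_code"])
    assert not orphans, f"orphan product codes: {sorted(orphans)}"
```

Run something like this as a pipeline step and surface the failures to users before they find them for you.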

Engineering process & workflow

When I built the first version of my platform, it was a proof of concept. Management got excited and soon it was being used in production. Then they asked for changes. Then more changes. All on the live model! It was, frankly, a mess, and so was I. Turns out proper CI/CD isn’t just for software engineers.

Eventually, I implemented proper development workflows, Git-based version control, code review, and environment management. It changed everything. Soon enough managing a team became easier and deployments became predictable. Thankfully the stress levels went down too. This is second nature to engineers, but it’s a game-changer for data professionals moving into platform roles.

Delivery strategy

Everyone dreams of the all-knowing data model that reflects the entire business. Don’t try to build it all at once. Start with a single subject area; once you’ve nailed it, you’ll have proven value, and buy-in will flow from there. Make sure your metadata and tests are in place, then expand to the next subject area.

You can design for a bigger picture without delivering it in one go. Otherwise, you end up tangled in dependencies, missing pieces, and with a backlog of regrets.

Data democratisation sounds great on paper

In theory, a centralised semantic model that every department can use, with analysts free to explore and build reports, sounds ideal. In practice, it’s like handing out the keys to a Ferrari with no driving test.

Departments want everything, all at once, with no trade-offs. Their analysts want to be heroes, so reports multiply and performance degrades. Suddenly it’s not democratisation, it’s the tragedy of the commons.

We tried a lightweight hub-and-spoke model to give autonomy to the departments and reduce the bottlenecks of a centralised-only data team. It was promising at first, then it turned into a mess. Reports slowed and our team got the blame. We couldn’t rely on individual human actors, with their own priorities and inherent biases, to tidy up their reports or follow best practices.

It turns out democratising data requires just as much governance, if not more. Without it, you end up with freedom, sure, but also chaos.

Always more to learn

Some of these lessons deserve their own deep dive—especially around Spark vs SQL processing, lakehouse vs warehouse vs hybrid architectures, and the realities of building trustworthy semantic models. I’ll be writing about those soon.

Building a BI platform is hard. Building one that people trust and that scales gracefully is even harder. You will make mistakes. That’s fine. Just try not to make mine. Or if you do, write about them so others can have a laugh and learn something.

That’s what I’m doing.
