Shift Left and Continuous Testing: Stop Treating Quality as a Phase

Most teams confuse shift left with earlier QA and continuous testing with faster automation. Here is what both actually mean in practice.


If it hurts, do it more frequently, and bring the pain forward.

— Jez Humble & Dave Farley, Continuous Delivery (2010)

Shift left and continuous testing are two of the most misunderstood terms in software engineering. Not because they're complex, but because everyone thinks that they're already doing them.

They are not.

In the last post on the traditional quality model, the math was clear. A defect caught in requirements costs $1, but the same defect in production costs $10,000 or more.

The traditional quality model will always have these problems unless the development workflow ensures that defects are detected as early as possible.

Shift Left Testing and Continuous Testing are the exact two disciplines that directly address everything broken about the traditional model.

But both are surrounded by misconceptions and flawed implementations.

Let's clear them up and see what they actually mean in practice.

Shift Left Is Not about Moving Testers Earlier

Most teams hear "shift left" and immediately think,

"Let's get QA involved in sprint planning. Have testers review stories earlier. Write test cases before code."

That's a start. But it's not shift left. That's just earlier QA.

Shift left is a fundamentally different relationship between quality and the development process. It's the recognition that quality decisions are made throughout the entire lifecycle, in requirements, in architecture, in design, in code review, and that the people making those decisions need quality thinking, not just quality checking at the end.

The difference matters enormously.

Checking quality at the end means someone inspects what has already been built and flags the problems. Shifting quality left means quality constraints shape what gets built in the first place.

One is reactive. The other is preventive. And prevention, as we established, is cheaper by orders of magnitude.

What Shift Left Actually Looks Like

I have implemented shift-left practices across five different industries — aviation, healthcare, government, fintech, and enterprise software.

The pattern is consistent across domains.

It begins with the Requirements.

Not with test cases, with the requirements themselves. Before a single line of code is written, the question is: how will we know this works?

Not "what tests will QA run," but "what does correct behaviour actually mean for this feature, in this context, for this user?"

This question sounds simple, but it is hard to answer well. Most teams skip it and discover the ambiguity later, in production, at 10x the cost.

It extends into architecture.

Quality-relevant decisions like how failures are handled, how the system degrades under load, and how errors are surfaced to users are architectural decisions.

If QA or quality thinking is not in the room when those decisions are made, it inherits their consequences without having influenced them.

A Real Example Of What This Costs

I was working with a payment team building a flight booking platform.

The system was integrated with an external payment gateway that uses 3D Secure authentication, a redirect flow where the customer is sent to their card issuer's page to verify the transaction before returning to complete the booking.

The architecture also had a seat-hold timer.

When a customer selected a seat, the inventory system held it for fifteen minutes while they completed payment.

Standard practice, and it was logical on paper.

The design review had happened without QA. QA was brought in to test only after the payment integration was already complete.

During testing, we covered not only the happy path but the edge cases too. That meant asking a lot of questions, including one that nobody had asked during architecture:

What happens if the 3D Secure redirect takes longer than the hold window? On slow networks, on mobile, with certain card issuers that trigger a full challenge flow requiring OTP, the authentication could easily push past fifteen minutes.

The answer was not good.

The inventory service would release the seat the moment the hold expired. It had no awareness of an in-progress payment. The payment gateway, completely decoupled, would continue processing the charge.

The customer would complete 3DS authentication, the payment would succeed, and the booking would fail because the seat was already gone.

No refund trigger. No retry on the hold. No meaningful error message to the customer. Just a successful charge and a seat that belonged to someone else.

It was a serious bug, and fixing it was not a local change.

Three services would now need to be changed.

The payment service, the inventory management layer, and the booking orchestration service all had to be redesigned to coordinate state across a flow none of them had been built to share.

Answering "What happens if 3DS takes longer than the hold timer?" would have taken a thirty-minute conversation. That conversation never happened until we found the issue during testing, six months after the architecture was locked.

By then, the cost of the answer had multiplied by an order of magnitude.
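The race described in this example can be sketched in a few lines. This is only an illustration under stated assumptions, not the team's actual design: the `SeatHold` class, the `payment_in_progress` flag, and the ten-minute grace period are all hypothetical. It contrasts a release rule that ignores payment state with one possible rule that is aware of it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

HOLD_WINDOW = timedelta(minutes=15)  # the fifteen-minute hold from the example


@dataclass
class SeatHold:
    seat: str
    expires_at: datetime
    # Hypothetical flag: inventory is told that a payment has started.
    payment_in_progress: bool = False


def release_naive(hold: SeatHold, now: datetime) -> bool:
    """Release as soon as the hold expires, with no view of payment state."""
    return now >= hold.expires_at


def release_payment_aware(
    hold: SeatHold,
    now: datetime,
    grace: timedelta = timedelta(minutes=10),  # assumed limit, not from the source
) -> bool:
    """One possible fix: keep the seat while a payment is pending, up to a grace limit."""
    if now < hold.expires_at:
        return False
    if hold.payment_in_progress and now < hold.expires_at + grace:
        return False
    return True


start = datetime(2024, 1, 1, 12, 0)
hold = SeatHold("14C", expires_at=start + HOLD_WINDOW, payment_in_progress=True)
after_3ds = start + timedelta(minutes=17)  # 3DS challenge finishes two minutes late

print(release_naive(hold, after_3ds))          # True: seat released while the charge proceeds
print(release_payment_aware(hold, after_3ds))  # False: seat kept while payment is pending
```

Even this toy version shows why the fix spans services: the inventory rule cannot be corrected without the payment flow sharing its state.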

It lives in code review.

Unit testing, defensive coding, meaningful error handling, and observable instrumentation are quality practices that developers own. Not because QA handed them a checklist, but because the team has a shared definition of what "done" actually means.

Done doesn't mean code merged.

Done means observable, testable, and deployable with confidence.

It changes who asks the hard questions.

In a shift left environment, the question "what could go wrong here?" isn't only asked by QA during test execution. It's asked by developers writing the story, architects designing the solution, and product managers defining the acceptance criteria.

Quality thinking becomes distributed. The defect detection surface moves left, toward the source.

Quality becomes everyone's responsibility

"Quality is everyone's responsibility" is where this gets uncomfortable. Shift left is not just a QA initiative; it's an engineering culture change.

QA can advocate for it, QA can model it, but QA can't impose it.

If leadership doesn't understand that quality is everyone's responsibility, shift left remains a slogan.

It's clear now that shift left moves the defect detection surface earlier, but earlier is not enough on its own.

You also need faster, and that's where continuous testing comes in.


Continuous Testing Is Not Running More Tests

Here's the second common misunderstanding.

Teams adopt CI/CD pipelines and run their test suites automatically on every commit. They call this continuous testing.

It's not.

Running the same tests faster is "Automation".

Continuous testing is a different thing entirely; it's the integration of quality signals throughout the delivery pipeline in a way that makes the cost of a defect visible at the moment it's introduced.

The distinction is timing, again, but finer-grained.

In traditional testing, you find out about a defect when QA runs the test suite, usually at the end of a sprint, during a testing phase, or in UAT.

In automated regression, you find out faster, but still after the fact.

In continuous testing, you find out at the point of change (the commit, the build, the deployment), before the defect has had time to compound into something harder to unpick.

That 10,000x cost multiplier operates at every stage, not just between requirements and production. A bug caught at the commit level costs almost nothing. The same bug caught after it's merged into main and touched five other services costs significantly more, even if it's caught before production.

Continuous Testing compresses that timeline to the minimum.

What Continuous Testing Actually Looks Like

It starts with fast feedback loops

The test suite has to be fast enough that developers don't route around it.

A suite that takes forty minutes to run gets run at the end of the day, if at all.

According to DORA research, high-performing teams maintain test suites that provide feedback quickly enough to keep developers in flow. The moment that the loop breaks, developers treat the pipeline as an obstacle rather than a quality gate. Speed is a quality property of the test suite itself. Most teams treat it as an afterthought.


It requires the right distribution of tests - The Test Pyramid

The test pyramid isn't just a concept; it's an engineering constraint.

Fast unit tests at the base. Component tests in the middle, using mocks instead of external dependencies. Contract tests at the integration boundary, verifying that services honour the agreements they make with each other without requiring full end-to-end execution. Integration tests above those, validating real user flows. And a small number of critical end-to-end tests at the top.

Too many slow, expensive end-to-end tests and your continuous testing pipeline crawls.

The distribution determines the speed, and the speed determines whether the pipeline is trusted.

  • Write tests with different granularity

  • The higher-level you get the fewer tests you should have

Mike Cohn
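To make the contract-test layer concrete, here is a minimal sketch of the idea: the consumer writes down the fields and types it depends on, and the check runs against a provider response without starting the whole system. The contract, the field names, and the `contract_violations` helper are all hypothetical, and dedicated contract-testing tools cover far more than this.

```python
# Hypothetical contract: what a booking service expects from a payment
# service response. Field names are illustrative only.
PAYMENT_CONTRACT = {
    "payment_id": str,
    "status": str,
    "amount_minor": int,  # amount in minor currency units (assumed convention)
}


def contract_violations(response: dict, contract: dict) -> list[str]:
    """Return human-readable contract violations; an empty list means compliant."""
    problems = []
    for field, expected_type in contract.items():
        if field not in response:
            problems.append(f"missing field: {field}")
        elif not isinstance(response[field], expected_type):
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}, "
                f"got {type(response[field]).__name__}"
            )
    return problems


# Stub responses standing in for what the provider would return.
good = {"payment_id": "p-1", "status": "captured", "amount_minor": 12900}
bad = {"payment_id": "p-2", "status": "captured", "amount_minor": "129.00"}

print(contract_violations(good, PAYMENT_CONTRACT))
print(contract_violations(bad, PAYMENT_CONTRACT))
```

A check like this runs in milliseconds, which is what lets it sit low in the pyramid while still catching a class of integration failure.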

It means testing at multiple levels simultaneously

Not sequentially.

Unit tests run at commit. Integration tests run at build. Performance baselines run at merge. Security scans run alongside. Production monitoring runs continuously. Each layer catches different failure modes at the appropriate cost.
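One way to picture that layering is as a lookup from pipeline trigger to the checks that run there. The sketch below is not tied to any CI product; the trigger names and the `checks_for` helper are hypothetical, and attaching the security scan to every pre-production trigger is one reading of "alongside".

```python
# Hypothetical mapping from pipeline trigger to the checks that run there.
CHECKS_BY_TRIGGER = {
    "commit": ["unit tests", "security scan"],
    "build": ["integration tests", "security scan"],
    "merge": ["performance baseline", "security scan"],
    "production": ["monitoring"],  # runs continuously rather than once
}


def checks_for(trigger: str) -> list[str]:
    """Return the checks configured for a trigger, failing loudly on unknown ones."""
    if trigger not in CHECKS_BY_TRIGGER:
        raise ValueError(f"no checks configured for trigger: {trigger}")
    return CHECKS_BY_TRIGGER[trigger]


for trigger in CHECKS_BY_TRIGGER:
    print(f"{trigger}: {', '.join(checks_for(trigger))}")
```

Failing on an unknown trigger is deliberate: a stage with no checks configured should be a visible decision, not a silent gap.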

It requires production to be part of the quality signal

This is where most teams stop short.

They treat production as outside the quality boundary, something that happens after QA signs off. But production is where real users encounter real failure modes under real conditions.

Feature flags, canary deployments, observability pipelines, and anomaly detection are all continuous testing practices. They extend quality visibility beyond the testing environment into the place where quality actually matters.
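As an illustration of how a flag can expose a change to a small slice of production traffic, here is a minimal percentage-rollout sketch. It assumes users are identified by a string ID; the `in_rollout` function and the flag name are hypothetical and do not reflect any particular feature-flag product.

```python
import hashlib


def in_rollout(flag: str, user_id: str, percent: int) -> bool:
    """Deterministically place a user inside or outside a percentage rollout."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 100  # stable bucket in the range 0..99
    return bucket < percent


# The same user always gets the same answer for the same flag, and a user
# included at a low percentage stays included as the rollout widens.
for percent in (0, 10, 50, 100):
    print(percent, in_rollout("new-checkout", "user-123", percent))
```

Hashing rather than random sampling keeps each user's experience stable, which makes production signals from the exposed group easier to compare against everyone else.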

It changes how you think about test ownership

In a continuous testing environment, tests aren't only owned by QA.

They're also owned by the team that owns the functionality. QA sets the standards, defines the coverage criteria, and validates the approach. But the developer who writes the feature writes the unit tests. The team that owns the service, along with QA, maintains its integration tests.

QA then focuses on what requires QA-level expertise: exploratory testing, failure mode analysis, quality strategy, and the hard edge cases that automation can't anticipate.

You Cannot Test What You Cannot See

This is the insight most shift-left and continuous testing articles miss entirely, and it's the one that matters most.

Before a team invests in expanding test coverage, they need to be able to see their system behaving in production.

Logs that tell a story.

Metrics that surface anomalies before users notice them.

Traces that reveal what actually happened when something went wrong, not just that it went wrong.

Instrumentation has to come before automation, not after.

Automation on top of a system you cannot observe just produces faster failures that are harder to diagnose. I've seen teams with 90% test coverage ship defects that took weeks to root-cause in production, not because their tests were wrong, but because they had no visibility into what the system was actually doing once it left the test environment.

The test suite tells you what the system does in the conditions you anticipated. Instrumentation tells you what it does in the conditions you didn't.

Both are necessary. The order matters.
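"Logs that tell a story" usually means structured events that share an identifier, so one request can be followed end to end. The sketch below uses Python's standard `logging` module; the event names and the `booking_id` field are hypothetical, and the in-memory stream exists only so the example can read back its own output.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "event": record.getMessage(),
            # Set through the `extra` argument on each logging call.
            "booking_id": getattr(record, "booking_id", None),
        })


stream = io.StringIO()  # a real service would write to stdout or a log shipper
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("booking")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

logger.info("seat_hold_created", extra={"booking_id": "b-42"})
logger.info("payment_redirect_started", extra={"booking_id": "b-42"})
logger.warning("seat_hold_expired_during_payment", extra={"booking_id": "b-42"})

# Reconstruct the story of a single booking from the log lines.
events = [json.loads(line) for line in stream.getvalue().splitlines()]
story = [e["event"] for e in events if e["booking_id"] == "b-42"]
print(story)
```

With events like these in place, a failing test or a production incident points at a sequence of behaviour rather than a single error line.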

Where Most Teams Get It Wrong

There's a failure mode I've seen consistently across organisations that attempt shift left and continuous testing without understanding what they're actually changing.

They treat it as a tooling problem.

They buy a CI/CD platform, adopt a test automation framework, mandate a shift left policy, and wonder why nothing changes. The bugs still pile up. Releases still slip. QA still becomes the bottleneck.

The tools are not the problem. The mental model is.

Shift left and continuous testing are not automation initiatives. They're quality philosophy changes that have technical implementations.

The philosophy has to come first.

What does quality mean for this system?

Who is responsible for it?

At what point in the workflow is a quality decision being made?

Where is the defect detection surface today, and what would it cost to move it left?

Until a team can answer those questions, adding more tools makes the situation more complicated without making it better.

The Pattern That Actually Works

Across fifteen years and multiple industries, the pattern that produces results looks the same everywhere.

It starts with a definition.

What does quality mean for this specific system, for these specific users, in this specific context?

That definition doesn't come from a template. It comes from engineering leadership understanding the product well enough to articulate what failure actually costs.

It continues with shared ownership. Not "QA is responsible for quality". That sentence is a trap.

Quality is a property of how the team works, not a function that gets delegated.

QA brings expertise. Everyone brings ownership.

It requires instrumentation before automation.

See the system first and then test it.

And it demands patience.

Shift left and continuous testing don't produce results in a sprint. The defect curve takes time to change. The cost curve takes time to visibly improve. Teams that abandon the approach after three months because they can't see the numbers move are making the same mistake as organisations that treat quality as a phase: optimising for the short term and paying for it in the long run.

The traditional model doesn't fail loudly. It fails slowly, invisibly, until the bill arrives all at once.


References & Further Reading