"It Passed on My Machine", and 4 Others Said the Same

Taia Dimitrova
Created: 2025-09-12 | Updated: 2025-09-12
6 min read

TL;DR - Docker solves visual test chaos

  • Visual regression tests are very sensitive to environment differences (OS, fonts, rendering).
  • Align your team by running tests in a Docker image that uses the same OS as your pipeline.
  • Create all visual test baselines in Docker, not on your local OS.
  • Stop guessing, start shipping.

Visual testing in Playwright is amazing… until your tests start failing for no good reason.

If you've ever had a perfectly fine visual test pass locally but break in the pipeline, you're not alone. We faced the exact same problem - and here's how we solved it.

The real-life problem

I work in a QA team of 5+.

We all run the same Playwright tests on the same codebase, but there's a catch:

  • Two of us use macOS
  • One is on Windows
  • And our pipeline? It's Linux-based

You'd think this wouldn't matter - but with Playwright visual regression testing, it matters a lot.

Even the smallest OS-level differences - like font rendering, scrollbars, or antialiasing - can cause pixel-level mismatches.

That means:

  • One engineer writes a test and commits the snapshot.
  • Another pulls the repo, runs the same test… and the test fails.
  • CI sees a difference too - and your pipeline is red.
  • Everyone is confused.

We were wasting hours trying to figure out why a simple UI test was failing when the UI hadn't even changed.

Why OS matters in visual testing

Playwright's visual comparison takes a screenshot and compares it, pixel by pixel, against a previously stored baseline image.

What most people don't realize until it's too late:

Those images are OS-sensitive.

The exact same test can generate a different snapshot just because it's run on macOS vs Linux.

This becomes a nightmare in cross-platform teams - or worse, when your pipeline doesn't match your local machine.
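To make the "pixel-level mismatch" idea concrete, here is a minimal sketch of a pixel comparison. It is a simplified illustration, not Playwright's actual comparison algorithm; the function name `countDiffPixels`, the `tolerance` parameter, and the tiny 2x1 "images" are all made up for the example.

```typescript
// Count pixels whose RGBA channels differ by more than `tolerance`.
// Both buffers are assumed to hold same-sized RGBA images (4 bytes per pixel).
function countDiffPixels(
  baseline: Uint8Array,
  actual: Uint8Array,
  tolerance = 0,
): number {
  if (baseline.length !== actual.length) {
    throw new Error("Images must have identical dimensions");
  }
  let diff = 0;
  for (let i = 0; i < baseline.length; i += 4) {
    for (let c = 0; c < 4; c++) {
      if (Math.abs(baseline[i + c] - actual[i + c]) > tolerance) {
        diff++;
        break; // count each pixel at most once
      }
    }
  }
  return diff;
}

// Hypothetical 2x1 image: a black pixel next to a white one.
const linuxBaseline = new Uint8Array([0, 0, 0, 255, 255, 255, 255, 255]);
// Same image, but the second pixel is antialiased slightly differently.
const macosActual = new Uint8Array([0, 0, 0, 255, 250, 250, 250, 255]);

console.log(countDiffPixels(linuxBaseline, macosActual)); // prints 1
```

A shade of grey that no human would notice is still a "changed pixel" to a strict comparison, which is why the same UI can fail on a different OS.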

The solution: Use Docker to match CI

After trying to "just agree" to not update the snapshots unless necessary (didn't work), we finally settled on a proper solution:

Use Docker to create and validate all snapshots in the same OS as our CI pipeline.

Here's what changed:

  • We picked a shared Docker image (mcr.microsoft.com/playwright:v1.44.0-jammy, the same tag used in the Dockerfile below)
  • Everyone runs tests using this image locally
  • Snapshots are created and validated inside the Docker container
  • Our CI pipeline uses the same image

Result: No more false positives, no more failed builds after merge.

It's consistent. It's predictable. It works.
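One thing worth guarding when you share an image: Playwright's Docker documentation recommends pinning the image tag to the Playwright version your project installs. The sketch below shows one way such a check could look. The helper names `versionFromImage` and `imageMatchesPackage` are hypothetical, and how you read the version from package.json is left to you.

```typescript
// Extract the Playwright version from an image reference such as
// "mcr.microsoft.com/playwright:v1.44.0-jammy". Returns null if none is found.
function versionFromImage(image: string): string | null {
  const match = /:v(\d+\.\d+\.\d+)/.exec(image);
  return match?.[1] ?? null;
}

// True when the image tag and the @playwright/test version agree.
// A leading "^" or "~" range prefix on the package version is ignored.
function imageMatchesPackage(image: string, packageVersion: string): boolean {
  const imageVersion = versionFromImage(image);
  const wanted = packageVersion.replace(/^[\^~]/, "");
  return imageVersion !== null && imageVersion === wanted;
}

console.log(
  imageMatchesPackage("mcr.microsoft.com/playwright:v1.44.0-jammy", "1.44.0"),
); // prints true
console.log(
  imageMatchesPackage("mcr.microsoft.com/playwright:v1.41.0-jammy", "^1.44.0"),
); // prints false
```

Running a check like this in CI would catch the moment someone bumps Playwright in package.json but forgets the image tag.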

Quick How-To

1. Create a Dockerfile

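# Pin the image tag to the Playwright version in your package.json
FROM mcr.microsoft.com/playwright:v1.44.0-jammy

WORKDIR /app
COPY . .

# Install dependencies exactly as locked in package-lock.json
RUN npm ci
# Default command: run the suite in Chromium only
CMD ["npx", "playwright", "test", "--project=chromium"]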

2. Add a simple Makefile

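# Name of the Docker image we'll build
IMAGE_NAME := playwright-visual

.PHONY: build test

# Build the image from the Dockerfile in this directory
build:
	docker build -t $(IMAGE_NAME) .

# Run only the @visual tests inside the container.
# The anonymous /app/node_modules volume keeps the dependencies installed in
# the image from being hidden by the bind mount of the host directory.
test:
	docker run --rm \
	  -v "$(CURDIR)":/app -w /app \
	  -v /app/node_modules \
	  -v "$(CURDIR)/.env":/app/.env:ro \
	  $(IMAGE_NAME) \
	  bash -lc "npx playwright test --grep @visual --project=chromium --workers=100% --retries=1"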

3. Run it

make build
make test

And now… you're testing in the same environment as CI. No surprises. No PR failures caused by 2px shifts.

My rule of thumb: Always run new or updated visual tests in Docker before merging.

Less noise. More trust in results.
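If you want to turn that rule of thumb into something a script can enforce, a tiny guard is one option. This is only a sketch: `canUpdateSnapshots` and `describeDecision` are hypothetical helpers, and how you detect "inside the container" (an environment variable you set in the image, for example) is an assumption you would need to fill in yourself.

```typescript
// Decide whether snapshot baselines may be created or updated here.
// `platform` is a Node.js platform string (for example process.platform);
// `insideContainer` is whatever signal your setup uses to detect Docker.
function canUpdateSnapshots(platform: string, insideContainer: boolean): boolean {
  return platform === "linux" && insideContainer;
}

// Human-readable message for a pre-commit hook or wrapper script.
function describeDecision(platform: string, insideContainer: boolean): string {
  return canUpdateSnapshots(platform, insideContainer)
    ? "ok: baselines may be updated here"
    : `blocked: update baselines in Docker instead (platform=${platform})`;
}

console.log(describeDecision("darwin", false));
// prints "blocked: update baselines in Docker instead (platform=darwin)"
console.log(describeDecision("linux", true));
// prints "ok: baselines may be updated here"
```

A wrapper around `npx playwright test --update-snapshots` could call this first and exit early on a developer's host OS.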

But what about the pipeline?

Yes, you can configure your CI pipeline to run on any OS - but the real problem is your team's setup.

If your engineers are using different operating systems locally and creating visual snapshots on them, you'll never get consistency.

That's why Docker is your secret weapon.

It gives you:

  • A shared, predictable environment
  • Identical snapshot rendering
  • Reproducible results (no more "works on my machine")
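Once every baseline comes from the same container, you may also want a single set of snapshot files rather than one per OS. By default, Playwright includes the platform in snapshot file names (for example a `-darwin` or `-linux` suffix). The config below is a sketch of how that could be changed with `snapshotPathTemplate`; the `__screenshots__` folder name is just an example, and this is not necessarily the setup described in this post.

```typescript
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  // Store one baseline per test and project, with no OS suffix,
  // on the assumption that every baseline is generated in the Linux container.
  snapshotPathTemplate:
    "{testDir}/__screenshots__/{testFilePath}/{arg}-{projectName}{ext}",
  projects: [
    { name: "chromium", use: { ...devices["Desktop Chrome"] } },
  ],
});
```

The trade-off: with the OS suffix gone, a baseline accidentally created on a host machine overwrites the shared one, so this only makes sense if the "Docker only" rule is actually followed.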

What's Next?

This is part of my Playwright Chronicles series - real lessons I've learned and am still learning.

Next up:

👉 How to structure your Playwright Project for scale, sanity, and speed.