Setting up a delivery pipeline for your agentic iOS projects

Published on: February 16, 2026

A while back, my app crashed mid-workout at the gym. I uploaded the crash report, gave my AI agent some context, and went back to my set. By the time I finished, there was a pull request waiting for me. I reviewed it, merged it, and had a fixed TestFlight build on my device shortly after — without ever opening Xcode.

That kind of turnaround is only possible because of the delivery pipeline I've built around agentic engineering, and that's what this post is about. To be clear, nothing in here is revolutionary in terms of how I work. But it's a setup that works well for me, and I think it's valuable for folks to get some insight into what others are actually doing instead of seeing yet another "I SHIP TONS OF AI CODE" post.

I'm hoping to be a little more balanced than that...

Agentic engineering (aka vibe coding) is becoming more popular by the day. More and more developers are letting AI agents handle large parts of their iOS projects, and honestly, I get it. It's incredibly productive. But it comes with a real risk: when you hand off the coding to an agent, quality and architecture can degrade fast if you don't have the right guardrails in place.

In this post, I want to walk you through the pipeline I use to make sure that even though I do agentic engineering, my product quality stays solid (yes, it involves me reading the code and sometimes tweaking by hand).

We'll cover setting up your local environment, why planning mode matters, automated PR reviews with Cursor's BugBot, running CI builds and tests with Bitrise, and the magic of having TestFlight builds land on your device almost immediately after merging.

If you're interested in my broader thoughts on balancing AI and quality, you might enjoy my earlier post on the importance of human touch in AI-driven development.

Setting up your local environment for agentic engineering

Everything starts locally. Before you even think about CI/CD or automated reviews, you need to make sure your AI agent knows how to write code the way you want it written. The most important tool for this is an agents.md file (or your editor's equivalent, like Cursor rules).

Think of agents.md as a coding standards document for your agent. It tells the agent what language features to prefer, how to structure code, and what conventions to follow. Here's an example of what mine looks like for an iOS project:

```markdown
## Swift code conventions

- Use 2-space indentation
- Prefer SwiftUI over UIKit unless explicitly targeting UIKit
- Target iOS 26 and Swift 6.2
- Use async/await over completion handlers
- Prefer structured concurrency over unstructured tasks

## Architecture

- Use MVVM with Observable for view models
- Keep views thin; move logic into view models or dedicated services
- Never put networking code directly in a view

## Testing

- Write tests for all new logic using Swift Testing
- Run tests before creating a pull request
- Prefer testing behavior over implementation details
```

Add this file to the root of your Xcode project, and Xcode 26.3's agent will pick up your rules too.

This file is just a starting point. The thing is, your agents.md is a living document. Every time the agent does something you don't like, you add a rule. Every time you notice a pattern that works well, you codify it. I update mine constantly.

For example, I might notice my agent creating new networking helper classes instead of using the APIClient I already have. So I add a rule: "Always use the existing APIClient for network requests. Never create new networking helpers." From that moment on, the agent should honor my preference and use existing code instead of adding new code.
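In agents.md terms, that's just one more section with the rule spelled out. Something like this (the heading is mine; word it however you like):

```markdown
## Networking

- Always use the existing APIClient for network requests
- Never create new networking helpers
```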

Beyond rules, you can also equip your agent with skills. A skill is a standalone Markdown file that teaches the agent about a specific topic in depth. Where agents.md sets broad rules and conventions, a skill usually contains detailed patterns for things like structuring SwiftUI navigation, handling Swift Concurrency safely, or working with Core Data. Xcode 26.3 even ships with an MCP server (you can more or less think of MCP as a predecessor of skills) that helps agents find documentation, best practices, and more.
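To make that concrete, here's a trimmed-down, hypothetical skill file for SwiftUI navigation. The contents are illustrative, not a real skill from my projects:

```markdown
# Skill: SwiftUI navigation

- Drive navigation from a single NavigationStack per tab
- Model routes as a Hashable enum and push them via NavigationPath
- Never use the deprecated NavigationView
- Deep links map onto the same route enum as in-app navigation
```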

Your local environment is the foundation. Everything that comes after (PR reviews, CI, TestFlight) depends on the agent producing reasonable code in the first place.

Planning before building

This is the step that, in my opinion, carries a ton of value but is easy to skip.

If you use Cursor (or a similar tool), you probably have access to a planning mode. Instead of letting the agent jump straight into writing code, you ask it to make a plan first. The agent outlines what it intends to do — which files it'll change, what approach it'll take, what tradeoffs it's considering — and you review that plan before giving the green light.

The difference between "fire off a prompt and hope for the best" and "review a plan, then execute" is huge. When you review the plan, you catch bad architectural decisions before they become bad code. You can steer the agent toward the right approach without having to undo a bunch of work.

Planning also makes it more obvious when the agent has misunderstood you. If your prompt doesn't resolve every ambiguity up front, the agent might confidently think you meant one thing while you meant another. A funny example is "persist this data on device": the agent assumes "write to UserDefaults" when you meant "create SwiftData models". You can often catch these misreadings in planning mode and correct the agent's trajectory.
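To illustrate just how far apart those two readings are, here's a minimal sketch. All names are hypothetical:

```swift
import Foundation
import SwiftData

// Interpretation 1: what the agent assumed "persist this data
// on device" meant — a quick UserDefaults write.
func saveLastWorkoutDate(_ date: Date) {
    UserDefaults.standard.set(date, forKey: "lastWorkoutDate")
}

// Interpretation 2: what I actually meant — a proper SwiftData
// model that the rest of the app can query.
@Model
final class Workout {
    var date: Date
    var durationInMinutes: Int

    init(date: Date, durationInMinutes: Int) {
        self.date = date
        self.durationInMinutes = durationInMinutes
    }
}
```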

In practice, my workflow looks like this: I describe what I want in planning mode, the agent proposes an approach, I give feedback or approve, and only then does the agent switch to implementation. Going through planning first can feel slow, but I usually find the output is so much better that it's 100% worth it.

For example, when I wanted to add a streaks feature to Maxine, the agent proposed creating an entirely new data model and view model from scratch. In the plan review, I noticed it was going to duplicate logic I already had in my workout history queries. I steered it toward reusing that existing data layer, and the result was cleaner and more maintainable. Without the planning step, I would have ended up with redundant code that I'd have to clean up later.
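As a sketch of the direction I steered it in — the protocol and names below are hypothetical, not Maxine's real API:

```swift
import Foundation

// Assumed shape of the existing workout history layer.
protocol WorkoutHistoryProviding {
    func workoutDates() -> [Date]
}

// Streaks computed on top of the existing data layer instead of a
// brand-new model: count consecutive days with at least one workout,
// walking backwards from today.
struct StreakCalculator {
    let history: WorkoutHistoryProviding

    func currentStreak(asOf today: Date = .now, calendar: Calendar = .current) -> Int {
        let workoutDays = Set(history.workoutDates().map { calendar.startOfDay(for: $0) })
        var streak = 0
        var day = calendar.startOfDay(for: today)

        while workoutDays.contains(day) {
            streak += 1
            day = calendar.date(byAdding: .day, value: -1, to: day)!
        }

        return streak
    }
}
```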

Automated PR reviews with BugBot

Once the agent has written code and I've done a quick review of the changes, I run the code on my device to make sure things look and feel right. When I sign off, the agent creates a PR on my repo. If the agent is running in the cloud, I skip this step entirely and the agent makes a PR as soon as it thinks it's done.

This is where BugBot comes in. BugBot is part of Cursor's ecosystem and it automatically reviews your pull requests. It looks for logic issues, edge cases, and unintended changes that I might miss during a quick scan. It can even push fixes directly to the PR branch.

BugBot has been invaluable in my process because even though I do my own PR review, the whole point of agentic engineering is to let the agent handle as much as possible. My goal is to kick off a prompt, quickly eyeball the result, run it on my device, and move on. BugBot acts as an automated safety net that catches what I might not.

Let me give you two examples from Maxine. The first is about edge cases. Maxine recovers your workout if the app crashes. BugBot flagged a race condition: if the user tapped "start workout" before the recovery completed, the app would attempt to start a Watch workout twice. Honestly, I considered this scenario nearly impossible in practice, but the code allowed it. Instead of relying on a path I couldn't realistically test, BugBot added safeguards to make sure it was handled properly. That's exactly the kind of thing I'd never catch during a quick eyeball review.
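Here's a hypothetical reconstruction of that kind of safeguard. Maxine's real code differs; this just shows the shape of the fix:

```swift
import Foundation

@MainActor
final class WorkoutSessionController {
    private var isRecovering = false
    private var isWorkoutActive = false

    func beginRecovery() {
        isRecovering = true
    }

    func finishRecovery(restoredActiveWorkout: Bool) {
        isRecovering = false
        isWorkoutActive = restoredActiveWorkout
    }

    func startWorkout() {
        // The safeguard: ignore taps on "start workout" while crash
        // recovery is still in flight or a workout is already running,
        // so we never start a second Watch workout.
        guard !isRecovering, !isWorkoutActive else { return }
        isWorkoutActive = true
        // ...start the actual Watch workout session here
    }
}
```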

The second is about unintended changes. I once had a PR where I had left behind a few orphaned debugging properties. BugBot spotted them as "probably not part of this change" — the PR description the agent had written didn't mention them (because I did the debugging myself), and no code actually referenced these properties. BugBot removed them. Small thing, but it's the kind of cleanup that keeps your codebase tidy when you're moving fast and reviewing quickly.

Running builds and tests with Bitrise

Even though the agent runs tests locally before I ever see the code, I want a second layer of confidence. That's where CI comes in. I use Bitrise for this, but the same workflow concepts apply to Xcode Cloud, GitHub Actions, or any CI provider that can run xcodebuild.

This step is even more important for my cloud-based agents, because those don't get access to xcodebuild at all.

I have two Bitrise workflows set up for my projects, each triggered by different events.

The test workflow (runs on every PR)

The first workflow is a test-only pipeline that triggers whenever a pull request is opened or updated. The steps are minimal:

  1. Clone the repository
  2. Resolve Swift packages
  3. Run the test suite with xcodebuild test

That's it. No archiving, no signing, no uploading. The only job of this workflow is to answer one question: do the tests still pass? If something the agent wrote (or something BugBot fixed) breaks a test, I know before I merge. And I can tell an agent to go fix whatever Bitrise reported.
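If you want to replicate this outside of Bitrise's preconfigured steps, the core of it boils down to two xcodebuild invocations. The scheme and simulator names below are placeholders:

```sh
# Resolve Swift package dependencies first
xcodebuild -resolvePackageDependencies -scheme MyApp

# Run the test suite on a simulator
xcodebuild test \
  -scheme MyApp \
  -destination 'platform=iOS Simulator,name=iPhone 16'
```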

I set this up as a trigger on pull requests targeting my main branch. Bitrise picks up the PR automatically, runs the workflow, and reports the result back as a GitHub status check. If it's red, I don't merge.

The release workflow (runs on merge to main)

The second workflow triggers when something is pushed to main — which in practice means when a PR is merged. This one does significantly more:

  1. Clone the repository
  2. Resolve Swift packages
  3. Run the full test suite
  4. Archive the app with release signing
  5. Upload the build to App Store Connect

The test step might feel redundant since we already tested on the PR, but I like having it here as a final safety net. Merges can occasionally introduce issues (especially if multiple PRs land close together), and I'd rather catch that before uploading a broken build.

The archive and upload steps use Bitrise's built-in steps for Xcode archiving and App Store Connect deployment. You set up your signing certificates and provisioning profiles once in Bitrise's code signing tab, and from that point on, every merge produces a signed build that goes straight to TestFlight.
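If you'd rather script this yourself, or just want to understand what those steps do under the hood, the archive and export portion is roughly this. The scheme, paths, and contents of ExportOptions.plist are placeholders:

```sh
# Archive the app with release signing
xcodebuild archive \
  -scheme MyApp \
  -destination 'generic/platform=iOS' \
  -archivePath build/MyApp.xcarchive

# Export an App Store-ready build using an export options plist
xcodebuild -exportArchive \
  -archivePath build/MyApp.xcarchive \
  -exportOptionsPlist ExportOptions.plist \
  -exportPath build
```

The upload itself is handled by Bitrise's App Store Connect deployment step, so I don't script that part by hand.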

Why tests matter even more with AI

Having a solid test suite is probably the most impactful thing you can do for agentic engineering. Your tests act as a contract. They tell the agent what correct behavior looks like, and they catch regressions in CI even if the agent's local run somehow missed something. Better tests mean more confidence, which means you can let the agent handle more.
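As a small illustration of what "testing behavior" looks like with Swift Testing, here's a hypothetical test against the StreakCalculator sketched earlier; FixedHistory is a test stub:

```swift
import Foundation
import Testing

// Test stub standing in for the real workout history layer.
struct FixedHistory: WorkoutHistoryProviding {
    let dates: [Date]
    func workoutDates() -> [Date] { dates }
}

@Test func streakCountsConsecutiveWorkoutDays() {
    let calendar = Calendar.current
    let today = calendar.startOfDay(for: .now)
    let yesterday = calendar.date(byAdding: .day, value: -1, to: today)!

    let calculator = StreakCalculator(history: FixedHistory(dates: [today, yesterday]))

    // Behavior, not implementation: two consecutive days means a streak of 2.
    #expect(calculator.currentStreak(asOf: today, calendar: calendar) == 2)
}
```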

By the time I actually hit "merge" on a pull request, the code has been through: local tests by the agent, my own quick review, BugBot's automated review, and a green Bitrise build. That's a lot of confidence for very little manual effort.

The magic of fast TestFlight feedback

This is where everything I wrote about so far comes together. Because the release workflow uploads every merge to App Store Connect automatically, every single merge to main results in a TestFlight build — no manual intervention required. You don't open Xcode, you don't archive locally, nothing. You merge, and a few minutes later there's a new build in TestFlight. This closes the loop from "I had an idea" to "I have a build on my device" with minimal friction.

When you're testing your app in the field and you notice something you want to tweak — a layout that feels off, a label that's unclear, a flow that's clunky — you can often just tell your agent what to fix. If the change is simple enough and you're good at prompting and planning, you can have a new build on your device surprisingly quickly. Through your local planning, through the PR, through Bitrise, and onto your device via TestFlight.

Let's go back to the example from the intro of the post...

During one of my workouts with Maxine, the app crashed. Right there in the gym, I pulled up Cursor, uploaded the crash report that TestFlight gave me, added some context about what I was doing in the app, and kicked off a prompt. Then I just resumed my workout.

By the time I was done, there was a PR waiting for me. The fix wasn't perfect — I had to nudge a few things — but the bulk of the work was done. I merged it, Bitrise picked it up, and I had a new TestFlight build shortly after. All while I was focused on my workout, not on debugging.

That's what happens when every piece of the pipeline is automated. The agent writes the fix, BugBot reviews it, Bitrise tests and builds it, and TestFlight delivers it. Your job is to steer, not to crank.

Summary

Agentic engineering doesn't mean giving up on quality. It means building the right guardrails so you can move fast without breaking things.

The pipeline I use looks like this: a well-maintained agents.md and AI skills set the foundation locally. Planning mode ensures the agent's approach is sound before it writes a line of code. BugBot catches issues in pull requests that I might miss. Bitrise runs tests on every PR, and archives and uploads a build on every merge to main. And TestFlight delivers the result to my device automatically.

Each piece reinforces the others. Without good local setup, the agent writes worse code. Without planning, it makes bad architectural decisions. Without BugBot and Bitrise, bugs slip through. Without automatic TestFlight uploads, the feedback loop is too slow to be useful.

To be clear: this pipeline doesn't catch everything. An agent can still write code that passes all tests but is architecturally questionable, and BugBot won't always flag it. You still need to review and think critically. But the combination of all these layers seriously cuts down the risk of shipping something broken — and that's the point. It's about reducing risk, not eliminating it.

If you're prototyping or just exploring an idea, you probably don't need all of this right away. But the moment you have real users depending on your app, this kind of pipeline pays for itself. Set it up once, iterate on your agents.md as you go, and you'll be able to move fast without sacrificing the quality your users expect.
