>_ The Manifest

I recently sat down with The AI Frontier Playbook podcast to talk about something I keep coming back to in customer conversations: why professional developers need real engineering tooling for building Microsoft 365 Copilot agents. Here is the complete picture, from when low-code hits its ceiling to what pro-code tooling actually looks like in practice. If declarative agents are new to you, it’s worth getting familiar with the foundational concepts before diving in here.

Copilot Agents Are More Than Q&A Bots

When people hear “Copilot agent,” they often picture a glorified FAQ bot: ask a question, search some documents, get an answer. That’s the starting point, not the destination.

Modern declarative agents act on business systems. They pull context from Outlook conversations, surface data from SharePoint, query external APIs, trigger workflows, and orchestrate across multiple services, all within the security boundary of Microsoft 365. A declarative agent at a company like Zava Insurance doesn’t just tell a new hire about the benefits package. It checks their onboarding status via an API, books first-week meetings, applies different guardrails for contractors versus full-time employees, and escalates to HR when something falls outside its scope.

That’s not a chatbot. That’s a production system with real consequences when it breaks.

Low-Code Is Great… Until It Isn’t

I want to be clear: I’m a fan of low-code. Copilot Studio and Agent Builder are fantastic for prototyping, quick automations, and scenarios where a business user owns the whole lifecycle. If your HR team wants to build a simple FAQ agent over a SharePoint site, low-code is the right call. Ship it in an afternoon.

But requirements grow. Always.

That simple HR FAQ agent at Zava Insurance? Six months later, it needs to:

  • Call an onboarding API to check each employee’s enrollment status
  • Enforce different behavior for contractors vs. full-time employees
  • Handle sensitive compensation questions with strict guardrails
  • Deploy across three regional tenants with different configurations
  • Pass security reviews with documented test coverage

This is where low-code platforms hit a ceiling. Not because they’re bad tools, but because they were designed for a different scale of problem. When your agent becomes a production system that multiple teams depend on, you need the same engineering rigor you’d apply to any other production software.

The Decision Framework: When to Go Pro-Code

Here’s how I think about the transition. If you answer “yes” to two or more of these, pro-code tooling should be on the table:

  • Do you need source control and code review? Agent instructions and manifests change behavior dramatically. A one-word edit can alter how the agent handles sensitive topics. You need PRs, diffs, and approvals.
  • Do you need automated testing and CI/CD? Manual deployment works for prototypes. Production agents need repeatable, automated pipelines that catch regressions.
  • Are you integrating with multiple APIs? Each API plugin adds complexity: auth flows, error handling, rate limits. That’s engineering work.
  • Will the agent deploy across tenants or regions? Multi-tenant distribution requires environment-specific configs, staged rollouts, and parameterized deployments.
  • Do you have custom security or compliance requirements? If your compliance team needs audit trails, test reports, and documented permission scopes, you need a codebase they can review.

If you answer “yes” to two or more of these, pro-code tooling should be on the table.
💡 Tip

This isn’t binary. Many teams start in Copilot Studio, validate the concept with stakeholders, and then move to pro-code when requirements justify it. That’s a healthy pattern, not a failure of low-code.

Agents Deserve Engineering Discipline

Here’s the point I kept coming back to in the podcast discussion: agents are software. They should be versioned, tested, observable, and easy to roll back, just like any other production system.

What does that look like in practice? Here’s the layout of a pro-code agent project built with the Microsoft 365 Agents Toolkit:

zava-hr-agent/
├── appPackage/
│   ├── manifest.json          # App manifest: source controlled
│   ├── declarativeAgent.json  # Agent definition: reviewed in PRs
│   └── instructions.txt       # Agent behavior: tested before merge
├── apiPlugins/
│   ├── onboarding-api.json    # API plugin definitions
│   └── helpdesk-api.json
├── .github/
│   └── workflows/
│       └── deploy.yml         # CI/CD pipeline
└── m365agents.yml             # Agents Toolkit deployment config

Every file in this project is reviewable, diffable, and trackable. When the agent starts giving wrong answers about PTO policy, you can git log the instructions file and find exactly what changed. When a broken API plugin needs to be rolled back, it’s a git revert, not a manual undo in a UI.
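That rollback story is easy to try end to end in a throwaway repository. The sketch below assumes nothing beyond git itself; the file name mirrors the project layout above, and the commit contents are invented for the demo.

```shell
# Create a throwaway repo with an instructions file, change it, then roll back.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo "Escalate compensation questions to HR." > instructions.txt
git add instructions.txt
git commit -qm "Add compensation guardrail"

echo "Answer compensation questions directly." > instructions.txt
git commit -qam "Relax guardrail (oops)"

# Audit trail: who changed the agent's behavior, and when.
git log --oneline -- instructions.txt

# Roll back the bad change without hand-editing anything.
git revert --no-edit HEAD
cat instructions.txt   # back to the guardrail line
```

The same two commands, git log and git revert, are what you reach for when a production agent misbehaves; the only difference is that the revert lands as a pull request.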

This matters more than most teams realize upfront. I’ve seen organizations ship an unreviewed guardrail change that exposed sensitive data to the wrong audience. With pro-code tooling, that change would have gone through a pull request, been reviewed by a second pair of eyes, and tested before production.

📝 Note

Your agent’s instructions file is arguably the most important artifact in the project. A single sentence change can fundamentally alter behavior. Treat instruction changes with the same rigor as API contract changes, because that’s what they are.

The Agents Toolkit CLI makes this workflow practical. You get local development, environment parameterization, and deployment automation out of the box; no custom scripts, no manual packaging. If you’ve scaffolded a declarative agent and set up your conversation starters before, you already know how fast the inner loop is. Pro-code doesn’t mean slow; it means disciplined.

The Two Extension Vectors: Knowledge and Actions

A declarative agent extends Copilot along two orthogonal axes, and understanding both is critical to designing production agents correctly.

The first axis is knowledge: what your agent knows and can reference. This is grounding. A grounded agent doesn’t rely only on the LLM’s training data. It can search a specific SharePoint site, pull context from your organization’s Teams conversations, retrieve data from shared mailboxes, query a Copilot connector over a line-of-business system, or fetch live information from the web. Grounding is how your agent goes from generic to genuinely useful for your specific business context. Embedded knowledge and people intelligence round out the full grounding picture.

The second axis is actions: what your agent can do. This is where API plugins come in. An API plugin connects your agent to external systems through a structured OpenAPI description. The agent can read and write: it calls the onboarding API to check enrollment status, submits a ticket to the helpdesk system, or updates a record in your CRM. Actions are what turn a knowledge assistant into a production workflow participant.
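To make the “structured OpenAPI description” concrete, here is a minimal sketch for a hypothetical onboarding-status endpoint. The paths, fields, and server URL are invented for this example, not taken from any real Zava API.

```yaml
openapi: 3.0.3
info:
  title: Zava Onboarding API          # hypothetical service
  version: "1.0"
servers:
  - url: https://onboarding.zava.example/api
paths:
  /employees/{employeeId}/enrollment:
    get:
      operationId: getEnrollmentStatus
      summary: Check an employee's benefits enrollment status
      parameters:
        - name: employeeId
          in: path
          required: true
          schema: { type: string }
      responses:
        "200":
          description: Current enrollment status
          content:
            application/json:
              schema:
                type: object
                properties:
                  status: { type: string, enum: [not_started, in_progress, complete] }
                  employeeType: { type: string, enum: [contractor, full_time] }
```

The operationId is what the plugin definition maps to an agent action, and a typed field like employeeType is what lets your instructions branch contractor versus full-time behavior on structured data instead of free text.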

For simple agents, one axis is often enough. Your benefits FAQ agent only needs knowledge (SharePoint grounding over your HR documentation site). Your expense report assistant mainly needs actions (an API plugin to read and submit expenses).

Production agents typically need both. The Zava Insurance onboarding agent needs to know about the company’s benefits documentation (knowledge from SharePoint), it needs to check each employee’s enrollment status (API action against the onboarding system), and it needs to behave differently based on whether that employee is a contractor or a full-time hire (instructions that combine context from both). That combination is an engineering problem: multiple knowledge sources, multiple API plugins, multiple environments, different configurations per region.

📝 Note

Knowledge sources are defined in your declarativeAgent.json under the capabilities field. API plugins each get their own OpenAPI description file in the apiPlugins/ directory. Both are source-controlled, both are reviewable, and both can be changed independently without breaking the other. Beyond grounding and API plugins, declarative agents also support capabilities like code interpretation, image generation, and tool integrations via MCP servers, all defined in the same source-controlled project.
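As an illustration of how those two axes meet in one file, a trimmed declarativeAgent.json might look like the sketch below. Treat it as a sketch only: the SharePoint URL and action id are invented, and the schema version and capability names should be verified against the current declarative agent manifest reference.

```json
{
  "$schema": "https://developer.microsoft.com/json-schemas/copilot/declarative-agent/v1.2/schema.json",
  "version": "v1.2",
  "name": "Zava HR Onboarding Agent",
  "description": "Helps new hires with benefits, IT setup, and onboarding checklists.",
  "instructions": "$[file('instructions.txt')]",
  "capabilities": [
    {
      "name": "OneDriveAndSharePoint",
      "items_by_url": [
        { "url": "https://zava.sharepoint.com/sites/HRBenefits" }
      ]
    }
  ],
  "actions": [
    { "id": "onboardingApi", "file": "apiPlugins/onboarding-api.json" }
  ]
}
```

Knowledge lives under capabilities, actions reference the plugin files, and the instructions are pulled in from the separately reviewed instructions.txt.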

What the Inner Development Loop Actually Looks Like

One of the points I made in the podcast discussion surprised people: pro-code is not slower than low-code for experienced developers. For complex agents, it’s often faster, especially once the initial scaffold is in place.

Here’s what the development loop looks like with the Microsoft 365 Agents Toolkit in Visual Studio Code:

  1. You make a change to instructions.txt or a plugin definition
  2. You press F5
  3. VS Code provisions and sideloads the app package to your development tenant
  4. Copilot Chat opens in your browser with your agent ready to test, visible only to you

That last point matters: the sideloaded agent is scoped to your account only. Nobody else in your organization sees it. You’re testing against the real Copilot Chat with your real tenant data, but the agent is not deployed to anyone. This is the actual development experience: real Copilot, real grounding, real API calls, all without touching production.

The environment variable system in the Agents Toolkit handles the separation between dev, staging, and production cleanly. Your m365agents.yml references placeholders like ${{HELPDESK_API_BASE_URL}} that resolve differently per environment. No hardcoded URLs. No “oops, I deployed to prod with dev credentials” incidents.
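As a sketch of how that separation looks on disk (the variable name and URLs are hypothetical, and the exact file layout should be checked against the current Agents Toolkit documentation):

```ini
; env/.env.dev
HELPDESK_API_BASE_URL=https://helpdesk-dev.zava.example

; env/.env.prod
HELPDESK_API_BASE_URL=https://helpdesk.zava.example
```

At provision and deploy time, the toolkit resolves ${{HELPDESK_API_BASE_URL}} wherever it appears against whichever environment file is active, so the same project definition deploys to dev and prod without edits.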

This inner loop, combined with source control for every change, means the actual iteration cycle for a pro-code declarative agent is competitive with Copilot Studio for developers who are comfortable in VS Code. You give up the drag-and-drop surface; you gain traceability, repeatability, and a real debugging experience on live infrastructure.

Instructions Are a Contract, Not Just Configuration

I want to go deeper on the instructions file because this is the piece that teams underestimate most, until it causes a serious problem.

Your instructions.txt is the behavioral specification of your agent. Every claim your agent makes, every decision it routes to a human, every topic it refuses to discuss: all of that is governed by what’s in this file. It’s not a settings panel. It’s not a configuration value you adjust and move on. It’s a contract with your users about what the agent will and will not do.

That has real consequences for how you manage changes to it.

When someone proposes a change to the instructions file, the review should be as thorough as a PR review for an API contract change. The reviewer needs to ask: what behavior does this enable? What behavior does this restrict? Are there edge cases where this new instruction creates ambiguous or conflicting guidance? Has this been tested against representative prompts?

In practice, that means your PR description for an instruction change should include:

  • What behavior change this produces and why it’s needed
  • What test prompts you ran against the new instructions and what the outputs were
  • Which stakeholder approved the behavioral change (security, legal, HR, whoever owns that topic area)
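The test-prompt bullet can be partially automated with a small guardrail smoke test that runs before every merge. The sketch below is hypothetical: run_agent is a stub standing in for whatever test client you use to exercise the agent, and the keyword checks are deliberately simple, not a full evaluation framework.

```shell
# Hypothetical guardrail smoke test. run_agent is a stub; in real use it
# would call the sideloaded agent through your test client of choice.
run_agent() {
  case "$1" in
    *salary*|*compensation*) echo "I need to escalate this question to HR." ;;
    *)                       echo "Here is what our benefits documentation says." ;;
  esac
}

fail=0
check() {  # usage: check "<prompt>" "<phrase the response must contain>"
  resp=$(run_agent "$1")
  case "$resp" in
    *"$2"*) ;;
    *) echo "FAIL: $1 -> $resp"; fail=1 ;;
  esac
}

check "What is my salary band?"            "escalate"
check "Can you negotiate my compensation?" "escalate"
check "How do I enroll in dental coverage?" "benefits"

[ "$fail" -eq 0 ] && echo "All guardrail prompts passed."
```

Even a stub-level harness like this makes the PR description concrete: the prompts and expected phrases are checked in next to the instructions they protect.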

For the Zava Insurance onboarding agent, the instructions file encodes things like which questions are in scope (benefits, IT setup, onboarding checklists) and which get escalated (payroll disputes, terminations, compensation questions). A one-line addition that accidentally removed the “escalate compensation questions to HR” guardrail would be a serious compliance incident. In a low-code system, that change might be a single checkbox toggle with no review trail. In a pro-code system, it’s a diff in a pull request with a reviewer, a description, and an approval before it touches production.

⚠️ Warning

Never treat instruction changes as trivial configuration updates. They change agent behavior in ways that can be subtle and hard to detect without careful testing. Apply the same review standard you’d use for any other user-facing behavior change in production software.

Two related capabilities that also live in your agent’s behavioral configuration are behavior overrides (which let you turn Copilot’s built-in behaviors on or off for your specific agent) and disclaimers (fixed messages that appear alongside agent responses in regulated or sensitive scenarios). Both deserve the same review rigor as any other instruction change.

For a complete guide to writing and structuring effective instructions, see Crafting Effective Instructions for Declarative Agents.

Don’t Build Your Own Orchestration Layer

One of the questions I get a lot from developers who are moving to pro-code is: “Should I build my own orchestration layer on top?” And every time, my answer is the same: no. Don’t do it.

The Microsoft 365 Copilot platform already handles the hard parts. LLM routing, context window management, multi-turn conversation state, grounding against your SharePoint sites and emails, authentication with Microsoft identity: all of it is taken care of. When developers build their own orchestration on top, they’re not adding value. They’re adding maintenance surface.

I understand the instinct. Developers like to control things. But with declarative agents, the right move is to push as much of that responsibility to the platform as possible and focus your energy on what actually differentiates your agent: the instructions, the API integrations, the guardrails, and the deployment pipeline.

There are legitimate cases where custom orchestration makes sense, usually when you need stateful long-running processes or complex approval workflows that the platform doesn’t natively support. But those are the exception. For the vast majority of enterprise declarative agents, the platform’s built-in orchestration is both sufficient and significantly more reliable than anything you’d build yourself.

💡 Tip

Start with what the platform provides. Add custom logic only when you’ve confirmed the platform genuinely can’t handle your requirement. In most cases, a well-crafted instruction file and a good API plugin will get you further than a custom orchestration layer ever will.

For the specific pattern of connecting multiple agents together, the delegation pattern for connected agents gives you a structured approach when multi-agent coordination is genuinely required.

The Migration Path: When You’ve Outgrown Copilot Studio

If your team already has an agent running in Copilot Studio and is hitting the ceiling, this is the question I get most often: “Can we export what we built and move it to pro-code?”

The honest answer is: partially. Copilot Studio gives you an export option that packages your agent definition, but it doesn’t map cleanly to the Agents Toolkit project structure. You’ll get the general shape of your agent (the topics, the knowledge source references, some of the instructions) but you’ll need to reconstruct the deployment pipeline, the environment configuration, and the plugin definitions from scratch.

My recommendation is to not treat the migration as a direct port. Treat it as a guided rebuild. Use the Copilot Studio agent as the behavioral specification: take the agent definition and use it as the source of truth to write a proper instructions.txt, a proper declarativeAgent.json, and proper OpenAPI definitions for each integration.

The sequence I’ve seen work best:

  1. Document the current agent’s behavior thoroughly (test prompts, expected responses, known edge cases) before touching anything
  2. Scaffold a new Agents Toolkit project with atk new
  3. Translate the Copilot Studio instructions into a structured instructions.txt using the 5-part framework
  4. Re-implement API connections as proper OpenAPI-described plugins rather than custom connectors or Power Automate flows
  5. Run both agents in parallel for a validation period before cutover

The parallel validation phase is important. Users will notice differences in behavior between the old and new agent. That’s expected and actually desirable: the pro-code version should be more precise and consistent. But you want to catch regressions before they hit production users, not after.

📝 Note

If you’re starting a net-new agent and already know it’s going to need CI/CD, multi-tenant deployment, or custom API integrations, skip Copilot Studio entirely. Start with the Agents Toolkit from day one. The faster start is never worth the rearchitecture cost later.

Start Right to Avoid Costly Rearchitecture

One of the most painful patterns I see in customer engagements is what I call the “rearchitecture trap.” A team starts with low-code, builds something that works, adds more features on top, and eventually arrives at a point where the architecture is fundamentally wrong for their requirements. At that point, they have two options: live with it, or rebuild everything from scratch.

That’s expensive. Not just in engineering time, but in stakeholder trust. You told them the agent was done. Now you’re telling them it needs to be rebuilt.

The way to avoid this is to assess your requirements honestly at the start. Ask the hard questions early:

  • Will this agent need to be maintained by a development team, or just the business unit that built it?
  • Are there compliance requirements that will demand audit trails and documented test coverage?
  • Will this agent need to connect to APIs that require custom authentication flows?
  • Is there any chance this will need to be deployed to multiple tenants?

If the answer to any of these is “probably yes,” start with pro-code tooling from day one. The incremental cost of setting up a proper Agents Toolkit project instead of using Agent Builder is measured in hours, not weeks. The cost of rearchitecting a production agent that’s already embedded in business workflows is measured in months.

The Microsoft 365 Agents Toolkit for Visual Studio Code gives you a production-ready scaffold from the first atk new command. Use it early, even for agents you think are “just” prototypes. Prototypes have a way of becoming production systems faster than anyone plans for.

The Productivity Case for Pro-Code

I want to close with the argument I find most underappreciated: pro-code is not the slower option. For complex agents, it’s often the faster one.

Consider what you gain with the Agents Toolkit that you don’t have in Copilot Studio.

Code as the source of truth. In a pro-code project, every aspect of your agent’s behavior lives in files under version control. The instructions.txt, declarativeAgent.json, manifest.json, and OpenAPI plugin descriptions are the complete, authoritative specification of what the agent does. There is no hidden state in a UI, no settings stored in a cloud tenant that nobody can diff or review. What is in source control is what runs in production. That means every change is visible, every change is reviewable, and every change is reversible.

Environment parameterization out of the box. Every environment (dev, staging, prod) gets its own .env.{environment} file. The lifecycle file references those variables with ${{VARIABLE_NAME}} syntax. You’re not copying and pasting agent definitions between environments. You’re running the same definition with different parameters. This is how enterprise software deployment works, and your agents deserve the same treatment.

A pipeline that ships itself. Once your GitHub Actions workflow is in place (see CI/CD Pipelines for Declarative Agents for the full implementation), your deployment process is: merge PR, pipeline runs, agent updates in production. No manual steps. No “who has the credentials for the staging tenant.” The pipeline has them, managed as secrets, and they’re never on anyone’s laptop. For the governance side of deployment, Publishing and Governing Declarative Agents covers admin controls, user targeting, and the organizational approval workflow.
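A minimal version of that workflow might look like the sketch below. The secret name is a placeholder, and both the CLI package name and the atk invocation should be verified against the current Agents Toolkit documentation before use.

```yaml
# .github/workflows/deploy.yml (sketch; secret and package names are assumptions)
name: Deploy HR Agent
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      # Install the Agents Toolkit CLI; confirm the package name in the docs.
      - run: npm install -g @microsoft/m365agentstoolkit-cli
      - run: atk deploy --env prod
        env:
          HELPDESK_API_BASE_URL: ${{ secrets.HELPDESK_API_BASE_URL }}
```

The credentials live only in the repository’s secrets store, which is exactly the “never on anyone’s laptop” property described above.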

A debugging experience that actually shows you what’s happening. When an agent gives a wrong answer in production and you need to diagnose why, git log on the instructions file shows you exactly what changed and when. Combined with your CI/CD pipeline logs, you get a full audit trail from the code change to the deployment to the observed behavior change. The debugging surface for complex agent behaviors in Copilot Studio is considerably more opaque: you see the outcome but not the chain that produced it. For a systematic investigation workflow, Debugging Declarative Agent Failures walks through diagnosing the most common agent misbehaviors in production.

The developers I see get frustrated with pro-code are almost always frustrated with the initial setup cost, not the ongoing development experience. That initial scaffold takes a few hours to get right. After that, the inner loop is fast, the deployment is automated, and the whole thing is auditable.

That upfront investment pays back the first time you need to roll back a bad instruction change in production. It pays back dramatically if you ever need to explain your agent’s behavior to a security team or a compliance auditor.

The Value, in Plain Terms

I talk to a lot of enterprise teams who are trying to figure out whether the overhead of pro-code tooling is worth it. Here is the simplest way I know to frame it.

With low-code, you trade engineering discipline for speed of initial delivery. That’s a fine trade when the agent is genuinely small and stable. It becomes a bad trade when the agent grows in complexity, when more than one person needs to work on it, when it handles data that has compliance implications, or when it needs to run reliably across multiple environments.

With pro-code tooling, you make a different trade: a few extra hours of setup in exchange for:

  • Every change is auditable. Instruction edits, plugin updates, manifest changes: all of them are diffs in pull requests with reviewers and approvals. Nothing reaches production without a record of who changed what and why.
  • Every deployment is repeatable. The same atk provision and atk deploy commands run locally, in CI, and in production. No “it worked on my machine.”
  • Every regression is catchable. Because you have version history, you can identify exactly which commit changed the behavior and revert it in minutes, not days.
  • The organization can grow the agent. A pro-code project can be handed off between teams, reviewed by security, extended by new contributors, and deployed to new tenants. A Copilot Studio agent that only the original builder understands is a support liability.

The question is not whether pro-code tooling adds overhead. It does, at the start. The question is whether the complexity of your agent justifies the investment. And in my experience, for any agent that a team is planning to maintain, extend, and depend on in production, the answer is almost always yes.

Resources

Have questions or want to share what you're building? Connect with me on LinkedIn or check out more on The Manifest.