Grounding Declarative Agents with Web Search

You’ve built a declarative agent grounded in SharePoint documents and Teams conversations. Then someone asks about company culture, and the answer lives on your public careers page, not in any internal document. The WebSearch capability lets your agent pull content from external websites, so it can blend internal and external data in a single response.

Why Web Search Matters

Not all organizational knowledge lives inside your tenant. Some of the most valuable content (careers pages, product documentation, investor relations, press releases) is hosted on the open web.

Without web access, your agent hits a wall on public-facing questions. The WebSearch capability bridges that gap.

The WebSearch Capability

The simplest configuration is an unscoped WebSearch capability. This gives your agent access to the entire web:

{
  "name": "WebSearch"
}

That’s it. Your agent can now search the open web and pull content into its responses.

⚠️ Warning

While unscoped web search is easy to configure, it significantly limits the quality and predictability of your results. The agent may surface content from unverified sources, return irrelevant information from random blogs or forums, and produce answers that are impossible to audit. For production agents, always scope your web search to trusted sites.
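One way to act on this warning is a small pre-deployment check. The following is a minimal Python sketch, not part of any Microsoft tooling: the function name has_unscoped_web_search is hypothetical, and it assumes the manifest has been loaded into a dictionary with a top-level capabilities array, as in the examples in this article.

```python
import json


def has_unscoped_web_search(manifest: dict) -> bool:
    """Return True if any WebSearch capability lacks a non-empty 'sites' list."""
    for capability in manifest.get("capabilities", []):
        if capability.get("name") == "WebSearch" and not capability.get("sites"):
            return True
    return False


# Two tiny manifests: one unscoped, one scoped to a single site.
unscoped = json.loads('{"capabilities": [{"name": "WebSearch"}]}')
scoped = json.loads(
    '{"capabilities": [{"name": "WebSearch", '
    '"sites": [{"url": "https://careers.zavainsurance.com"}]}]}'
)

print(has_unscoped_web_search(unscoped))  # True
print(has_unscoped_web_search(scoped))    # False
```

A check like this could run in a build pipeline to stop an unscoped agent from reaching production.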

Why You Should Scope WebSearch

Scoping web search to specific sites gives you four key advantages:

Relevance control. When your agent can search the entire internet, it might find a random blog post about insurance instead of your company’s information. Scoping ensures responses come from sources you trust.

Response quality. A scoped agent produces more focused, authoritative answers: “According to zavainsurance.com…” versus “According to some forum post from 2019…”

Predictability. Users expect consistent answers. An agent that sometimes pulls from your careers page and sometimes from a competitor’s site is a trust killer.

Governance. In enterprise scenarios, you need to control where your agent gets its information. Scoped search gives you an auditable list of approved sources.

Scoping to Specific Sites

Add a sites array to restrict the agent to specific URLs:

{
  "name": "WebSearch",
  "sites": [
    {
      "url": "https://careers.zavainsurance.com"
    }
  ]
}

When you add a URL to the sites array, the agent treats it as a root, accessing that page and up to 2 sub-levels beneath it. Scoping to https://careers.zavainsurance.com means the agent can reach:

https://careers.zavainsurance.com/life-at-zava (1 level deep)
https://careers.zavainsurance.com/life-at-zava/benefits (2 levels deep)
https://careers.zavainsurance.com/open-positions (1 level deep)

But it cannot reach pages more than 2 levels below your root URL.
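The depth rule can be modeled in a few lines of Python. This sketch only illustrates the rule as described above; it is not how the platform evaluates scope internally. The function name is_in_scope and the benefits/dental URL are hypothetical, and the sketch assumes a "level" corresponds to one path segment.

```python
from urllib.parse import urlsplit

MAX_SUB_LEVELS = 2  # depth limit described in the text above


def _segments(path: str) -> list[str]:
    """Split a URL path into its non-empty segments."""
    return [part for part in path.split("/") if part]


def is_in_scope(root_url: str, page_url: str,
                max_sub_levels: int = MAX_SUB_LEVELS) -> bool:
    """Return True if page_url is the root or at most max_sub_levels below it."""
    root, page = urlsplit(root_url), urlsplit(page_url)
    # The page must share the root's scheme and host.
    if (root.scheme, root.netloc.lower()) != (page.scheme, page.netloc.lower()):
        return False
    root_parts, page_parts = _segments(root.path), _segments(page.path)
    # The page path must start with the root path.
    if page_parts[:len(root_parts)] != root_parts:
        return False
    return len(page_parts) - len(root_parts) <= max_sub_levels


ROOT = "https://careers.zavainsurance.com"
print(is_in_scope(ROOT, ROOT + "/life-at-zava"))                  # True
print(is_in_scope(ROOT, ROOT + "/life-at-zava/benefits"))         # True
print(is_in_scope(ROOT, ROOT + "/life-at-zava/benefits/dental"))  # False
```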

📝 Note

The sites array supports a maximum of 4 URLs. Choose your sites carefully: pick the most specific URL that covers your use case, and start narrow. You can always expand later.

Add multiple sites when your content spans different domains:

{
  "name": "WebSearch",
  "sites": [
    {
      "url": "https://careers.zavainsurance.com"
    },
    {
      "url": "https://www.zavainsurance.com/about"
    },
    {
      "url": "https://blog.zavainsurance.com"
    }
  ]
}
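If you generate the manifest from a script, the 4-URL limit is easy to enforce before deployment. The Python sketch below is one possible approach: build_web_search is a hypothetical helper, and the requirement that each entry be an absolute http(s) URL is an assumption of this sketch rather than a documented rule.

```python
import json
from urllib.parse import urlsplit

MAX_SITES = 4  # limit stated in the note above


def build_web_search(urls: list[str]) -> dict:
    """Build a scoped WebSearch capability, rejecting invalid input early."""
    if not urls:
        raise ValueError("provide at least one URL; an empty list would be unscoped")
    if len(urls) > MAX_SITES:
        raise ValueError(f"sites supports at most {MAX_SITES} URLs, got {len(urls)}")
    for url in urls:
        parts = urlsplit(url)
        # Assumption of this sketch: only absolute http(s) URLs are accepted.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"expected an absolute http(s) URL: {url!r}")
    return {"name": "WebSearch", "sites": [{"url": url} for url in urls]}


capability = build_web_search([
    "https://careers.zavainsurance.com",
    "https://www.zavainsurance.com/about",
    "https://blog.zavainsurance.com",
])
print(json.dumps(capability, indent=2))
```

Printing the result reproduces the three-site configuration shown above, ready to paste into the capabilities array.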

The Zava Insurance Example

Zava Insurance has a public careers page at https://careers.zavainsurance.com/life-at-zava featuring employee stories, office culture, community involvement, and sustainability initiatives. We want our HR Buddy agent to answer culture questions using this page alongside internal sources.

Here’s how web search fits with other knowledge sources in the capabilities array:

{
  "capabilities": [
    {
      "name": "OneDriveAndSharePoint",
      "items_by_url": [
        {
          "url": "https://zavainsurance.sharepoint.com/sites/HR/Shared Documents/Onboarding"
        }
      ]
    },
    {
      "name": "TeamsMessages",
      "urls": [
        {
          "url": "https://teams.microsoft.com/l/channel/19%[email protected]/New%20Hires%202026?groupId=..."
        }
      ]
    },
    {
      "name": "WebSearch",
      "sites": [
        {
          "url": "https://careers.zavainsurance.com/life-at-zava"
        }
      ]
    }
  ]
}
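Because every knowledge source in this array is declared by URL, a short script can list what the agent is allowed to read. The Python sketch below assumes the three URL-list keys used in the example above (items_by_url, urls, and sites); list_sources is a hypothetical helper, and the manifest is a trimmed copy of the example that leaves out the TeamsMessages entry.

```python
# Keys under which each capability lists its URLs, as used in the example above.
URL_LIST_KEYS = ("items_by_url", "urls", "sites")


def list_sources(manifest: dict) -> list[tuple[str, str]]:
    """Return (capability name, URL) pairs for every declared source."""
    sources = []
    for capability in manifest.get("capabilities", []):
        for key in URL_LIST_KEYS:
            for item in capability.get(key, []):
                sources.append((capability["name"], item["url"]))
    return sources


manifest = {
    "capabilities": [
        {
            "name": "OneDriveAndSharePoint",
            "items_by_url": [
                {"url": "https://zavainsurance.sharepoint.com/sites/HR/Shared Documents/Onboarding"}
            ],
        },
        {
            "name": "WebSearch",
            "sites": [{"url": "https://careers.zavainsurance.com/life-at-zava"}],
        },
    ]
}

for name, url in list_sources(manifest):
    print(f"{name}: {url}")
```

The output is a flat list of approved sources, internal and external, that can be reviewed or checked into source control alongside the manifest.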

The Power of Internal and External Data

With web search added to your capabilities, your declarative agent can pull from both internal and external data in a single response. Imagine someone asks the HR Buddy:

User: What’s the culture like at Zava Insurance?

HR Onboarding Buddy: According to the Zava Insurance careers page, the culture is built around five core values: Integrity, Respect, Innovation, Accountability, and Inclusion. Employees highlight the collaborative hybrid work environment, strong ERG communities, and the annual hackathon in September. The company also has active sustainability programs aiming for carbon neutrality by 2030.

The agent cites the website, the user gets an accurate answer, and the agent never wanders off to Reddit or Wikipedia. No RAG pipeline. No embeddings database. No custom retrieval infrastructure. You point the agent at the right sources using URLs, and the Microsoft 365 Copilot platform handles retrieval, chunking, and synthesis automatically.

The Value You Just Unlocked

A single capability declaration gave us:

External knowledge access: The agent can pull from public websites, bridging the gap between internal docs and public-facing content
Scoped precision: Up to 4 curated URLs with 2 sub-levels each, giving you control over exactly which external content the agent can access
Blended responses: External web content synthesized with internal context into a single, cited answer
Source transparency: Every response cites its sources, so users can verify information by clicking through to the original web page
Zero infrastructure: No web scraping pipeline, no content ingestion service, no custom crawlers to build or maintain
Governance-ready: An auditable, explicit list of approved external sources rather than unconstrained internet access

The HR Buddy now answers questions that span internal policies, peer conversations, and public-facing content, all from a few lines of JSON configuration.