Microsoft Foundry Agent Runtime Architecture [2026]
Bottom Line
Microsoft Foundry Agent Service is becoming an operational runtime for enterprise AI agents. Its core value is managed state, tools, identity, tracing, and governance around agent workflows.
Key Takeaways
- Memory preview supports user profile, chat summary, and procedural memory stores.
- The Responses API exposes models and platform tools through a project-scoped endpoint.
- Tracing captures model calls, tool use, retries, latency, costs, inputs, and outputs.
- Standard setup can store files, conversations, and vector stores in your Azure resources.
Microsoft Foundry Agent Service is Microsoft’s attempt to turn AI agents from bespoke orchestration code into an Azure runtime primitive. As of June 19, 2026, the interesting parts are not just model access. They are managed memory, project-scoped tools, traceable execution, Entra-backed identity, and governance hooks that make agent behavior inspectable enough for enterprise production reviews and repeatable audit conversations.
Architecture & Implementation
Bottom Line
Foundry Agent Service is best viewed as an agent runtime plus control plane. It reduces the amount of state, tool, identity, and observability plumbing each team must rebuild around an LLM API.
The platform has three operating modes. Prompt agents are authored in the Foundry portal, SDKs, or REST and run directly on Microsoft-managed infrastructure. Hosted agents let teams package code written with Microsoft Agent Framework, LangGraph, OpenAI Agents SDK, Anthropic Agent SDK, GitHub Copilot SDK, or custom orchestration into a container that Foundry runs behind a managed endpoint. Existing applications can also call the Responses API directly to use Foundry models and tools without moving the full agent process into Foundry.
The runtime shape is familiar to anyone who has built a production chatbot, but the ownership boundary changes. The agent combines a model, instructions, and tools, while the service handles agent lifecycle, conversations, tool calls, identity wiring, and operational telemetry. Microsoft’s overview describes the Responses API as the single entry point for agent types and a path to platform tools such as file search, code interpreter, memory, web search, and MCP servers.
Memory as a managed state layer
Memory is currently a preview feature, and that status matters. It is not just chat history pasted back into the context window. Microsoft describes Foundry memory as a managed long-term memory system that extracts meaningful information from conversations, consolidates it into durable knowledge, and retrieves relevant memories in later sessions.
- User profile memory stores durable preferences and personal context, such as language, product defaults, or accessibility needs.
- Chat summary memory stores distilled summaries of prior topics and threads for continuity.
- Procedural memory stores reusable routines and operating patterns inferred from previous interactions.
- Item-level CRUD lets developers create, read, update, list, and delete individual memory records.
- Store-level TTL gives teams a default retention control for newly created memory entries.
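The relationship between item-level CRUD and a store-level TTL can be sketched with a toy in-process store. This is not the Foundry memory API: `ToyMemoryStore`, `MemoryItem`, and every method name below are hypothetical and exist only to show how a default TTL applied at creation time interacts with reads, updates, listing, and deletes.

```python
from __future__ import annotations

import itertools
import time
from dataclasses import dataclass


@dataclass
class MemoryItem:
    item_id: str
    scope: str
    kind: str  # "user_profile", "chat_summary", or "procedural"
    content: str
    expires_at: float | None  # None means the item never expires


class ToyMemoryStore:
    """Hypothetical stand-in for a managed memory store (not a real SDK class)."""

    KINDS = {"user_profile", "chat_summary", "procedural"}

    def __init__(self, default_ttl_seconds: float | None = None, clock=time.time):
        self._default_ttl = default_ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._items: dict[str, MemoryItem] = {}
        self._ids = itertools.count(1)

    def _live(self, item: MemoryItem) -> bool:
        return item.expires_at is None or item.expires_at > self._clock()

    def create(self, scope: str, kind: str, content: str) -> MemoryItem:
        if kind not in self.KINDS:
            raise ValueError(f"unknown memory kind: {kind!r}")
        # The store-level TTL is stamped onto each newly created item.
        expires = None if self._default_ttl is None else self._clock() + self._default_ttl
        item = MemoryItem(f"mem-{next(self._ids)}", scope, kind, content, expires)
        self._items[item.item_id] = item
        return item

    def read(self, item_id: str) -> MemoryItem | None:
        item = self._items.get(item_id)
        return item if item is not None and self._live(item) else None

    def update(self, item_id: str, content: str) -> bool:
        item = self.read(item_id)
        if item is None:
            return False
        item.content = content
        return True

    def list_scope(self, scope: str) -> list[MemoryItem]:
        return [i for i in self._items.values() if i.scope == scope and self._live(i)]

    def delete(self, item_id: str) -> bool:
        return self._items.pop(item_id, None) is not None
```

In this sketch an expired item is simply invisible to reads and listings. Whether the managed service hides, purges, or archives expired memories is something to confirm in Microsoft's documentation rather than infer from this example.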
The important implementation detail is that memory introduces a second trust boundary. A bad prompt can produce a bad answer; corrupted memory can influence future answers. Microsoft’s guidance explicitly calls out prompt injection and memory corruption risks, so production designs should treat memory writes like privileged state mutation. For teams testing with customer-like data, the privacy workflow should redact identifiers before logs, traces, or fixtures are shared; TechBytes’ Data Masking Tool is useful for preparing non-production samples without leaking identifiers.
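One way to read "privileged state mutation" is that every proposed memory write passes a review step before it is persisted. The sketch below is an assumption about how such a gate could look in application code; `MemoryWrite`, `review_write`, the trusted-source list, and the regex patterns are all hypothetical, and regexes alone are not adequate prompt-injection detection.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Illustrative patterns only; real injection defenses need more than regexes.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all|any|previous) .*instructions", re.IGNORECASE),
    re.compile(r"system prompt", re.IGNORECASE),
]

# Assumption: only text the user typed directly may become long-term memory.
TRUSTED_SOURCES = {"user_turn"}
MAX_CONTENT_LENGTH = 500


@dataclass(frozen=True)
class MemoryWrite:
    scope: str
    kind: str
    content: str
    source: str  # e.g. "user_turn", "tool_output", "retrieved_document"


def review_write(write: MemoryWrite) -> tuple[bool, str]:
    """Return (allowed, reason) for a proposed memory write."""
    if write.source not in TRUSTED_SOURCES:
        return False, "untrusted source"
    if len(write.content) > MAX_CONTENT_LENGTH:
        return False, "content too long"
    for pattern in SUSPICIOUS_PATTERNS:
        if pattern.search(write.content):
            return False, "instruction-like content"
    return True, "ok"
```

The design choice worth noting is that the gate keys on where the text came from, not only on what it says. Tool output and retrieved documents are the usual carriers of indirect prompt injection, so this sketch refuses to let them write memory at all.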
Tools, toolboxes, and authentication
Foundry tools split into built-in and custom capabilities. Built-ins include Web Search, Code Interpreter, File Search, and Function calling. Custom options include Model Context Protocol, Agent-to-Agent preview endpoints, and OpenAPI tools. Toolboxes, also marked preview, bundle multiple tools behind an MCP-compatible endpoint with centralized authentication, versioning, token refresh, and policy enforcement.
- Use File Search when the agent needs document-grounded retrieval over uploaded or curated files.
- Use Code Interpreter for sandboxed Python analysis, charts, calculations, and data inspection.
- Use OpenAPI tools when business systems already expose stable HTTP contracts.
- Use MCP when a tool server must be shared across frameworks or teams.
- Use a toolbox when many agents need the same curated tool set and governance policy.
request -> agent instructions
        -> model reasoning
        -> tool selection
        -> authenticated tool execution
        -> result inserted into response context
        -> trace, metrics, and policy signals emitted
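The last three stages of that flow (authorized execution, result capture, signal emission) can be sketched for a hand-rolled agent as a single wrapper. Nothing here is a Foundry API: `run_tool_step`, the `registry` dict, the `allowed` set, and the trace event shape are hypothetical, and the model's tool selection is assumed to have already produced `tool_name` and `arguments`.

```python
from __future__ import annotations

import time
from typing import Any, Callable

Tool = Callable[..., Any]


def run_tool_step(
    tool_name: str,
    arguments: dict[str, Any],
    registry: dict[str, Tool],
    allowed: set[str],
    trace: list[dict[str, Any]],
) -> Any:
    """Execute one tool call behind an allowlist and always record a trace event."""
    started = time.perf_counter()
    event: dict[str, Any] = {"tool": tool_name, "arguments": arguments, "status": "ok"}
    try:
        # Policy check comes before execution, so a denied call never runs.
        if tool_name not in allowed:
            raise PermissionError(f"tool {tool_name!r} is not permitted for this agent")
        result = registry[tool_name](**arguments)
        event["result"] = result
        return result
    except Exception as exc:
        event["status"] = "error"
        event["error"] = f"{type(exc).__name__}: {exc}"
        raise
    finally:
        # Emitted on success and on failure, which is what makes the trace auditable.
        event["latency_ms"] = (time.perf_counter() - started) * 1000
        trace.append(event)
```

A short usage example, with a made-up `add` tool:

```python
trace: list = []
registry = {"add": lambda a, b: a + b}
run_tool_step("add", {"a": 2, "b": 3}, registry, {"add"}, trace)
print(trace[0]["status"], trace[0]["result"])
```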
This is the part of the runtime that most changes day-two operations. In a hand-rolled agent, each team decides how to store tool credentials, filter arguments, retry failures, and capture outputs. In Foundry, the platform increasingly centralizes that work around project connections, Entra identity, RBAC, OAuth, managed identity, and tool catalog configuration.
Benchmarks & Metrics
Microsoft does not publish one universal benchmark that proves Foundry agents are faster or more accurate than every custom stack. The practical metrics are operational: latency per run, tool error rate, token consumption, retrieval quality, memory precision, and cost per completed task. Foundry’s observability model is built around evaluation, monitoring, and tracing rather than a single leaderboard number.
What to measure before production
- Task completion: percentage of realistic workflows completed without human repair.
- Tool call accuracy: whether the agent chose the right tool with valid arguments.
- Groundedness: whether responses are supported by retrieved files, search results, or enterprise knowledge.
- Latency: total run time plus per-step model, retrieval, and tool timings.
- Cost: model tokens, Code Interpreter sessions, vector storage, and any downstream tool charges.
- Memory quality: useful memories retained, stale memories deleted, and harmful memories blocked.
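Several of these metrics reduce to simple ratios over evaluation runs. The sketch below assumes each run has already been scored and flattened into a `RunRecord`; that record shape and the `summarize` function are hypothetical, and groundedness and memory quality are left out because they need a judged dataset instead of counters.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import median


@dataclass
class RunRecord:
    completed_without_repair: bool
    tool_calls: int
    correct_tool_calls: int  # right tool chosen and arguments valid
    latency_s: float
    cost_usd: float


def summarize(runs: list[RunRecord]) -> dict[str, float | None]:
    """Aggregate pre-production metrics from scored evaluation runs."""
    if not runs:
        raise ValueError("at least one run is required")
    completed = sum(r.completed_without_repair for r in runs)
    total_calls = sum(r.tool_calls for r in runs)
    correct_calls = sum(r.correct_tool_calls for r in runs)
    total_cost = sum(r.cost_usd for r in runs)
    return {
        "task_completion_rate": completed / len(runs),
        # None signals "not measurable" when no tool was ever called.
        "tool_call_accuracy": correct_calls / total_calls if total_calls else None,
        "median_latency_s": median(r.latency_s for r in runs),
        # Failed runs still cost money, so total cost is spread over successes only.
        "cost_per_completed_task": total_cost / completed if completed else None,
    }
```

Dividing total cost by completed tasks, not by all runs, is deliberate: it makes an agent that fails often look as expensive as it really is.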
The current observability docs say Microsoft Foundry integrates with Azure Monitor Application Insights and tracks operational metrics including token consumption, latency, error rates, and quality scores. Distributed tracing is built on OpenTelemetry standards and can expose LLM calls, tool invocations, agent decisions, and inter-service dependencies. Tracing is generally available for prompt agents, while hosted, workflow, and external agent tracing are still in preview.
Data placement and quota signals
The FAQ makes a sharp distinction between basic and standard setup. In basic setup, state is stored in secure Microsoft-managed resources. In standard setup, data is stored in customer Azure resources: Azure Storage for files and attachments, Azure Cosmos DB for threads and conversation history, and Azure AI Search for vector stores. Microsoft also states that data persists unless explicitly deleted and that standard setup supports customer-managed keys.
Memory has its own preview limits. Microsoft’s memory page lists a maximum of 100 scopes per memory store, a maximum of 10,000 memories per scope, and 1,000 requests per minute for both memory search and memory update operations. Those are not end-to-end application limits, but they are real design inputs for tenant isolation, personalization scope, and retry behavior.
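A back-of-the-envelope check shows how those preview limits turn into design inputs. The three constants come from the figures quoted above; everything else is an assumption made for illustration, including the `plan_memory_capacity` helper, the idea that the request limit applies per store, and the idea that a workload can be split across several stores.

```python
from __future__ import annotations

import math

# Preview limits as quoted in the article; verify against current documentation.
MAX_SCOPES_PER_STORE = 100
MAX_MEMORIES_PER_SCOPE = 10_000
MAX_REQUESTS_PER_MINUTE = 1_000


def plan_memory_capacity(
    scopes_needed: int,
    memories_per_scope: int,
    peak_requests_per_minute: int,
) -> dict[str, int | bool]:
    """Compare a planned workload against the quoted preview limits."""
    return {
        # Assumption: scopes beyond the per-store cap are spread over more stores.
        "stores_needed_for_scopes": math.ceil(scopes_needed / MAX_SCOPES_PER_STORE),
        "memories_fit_in_scope": memories_per_scope <= MAX_MEMORIES_PER_SCOPE,
        # Negative headroom means callers need queuing, backoff, or batching.
        "requests_per_minute_headroom": MAX_REQUESTS_PER_MINUTE - peak_requests_per_minute,
    }
```

For example, `plan_memory_capacity(250, 4_000, 1_200)` reports three stores, memories that fit within a scope, and a headroom of -200 requests per minute. Under these assumptions, a design that maps one scope to each user would hit the scope cap long before the per-scope memory cap.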
Strategic Impact
Foundry’s strategic move is to make agent infrastructure look more like cloud infrastructure. Instead of every team building a private blend of prompt templates, vector stores, tool wrappers, audit logs, and identity hacks, Foundry turns those concerns into platform services. That does not remove architecture work. It shifts the hard questions from “how do we wire this together?” to “which runtime responsibilities should the platform own?”
- Platform teams get a clearer place to enforce identity, storage, network, logging, and policy defaults.
- Application teams can choose prompt agents for managed behavior or hosted agents when deterministic code paths matter.
- Security teams can audit tool access through Entra ID, RBAC, managed identity, and project-scoped controls.
- Compliance teams can reason about regional endpoints, customer-owned storage, deletion paths, and CMK support.
- Finance teams get better cost attribution through token, latency, session, and storage metrics.
The governance story is especially important because agents act with delegated authority. Microsoft’s Cloud Adoption Framework guidance says organizations need to identify what agents exist, determine who owns them, limit what they can access, observe what they do, and stop what they should not do. Foundry Agent Service fits that model by connecting runtime execution to an Azure control plane that enterprises already use for inventory, RBAC, monitoring, policy, and incident response.
Where Foundry is strong
- It gives Azure-native teams a managed path for stateful agents without abandoning existing identity and monitoring investments.
- It supports both low-code prompt agents and code-driven hosted agents, which avoids forcing one orchestration style.
- It treats tools as governed resources instead of informal functions buried inside application code.
- It exposes traces that are useful for debugging multi-step, nondeterministic workflows.
Where teams still own the risk
- They must define what memory should and should not retain for each business domain.
- They must test tool permissions against least-privilege and abuse scenarios.
- They must build evaluation datasets that reflect real failures, not only happy-path demos.
- They must decide when preview features are acceptable for regulated workflows.
Road Ahead
The next year of Foundry Agent Service will likely be shaped by three pressures: standardization, governance, and workload maturity. MCP and A2A-style integration make tool ecosystems more portable, but portability increases the need for central policy. Memory improves personalization, but it also creates persistent state that must be corrected, expired, and audited. Hosted agents give engineering teams code-level control, but they also bring container lifecycle, dependency, and release-management concerns back into the picture.
For most organizations, the pragmatic adoption path is incremental. Start with a narrow agent, use standard setup if data ownership or CMK matters, attach only the tools required for the workflow, enable tracing from day one, and treat memory as a feature that needs product policy rather than a convenience switch.
- Define the agent owner, purpose, data classes, and allowed tools before implementation.
- Choose prompt agents when managed orchestration is sufficient; choose hosted agents when control flow must live in code.
- Instrument traces, metrics, and evaluation gates before exposing the agent to real users.
- Separate stable knowledge grounding from user memory and document deletion behavior for both.
- Review preview dependencies, especially memory, toolbox, hosted tracing, workflow tracing, and A2A integrations.
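A checklist like this can also be encoded as a release gate that runs before an agent is exposed to users. The sketch below is one hypothetical encoding: the `agent` dictionary shape, the field names, and `release_blockers` are invented for illustration, and the preview feature names simply mirror the list above as of the article's date.

```python
from __future__ import annotations

REQUIRED_FIELDS = ("owner", "purpose", "data_classes", "allowed_tools")

# Mirrors the preview dependencies named in the checklist; statuses change over time.
PREVIEW_FEATURES = {"memory", "toolbox", "hosted_tracing", "workflow_tracing", "a2a"}


def release_blockers(agent: dict, approved_previews: frozenset = frozenset()) -> list[str]:
    """Return human-readable reasons the agent should not ship yet."""
    blockers = [f"missing {name}" for name in REQUIRED_FIELDS if not agent.get(name)]
    if not agent.get("tracing_enabled", False):
        blockers.append("tracing not enabled")
    used_previews = set(agent.get("features", ())) & PREVIEW_FEATURES
    unapproved = used_previews - set(approved_previews)
    blockers.extend(f"preview feature not approved: {name}" for name in sorted(unapproved))
    return blockers
```

An empty list means the declared metadata passes the gate. It says nothing about whether the evaluation results themselves are acceptable, which remains a separate review.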
The useful mental model is simple: Foundry Agent Service is not a magic reasoning layer. It is the runtime envelope around agent behavior. Its value grows when the agent needs persistence, tools, identity, observability, and governance at the same time. For engineering leaders, that makes Foundry less about replacing frameworks and more about deciding which operational responsibilities should become platform defaults.