A deep-research agent that uses Exa for web search and the OpenAI Agents SDK for tool calling, deployed on InstaVM so the OpenAI and Exa keys never enter the process environment. The app sends placeholder strings (OPENAI_KEY, EXA_KEY) on the wire; the InstaVM egress proxy substitutes the real credentials only on traffic to api.openai.com and api.exa.ai. Any other outbound host fails at the network layer.
Full code: instavm/cookbooks/deep-research-exa. For a full reference of every instavm.yaml field used here, see Deploy anything on InstaVM with `instavm.yaml`.
You'll need:
- an InstaVM API key
- an OpenAI API key
- an Exa API key
- Python 3.11 or newer
You'll end up with a FastAPI service that streams agent tool-call events over SSE, two Exa tools wired into the OpenAI Agents SDK, and an instavm.yaml that locks egress to two upstream hosts.
Quick start. If you just want the running app:
```bash
git clone https://github.com/instavm/cookbooks.git
cd cookbooks/deep-research-exa
instavm deploy .
```

When the deploy prompt asks for a vault, bind api.openai.com to your OpenAI key and api.exa.ai to your Exa key.
Step 1: create the project layout
The project is four files at the root:
```text
deep-research-exa/
  app.py
  instavm.yaml
  requirements.txt
  README.md
```

requirements.txt:

```text
openai-agents>=0.2.7
fastapi==0.128.8
uvicorn[standard]==0.39.0
httpx>=0.27,<1.0
pydantic>=2,<3
```

We'll fill in app.py over Steps 3-5, then deploy in Step 6.
Add instavm.yaml:
```yaml
schema_version: 2
kind: service
slug: deep-research-exa
title: Deep Research (OpenAI Agents + Exa)
version: "0.1.0"
summary: A deep-research agent using Exa and the OpenAI Agents SDK.
category: agents
runtime: python-fastapi

deploy:
  kind: upload_and_run
  source:
    include:
      - app.py
      - requirements.txt
      - instavm.yaml
      - README.md
    exclude: []
  setup_command: python -m pip install --no-cache-dir -r requirements.txt

vm:
  memory_mb: 2048
  vcpu_count: 2
  timeout_seconds: 86400

app:
  port: 8000
  healthcheck_path: /health
  readiness_timeout_seconds: 240
  share_public_default: true

run:
  workdir: .
  start_command: python -m uvicorn app:app --host 0.0.0.0 --port 8000

vault:
  required: true
  hosts:
    - api.openai.com
    - api.exa.ai

egress:
  mode: allowlist
  include_vault_hosts: true
  allowed_domains:
    - api.openai.com
    - api.exa.ai
  allowed_cidrs: []
  allow_package_managers: true

secrets: []

post_deploy_notes:
  - Open the share URL and ask a research question.
  - Check /health for vault_mode before sending traffic.
```

Three fields are doing the work here. `vault.required: true` makes `instavm deploy` refuse to run unless an org vault binds both upstream hosts. `egress.mode: allowlist` denies outbound hosts by default. `egress.allowed_domains` and `egress.include_vault_hosts` keep the effective allowlist to the two API hosts the app needs.
Step 2: store OpenAI and Exa keys in an InstaVM vault
Once the vault is bound, the agent runs with placeholder strings in its environment, and the proxy substitutes them per host:
- `OPENAI_API_KEY` is set to the literal string `OPENAI_KEY`
- Exa requests send `x-api-key: EXA_KEY`
- outbound HTTPS to `api.openai.com` and `api.exa.ai` passes through the vault proxy
- the proxy replaces placeholders with the real keys for those hosts only
Set up an org vault once:
```bash
instavm vault create deep-research-org
# returns: <vault-id>

instavm vault secret set <vault-id> OPENAI_KEY --value-file /path/to/openai.key
instavm vault secret set <vault-id> EXA_KEY --value-file /path/to/exa.key

instavm vault service add <vault-id> --template openai --credential OPENAI_KEY
instavm vault service add <vault-id> --host api.exa.ai --auth-type api-key --header x-api-key --credential EXA_KEY
```

The OpenAI binding uses the built-in openai template (`Authorization: Bearer` plus the bound credential). The Exa binding uses the api-key flavor with a custom `x-api-key` header. If you'd rather not run these by hand, `instavm deploy .` from the cookbook directory walks you through both bindings interactively, because `vault.required` triggers the setup prompt automatically.
Confirm with:
```bash
instavm vault service list <vault-id>
```

You should see both hosts bound and `Enabled: yes`.
Step 3: add Exa search and content tools
The agent gets two Exa tools: one for ranked search, one for fetching readable page contents. Both send placeholder strings on the wire. The proxy swaps them at egress.
```python
import contextvars
import os

import httpx
from agents import function_tool

EXA_PLACEHOLDER = os.environ.get("EXA_PLACEHOLDER", "EXA_KEY")
EXA_BASE = "https://api.exa.ai"

_request_client: contextvars.ContextVar[httpx.AsyncClient | None] = contextvars.ContextVar(
    "request_client", default=None
)


def _client() -> httpx.AsyncClient:
    client = _request_client.get()
    if client is None:
        raise RuntimeError("Exa client not initialized for this request")
    return client


async def _exa_post(client: httpx.AsyncClient, path: str, payload: dict):
    headers = {
        "x-api-key": EXA_PLACEHOLDER,
        "content-type": "application/json",
        "accept": "application/json",
    }
    resp = await client.post(f"{EXA_BASE}{path}", headers=headers, json=payload)
    resp.raise_for_status()
    return resp.json()


@function_tool
async def exa_search(query: str, max_results: int = 6) -> list[dict]:
    """Search the public web with Exa and return ranked results."""
    data = await _exa_post(_client(), "/search", {
        "query": query.strip()[:500],
        "numResults": max(1, min(max_results, 10)),
        "type": "auto",
        "contents": {"text": {"maxCharacters": 600}},
    })
    return [
        {"title": r.get("title"), "url": r.get("url"), "snippet": r.get("text", "")[:600]}
        for r in data.get("results", [])
    ]


@function_tool
async def exa_get_contents(url: str) -> dict:
    """Fetch readable contents for a URL via Exa."""
    data = await _exa_post(_client(), "/contents", {
        "urls": [url],
        "text": {"maxCharacters": 10000},
    })
    item = (data.get("results") or [{}])[0]
    return {"url": url, "text": item.get("text", "")}
```

Two things are worth noting in the snippet above. Exa's REST API uses camelCase parameter names (`numResults`, `maxCharacters`), and the `contents.text.maxCharacters` field on `/search` returns inline page text alongside the results. The agent only calls `exa_get_contents` when it wants the full readable text of a result it has already seen.
In the cookbook, request setup stores a per-request httpx.AsyncClient in the ContextVar, which gives every POST /api/report an isolated connection pool and its own search budget. The full version is in app.py.
Step 4: wire the OpenAI Agents SDK agent
The agent itself is one SDK constructor call:
```python
from agents import Agent, Runner

researcher = Agent(
    name="Researcher",
    model=os.environ.get("OPENAI_MODEL", "gpt-5.5"),
    tools=[exa_search, exa_get_contents],
    instructions=(
        "You are a research agent. Use exa_search to find sources, "
        "exa_get_contents to read them, then write a markdown briefing "
        "with sections TL;DR, Key Findings, Risks and Counterpoints, "
        "Open Questions, Sources. Cite every claim with a URL. "
        "Do not fabricate citations."
    ),
)


async def run_research(query: str):
    result = await Runner.run(
        researcher,
        query,
        max_turns=int(os.environ.get("RESEARCH_MAX_TURNS", "12")),
    )
    return result.final_output
```

The SDK handles tool routing and the turn loop. Caps live in two places: `RESEARCH_MAX_TURNS` for total agent turns, and a per-request search cap that emits a skipped event when hit.
Step 5: stream agent progress with SSE
Streaming over SSE lets the browser render tool-call events as the agent works, instead of a 30-second spinner. The SDK's Runner.run_streamed yields events for tool start, tool end, and the final message. We push each as an SSE frame:
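The frame format itself is simple: one or more `data:` lines terminated by a blank line. A minimal `format_sse`, assuming the cookbook serializes event dicts as JSON (the version in app.py may also add event names or ids):

```python
import json


def format_sse(payload: dict) -> str:
    """Serialize one event as an SSE frame: a data: line plus a blank line.

    The trailing blank line terminates the frame, which is what makes the
    browser's EventSource deliver the event immediately.
    """
    return f"data: {json.dumps(payload)}\n\n"
```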
```python
MAX_AGENT_TURNS = int(os.environ.get("RESEARCH_MAX_TURNS", "12"))


async def stream(query: str):
    state = RequestState()
    token = _request_state.set(state)
    try:
        run = Runner.run_streamed(researcher, query, max_turns=MAX_AGENT_TURNS)
        async for event in run.stream_events():
            yield format_sse(event_to_dict(event))
        yield format_sse({"event": "report", "text": run.final_output})
    finally:
        _request_state.reset(token)
```

One catch worth knowing about: Starlette's BaseHTTPMiddleware buffers the full response body before forwarding it, which silently breaks SSE; the client only sees data after the agent has already finished. The fix is to write a plain ASGI middleware class and mutate the `http.response.start` message's headers in place. The version in app.py uses that pattern for the security headers it adds to every response.
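A plain ASGI middleware of that shape looks like this. The header set is illustrative, not the cookbook's exact list; the point is that only the `http.response.start` message is touched, so body messages stream through unbuffered:

```python
class SecurityHeadersMiddleware:
    """Pure ASGI middleware: adds headers without buffering the body,
    so SSE frames still flush to the client as they are produced."""

    HEADERS = [
        (b"x-content-type-options", b"nosniff"),  # illustrative header set
        (b"referrer-policy", b"no-referrer"),
    ]

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        async def send_wrapper(message):
            if message["type"] == "http.response.start":
                # Mutate headers on the start message only; http.response.body
                # messages pass through untouched, preserving streaming.
                message.setdefault("headers", [])
                message["headers"].extend(self.HEADERS)
            await send(message)

        await self.app(scope, receive, send_wrapper)
```

Registered with `app.add_middleware(SecurityHeadersMiddleware)`, it sees every response but never awaits the full body the way BaseHTTPMiddleware does.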
Step 6: deploy the Exa agent on InstaVM
From the cookbook directory:
```bash
instavm deploy .
```

When the vault is already set up, the deploy returns:

```text
Deployed
App  deep-research-exa
VM
URL  https://.instavm.site
```

Hit /health to confirm the vault is wired up; it should return `vault_mode: true`. Then open the URL, type a research question, and the SSE stream will show each tool call as the agent makes it, ending in a markdown briefing with citations.
Costs and caps. Every InstaVM account starts with $50 of free compute. The deep-research agent defaults to 10 searches and 8 page fetches per request, so a single user query runs to single-digit cents in Exa fees plus single-digit cents in tokens.
The deploy uses share_public_default: true, which means anyone with the URL can run the agent against your vault credentials. For anything beyond a demo, add auth or a per-IP rate limit before sharing.
Security model: why the API keys never leak
Three layers of containment, none of which depend on the application code being correct:
- The model never sees the real key. `OPENAI_API_KEY` inside the VM holds the literal string `OPENAI_KEY`. If the agent dumped its environment, that's all an attacker would get.
- The VM cannot reach the open internet. `egress.allowed_domains` plus `vault.hosts` form the full allowlist, and the only two entries are `api.openai.com` and `api.exa.ai`. An agent told to POST your environment to an attacker URL fails at DNS resolution.
- Vault bindings are per host. The OpenAI key substitutes only for traffic to `api.openai.com`, the Exa key only for `api.exa.ai`. A prompt injection that tells the agent to send a token header to a different URL gets a placeholder, not the real secret.
Deploy vs sandbox for Exa research agents
Use instavm deploy for this kind of agent because it only calls HTTPS APIs and returns text. Reach for the InstaVM sandbox provider for the OpenAI Agents SDK when the agent also needs Shell, Filesystem, apply_patch, a PTY, or a per-request workspace.
For agents that need both web research and code execution, run code, files, and shell inside InstaVMSandboxClient, and call Exa from inside that sandbox via httpx so the same vault rewriting still applies on egress. The moment the agent needs to write a Pandas script and run it, the sandbox is the right environment.
Try the cookbook
Full cookbook: instavm/cookbooks/deep-research-exa. Clone it, set up a vault as above, and run instavm deploy . from inside the directory. The deploy returns a public URL once the health check passes.
Forking it for a different search backend (Tavily, Perplexity, Linkup, your own corpus) is a two-file edit: the @function_tool wrappers in app.py, and the vault.hosts plus egress.allowed_domains entries in instavm.yaml. Everything else carries over.
For the full reference of every instavm.yaml field, see Deploy anything on InstaVM with `instavm.yaml`.