A Personal Agent that Scales to Zero: Hermes on Fly.io

By Cameron Ball

Spinning up a general-purpose personal assistant for daily life, engineering work, and beyond.

In this article, I delineate every step I took when creating my own personal agent. My goals and requirements for this project were to…

  • Keep costs as low as possible. Ideally, the same as or lower than ChatGPT/Claude’s $20/mo. subscription plan.
  • Be able to text my agent. More broadly, interact with my agent from any of my devices.
  • Store nightly data backups to prevent loss in the case that anything goes down or I lose access.
  • Have my agent know me, and communicate with me in a style that I prefer (generally, terse).
  • Have the agency to solve coding tasks at my request, and keep me in the PR review flow. More generally, be able to take actions on my behalf for simple tasks, namely web research and browsing, reading emails, etc.

Bonus points if I can…

  • Interact via voice.
  • Transfer all my prior data from ChatGPT (particularly chat session data).

Info

This article is a living document on creating a personal agent for myself. Some items are still in progress and will be added to this article as they are completed. Such topics include…

  • Memory provider setup (beyond Hermes’ default)
  • MCP and tool call integrations
  • Data backups
  • ChatGPT data migration
  • Having the agent send me generated files via Telegram

Agent Harness

I’ll be going with Hermes, as I heard from reviews that it works better than OpenClaw out of the box.

I tried OpenHuman but met a fatal UI bug on account setup that I couldn’t move past, so that was a non-starter.

pi seems really intriguing and well-built as well, but Hermes is a bit more batteries-included, which is what I’m looking for (to save time). I saw this YouTube video from pi’s creator; cool (and very smart) guy. I like the way he reasons about software.

Finally, Goose seems solid as well. There are already many good competitors in this space; I simply like Hermes the most off the bat and wanted to dive in and see how it pans out.

Messaging Provider

Telegram seems to make the most sense as my primary means of interacting with Hermes. We can connect Hermes’ Messaging Gateway to a Telegram bot, which gives us a great cross-platform UI, ChatGPT-style thread support, and even voice mode.

To get started here…

  1. Download the Telegram application on any platform and make an account.
  2. Create a Telegram bot by following step 1 of these instructions. Note your bot token for later.
  3. Send /start to @userinfobot to get your user ID.

As for how conversations are organised, multi-session DM mode provides something similar to ChatGPT, where we control sessions via “topics”. The steps to set this up are…

  1. Open @BotFather → Bot Settings for the Hermes bot → Threads Settings → Turn on Threaded Mode.
  2. Type /topic in the root DM with the bot to enable it.
  3. Done. Any new message sent to the top-level Hermes bot will automatically start a new conversation. Play around with Telegram’s UI and you can get all conversations to show in the left nav bar. /topic help can help if necessary.

Hosting on a Cloud VM / VPS

I initially deployed Hermes on a Hetzner CX33 in the eu-central location (Helsinki, Finland), as it’s all that was available, but later revamped the entire infra setup so it could scale to zero and optimise costs. By deploying on Fly.io, I could reduce network latency considerably by running in a region close to me, while paying less than I was at Hetzner (no shame to Hetzner; they have great services). My Hetzner machine was a flat $10.09/mo., but I believe I can get that down to a fraction with my expected usage patterns for this project.

While still on Hetzner, I ran docker stats against my Hermes container to measure resource usage and help me configure my Fly machine appropriately. With Hermes idling, I saw the following:

CONTAINER ID   NAME      CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O        PIDS
f724dcd38e9b   hermes    0.09%     222.4MiB / 7.565GiB   2.87%     27.4MB / 46.1MB   1.79MB / 289MB   21

After sending a simple message, memory stayed consistent but CPU jumped to ~3%, which is still really low. These are only crude measurements, but they give me a frame of reference for a pricing estimate and the necessary system requirements.
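To turn that idle reading into a quick sizing check, here is a minimal sketch that reports how much of a planned memory allocation the measured footprint would occupy. The helper name mem_used_pct is hypothetical, and the 2048 MiB figure assumes the 2 GB machine size I settle on later in this article.

```shell
# Hypothetical helper: percentage of an allocation (in MiB) taken up by a
# measured footprint (in MiB). Both numbers are supplied by the caller.
mem_used_pct() {
  awk -v used="$1" -v total="$2" 'BEGIN { printf "%.1f\n", used / total * 100 }'
}

# Idle reading from the docker stats output above, against 2048 MiB.
mem_used_pct 222.4 2048
```

By this rough measure, idle Hermes would use about 11% of a 2 GB machine. That leaves headroom for browser tools, though a single idle sample says nothing about peak usage.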

At the time of writing, Fly has six different U.S. regions. lax (Los Angeles) is right in the middle in terms of pricing, and also happens to be the closest to my location, so I’ll go with that. Realistically for my usage, region pricing differences will be negligible.

I’ll initially opt for 1 machine, 1 core, shared CPU, 2 GB RAM, in lax. I’ll add a 1 GB volume for now, and keep the default of 10 GB bandwidth. With this, I expect the price ceiling to look like:

Item      Price
Compute   $0.83
Memory    $11.71
Volume    $0.15
Total     $12.70

To get started, I simply created a Fly account, installed flyctl (brew install flyctl), and ran the following (based on the Hermes instructions in Fly’s docs, modified as I saw fit):

# Create the app
fly apps create cameron-personal-agent

# Create a 1 GB volume for persistent storage for sessions, skills, etc.
# We can always extend a volume later, but volumes cannot be shrunk.
fly volumes create data -a cameron-personal-agent --region lax --size 1

# Home for our Fly config
mkdir -p ~/.../agent
cd ...  # ...into the prior-created directory

We’ll employ a Docker-based approach, where Hermes itself runs in a container on the cloud VM. Hermes has docs on exactly this, but since we’re running the container directly as a Fly machine, we’ll just reference the pre-built image in fly.toml so we don’t have to manage a docker-compose.yml ourselves.

I then created my fly.toml:

app = "cameron-personal-agent"
primary_region = "lax"
machine_config = "machine_config.json"

[build]
  image = "nousresearch/hermes-agent:latest"

[[services]]
  internal_port = 8443
  protocol = "tcp"
  auto_stop_machines = "suspend"
  auto_start_machines = true

  [[services.ports]]
    handlers = ["tls", "http"]
    port = 443

[[mounts]]
  source = "data"
  destination = "/opt/data"  # Where all Hermes' config will live in the container
  scheduled_snapshots = false

[[vm]]
  memory = "2gb"
  cpus = 1

*Note that we disabled volume snapshots (scheduled_snapshots) for this Fly app. A data backup solution is still in progress and will be added to this article soon.

Info

Note that we enabled scale-to-zero via auto-suspend. Suspending a machine, rather than stopping it entirely, lets it wake faster. Machines that use auto-suspend are capped at 2 GiB of memory. Fly notes that 4 GB / 2 CPU is recommended when browser tools are active (which they are, in our case), but we’ll try to make 2 GiB work for now.

We can see this in action in the logs:

19:20:19 App cameron-personal-agent has excess capacity, autosuspending machine 48e13eeb573198. 0 out of 1 machines left running (region=lax, process group=app)
19:23:20 Virtual machine has been suspended

Cold starts are quick enough that they shouldn’t negatively impact our user experience. Logs for starting the machine from a suspended state show:

Machine started in 4.331s
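To keep an eye on wake times over a longer period, a small filter can pull the duration out of each start line. This is only a sketch: the helper name wake_seconds is hypothetical, and it assumes the log line keeps the exact "Machine started in <N>s" wording shown above.

```shell
# Hypothetical helper: read log lines on stdin and print the wake time in
# seconds from every "Machine started in <N>s" line.
wake_seconds() {
  sed -n 's/.*Machine started in \([0-9.]*\)s.*/\1/p'
}

# Example using the log line shown above.
echo "Machine started in 4.331s" | wake_seconds
```

Assuming your flyctl version supports the --no-tail flag, the same filter could be fed with fly logs -a cameron-personal-agent --no-tail | wake_seconds.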

We also write the referenced machine_config.json:

{
  "containers": [
    {
      "name": "hermes",
      "image": "nousresearch/hermes-agent:latest",
      "cmd": ["gateway", "run", "--replace"],
      "secrets": [
        { "env_var": "TELEGRAM_WEBHOOK_SECRET" },
        { "env_var": "TELEGRAM_WEBHOOK_URL" }
      ]
    }
  ]
}

Lastly, to let our machine scale to zero and wake when a message comes in, we’ll enable Hermes’ Telegram webhook mode. For this, we add the following secrets to our Fly app, which machine_config.json exposes to the container as environment variables:

{
  echo "TELEGRAM_WEBHOOK_URL=https://cameron-personal-agent.fly.dev/telegram"
  echo "TELEGRAM_WEBHOOK_SECRET=$(openssl rand -hex 32)"
} | fly secrets import
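Telegram’s Bot API only accepts webhook secret tokens of 1 to 256 characters drawn from A-Z, a-z, 0-9, _ and -, which is why a hex string is a safe choice here. The check below is a minimal sketch of that rule. The helper name is_valid_webhook_secret is hypothetical, and it assumes Hermes passes TELEGRAM_WEBHOOK_SECRET to Telegram unchanged as the secret token.

```shell
# Hypothetical helper: succeed only if the value fits Telegram's documented
# secret_token rules (1-256 characters from A-Z, a-z, 0-9, _ and -).
is_valid_webhook_secret() {
  secret="$1"
  [ "${#secret}" -ge 1 ] && [ "${#secret}" -le 256 ] || return 1
  case "$secret" in
    *[!A-Za-z0-9_-]*) return 1 ;;  # contains a character outside the allowed set
  esac
  return 0
}

# A 64-character hex string, the same shape that openssl rand -hex 32 produces.
is_valid_webhook_secret "0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef" && echo "ok"
```

If you would rather generate the secret some other way (base64, for instance), running it through a check like this first catches characters such as + and / that Telegram would reject.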

We’re now ready to deploy the machine:

fly deploy -a cameron-personal-agent --ha=false

We specify --ha=false, as we don’t want to horizontally scale Hermes. The --ha flag’s help output states “Create spare machines that increases app availability (default true)”. Hermes is stateful, and two Hermes processes writing to the same volume could cause undefined behaviour.

Upgrading our Hermes version is as simple as re-deploying with the above command, since we always pull the latest image.
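After a deploy, one way to confirm that Telegram is pointing at the Fly app is the Bot API’s getWebhookInfo method. The sketch below only builds the request URL. The helper name webhook_info_url and the token value are placeholders, and it assumes Hermes registers the webhook with Telegram on startup.

```shell
# Hypothetical helper: build the Bot API getWebhookInfo URL for a bot token.
webhook_info_url() {
  printf 'https://api.telegram.org/bot%s/getWebhookInfo\n' "$1"
}

# Placeholder token for illustration only; substitute the token from BotFather.
webhook_info_url "123456:EXAMPLE"
```

Fetching that URL (for example with curl -fsS) returns JSON whose url field should match TELEGRAM_WEBHOOK_URL. The last_error_message field is a useful first stop if messages aren’t waking the machine.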

Setup Wizard Configuration

Start by SSHing into our machine:

fly ssh console -a cameron-personal-agent

Once in the machine, run the initial setup wizard:

hermes setup

Proceed with the setup wizard’s steps as you wish. I answered prompts like the following:

  1. How to set up: Quick setup
  2. Model Provider: Nous Portal (Nous Research subscription)
    • While there is a free plan, there may or may not be any free models available to demo with. I went with the free plan for initial setup and moved to the $20/mo. plan soon after.
  3. Terminal Backend: Keep current (local)
  4. Messaging Platforms: Set up now; Telegram
    • Fill in Hermes’ prompts with the info from our bot generated above.

We can now chat with Hermes :)

Model Provider

Since I’m using Hermes, I’ll start with the Nous Portal for a quick setup; $20/mo. subscription tier.

Activate with hermes model, and pick your default model (I chose stepfun/step-3.7-flash as it was temporarily free). Because I’m running with Nous Portal as my model provider, that takes care of several things for me, like allowing me to change/route to different models via /model in the chat (docs).

There’s a chance I’ll switch to OpenRouter in the future, but for quick startup, the Nous Portal subscription made the most sense for me.

Connectivity

An agent isn’t an agent if it can’t take actions on my behalf.

Because I am using and paying for Nous Portal as my model provider, that comes with the Nous Tool Gateway. That takes care of…

  • Web search & extraction (via Firecrawl)
  • Image generation
  • Text-to-speech
  • Browser automation (via Browser Use)

Hermes allows configuring these individually if desired, usually in the form of providing a backend for each type (docs: browser, web search).

I only really need web search, browser, and occasionally terminal access (docs for terminal backends) for my non-coding tasks. I’ll handle configuring the agent for coding tasks separately at a later time. Might be worth installing a local filesystem browser as well via MCP; we’ll see:

mcp_servers:
  filesystem:
    command: "npx"
    args: ["-y", "@modelcontextprotocol/server-filesystem", "/home/user/projects"]

Other integrations I’ll set up down the road are…

  • Personal Gmail
  • GitHub, via GitHub MCP
  • Work
    • Linear: Read-only, via Linear MCP
    • Slack: Read-only, via Slack MCP
    • Gmail: Read-only
    • Potentially work GitHub org access with the right controls in place

MCP Setup

cd .../.hermes/hermes-agent
uv pip install -e ".[mcp]"

Add servers to config.yaml, e.g.:

mcp_servers:
  local_example:
    command: "example"
    args: ["-v"]
    env:
      EXAMPLE_VAR: "my_value"
  network_example:
    url: "https://mcp.example.com/mcp"
    headers:
      Authorization: "Bearer ***"

Reload MCP servers after adding any with /reload-mcp.

Final Cost Breakdown

Item                                             Price / mo.
Cloud VM (Fly Machine)                           <$12.70
Model Provider (Nous Portal subscription)        $20.00
Blob Storage (for data backup) (Cloudflare R2)   $0.00
Total                                            <$32.70

In reality, the price should trend toward $20. The monthly total would only reach $32.70 if my Fly Machine were running continuously for the entire month; with scale-to-zero, I’m only charged for compute while the machine is actually on.
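To put a rough number on that, here is a minimal sketch that scales the Fly line items by daily uptime. It assumes compute and memory are billed in proportion to the time the machine is running and that the volume is billed all month regardless; it ignores any storage charges Fly applies to suspended machines. The helper name estimate_monthly_cost and the hours-per-day input are hypothetical. Note that the line items sum to $12.69, a cent under the $12.70 total quoted earlier, presumably due to rounding.

```shell
# Hypothetical helper: estimated monthly total in USD, given the average
# number of hours per day the Fly machine is running.
estimate_monthly_cost() {
  awk -v hours="$1" 'BEGIN {
    compute = 0.83; memory = 11.71   # assumed to scale with uptime
    volume = 0.15                    # assumed to be billed all month
    subscription = 20.00             # Nous Portal plan
    fly = (compute + memory) * hours / 24 + volume
    printf "%.2f\n", fly + subscription
  }'
}

# Example: the machine is awake for an average of 3 hours a day.
estimate_monthly_cost 3
```

At 0 hours a day the estimate bottoms out at the subscription plus the volume, and at 24 hours it reaches the always-on ceiling, so real usage should land somewhere in between.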