Case study · Hospitality Published May 8, 2026 11 min read

Case study: 18 hours a week, automated with one agent

How Astral Mantra Labs cut 18 hours a week of manual operations for an anonymised mid-sized hospitality client — the workflow we automated, the architecture we shipped, the numbers in production, and what we'd do differently next time.

TL;DR

Client: mid-sized hospitality operator (anonymised), ~40 properties, Nepal + India.

Problem: three ops staff spent 6+ hours each, every week, manually triaging guest emails, drafting replies, syncing bookings between channels, and chasing missing payment confirmations.

What we shipped: a single workflow agent reading every inbound guest email, classifying it, drafting a reply grounded in the property knowledge base, syncing the channel manager and PMS, and escalating only the genuine edge cases. Built in 6 weeks. Live for 7 months.

Result: 18 hours/week saved across the ops team, 96% inbound email classification accuracy, 22% faster guest-reply time, 11.4× ROI in the first year.

The problem

The client runs a portfolio of ~40 boutique hospitality properties across Nepal and India. Mid-sized operator. Strong brand. Heavy email volume. Their ops team had three full-time staff, each spending 6+ hours a week on what they called "the inbox": triaging guest emails, drafting replies, syncing bookings between channels, and chasing missing payment confirmations.

Combined: roughly 18 hours of repetitive work per week, by their own log. The problem wasn't volume — they could keep up. The problem was that this work was eating hours that should have gone to actual guest experience and revenue management.

What we scoped in discovery

Discovery ran two weeks. We did three things:

  1. Mapped the actual process. Sat with the ops staff for two days, watched them handle real emails, took notes. Documented every tool, every decision, every edge case.
  2. Sampled and labelled inbound email. Pulled three months of email history (~14,000 messages) and classified them into 11 categories: booking inquiry, booking change, payment confirmation, payment dispute, complaint, special request, post-stay feedback, B2B partner, internal, spam, and "actually needs a human."
  3. Identified what the agent could and couldn't do. Eight of the categories are largely deterministic-with-judgement, which makes them a good fit for an agent. The other three (complaints, payment disputes, and "actually needs a human" escalations) need a human, full stop.

We came back with a fixed-price discovery deliverable: written spec, system diagram, data model, integration list (Gmail, channel manager API, PMS API, payment gateway, Slack), model + infra decision, evaluation plan, and a fixed-price quote for a 6-week build.

What we shipped

A single workflow agent with the following architecture:

  1. Email ingest. Gmail webhook fires on every new inbound to the ops inbox. Message goes to the agent's queue.
  2. Property + booking resolution. The agent extracts the property, guest, and booking ID from the email — using a combination of structured matching (for emails with booking refs) and LLM extraction (for free-form messages).
  3. Classification. The agent classifies the email into one of the 11 categories. Confidence threshold of 0.85 — below that, the email goes to a human.
  4. Knowledge retrieval. For replies, the agent retrieves the relevant property's knowledge base (check-in time, amenities, local recommendations, policies) via RAG.
  5. Reply drafting. The agent drafts a reply in the brand voice, grounded in the retrieved knowledge. For booking changes, it also drafts the change action against the channel manager and PMS.
  6. Human checkpoint (first 4 weeks). Every reply was reviewed by a human before sending. We measured the override rate — how often the human edited the draft — and used that as the quality metric.
  7. Auto-send (after week 4). Once override rate dropped below 8% on Tier-1 categories, we cut over to auto-send for those categories. Edge cases still routed to humans.
  8. Slack escalation. Any email tagged as complaint, dispute, or escalation gets Slack-pinged to the right ops person, with the full thread and a suggested response already drafted.
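The routing rules in steps 3, 7, and 8 can be sketched in a few lines of Python. This is a minimal illustration, not the production code: the category names and the 0.85 threshold come from the text above, while the `Classification` type, the `route` function, and the queue names it returns are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    BOOKING_INQUIRY = "booking inquiry"
    BOOKING_CHANGE = "booking change"
    PAYMENT_CONFIRMATION = "payment confirmation"
    PAYMENT_DISPUTE = "payment dispute"
    COMPLAINT = "complaint"
    SPECIAL_REQUEST = "special request"
    POST_STAY_FEEDBACK = "post-stay feedback"
    B2B_PARTNER = "B2B partner"
    INTERNAL = "internal"
    SPAM = "spam"
    NEEDS_HUMAN = "actually needs a human"


# Categories that always go to a person via Slack (step 8).
ESCALATE = {Category.COMPLAINT, Category.PAYMENT_DISPUTE, Category.NEEDS_HUMAN}

CONFIDENCE_THRESHOLD = 0.85  # from step 3


@dataclass
class Classification:
    category: Category
    confidence: float


def route(result: Classification, auto_send_enabled: set) -> str:
    """Return a (hypothetical) queue name for one classified email."""
    if result.confidence < CONFIDENCE_THRESHOLD:
        return "human_review"        # step 3: low confidence goes to a human
    if result.category in ESCALATE:
        return "slack_escalation"    # step 8: complaint, dispute, escalation
    if result.category in auto_send_enabled:
        return "auto_send"           # step 7: category has been cut over
    return "draft_for_review"        # step 6: human checks the draft first


print(route(Classification(Category.COMPLAINT, 0.97), set()))
```

Checking confidence before category means a low-confidence "complaint" still lands in front of a person, just through the review queue instead of Slack. That ordering is an assumption of this sketch; the source does not say which check runs first.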

What we used under the hood

The numbers after 7 months in production

What we'd do differently

Honesty is more useful than a victory lap. Three things we'd change with hindsight:

  1. Start the human-checkpoint phase shorter. We ran 4 weeks of human review before cutting over to auto-send. The data showed the agent was ready at week 2.5. We left ~5 days of savings on the table by being conservative.
  2. Build the dashboard sooner. The ops team didn't trust the agent until they could see what it was doing. We added a live dashboard in week 5. It should have shipped in week 2 — adoption would have happened faster.
  3. Document the override policy in writing earlier. "When does the agent escalate vs handle?" was clear in our heads but ambiguous to the ops team for the first month. A written one-pager from week one would have prevented a lot of small confusion.
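For the first point, the readiness check is simple enough to automate from day one. The sketch below assumes each reviewed draft is logged with its week number and whether the reviewer edited it. The `ReviewedDraft` record and both function names are hypothetical, and the sample data is made up for illustration; only the 8% cut-over threshold comes from the build described above.

```python
from dataclasses import dataclass
from typing import List, Optional

OVERRIDE_CUTOVER = 0.08  # cut over to auto-send below an 8% override rate


@dataclass
class ReviewedDraft:
    week: int     # week of the human-checkpoint phase, 1-based
    edited: bool  # True if the reviewer changed the draft before sending


def override_rate(drafts: List[ReviewedDraft]) -> float:
    """Share of reviewed drafts that a human edited."""
    if not drafts:
        raise ValueError("no reviewed drafts to measure")
    return sum(d.edited for d in drafts) / len(drafts)


def first_week_ready(drafts: List[ReviewedDraft]) -> Optional[int]:
    """First week whose override rate is below the cut-over threshold."""
    for week in sorted({d.week for d in drafts}):
        weekly = [d for d in drafts if d.week == week]
        if override_rate(weekly) < OVERRIDE_CUTOVER:
            return week
    return None


# Illustrative data only: 2 of 10 drafts edited in week 1, 1 of 20 in week 2.
log = (
    [ReviewedDraft(1, True)] * 2 + [ReviewedDraft(1, False)] * 8
    + [ReviewedDraft(2, True)] * 1 + [ReviewedDraft(2, False)] * 19
)
print(first_week_ready(log))
```

In practice the rate would be computed per category, since only the Tier-1 categories were cut over, and a rolling window finer than a week would be needed to spot a change at a point like week 2.5.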

What it cost

Build: low five figures USD, fixed-price after the discovery. Operate-and-improve retainer in months 1–6: low four figures USD/month, mostly model tuning and adding new properties to the knowledge base. Model API costs: ~USD 280/month at peak email volume — passed through transparently. The full breakdown is consistent with our guide to AI development cost in Nepal.

The pattern behind this case study

The reason this worked isn't unique to hospitality. The pattern is one high-volume workflow with deterministic-with-judgement decisions.

If you have a workflow that fits this shape, agents are extremely likely to pay back. Send us the workflow — we'll tell you within a day whether it does.

Frequently asked questions about this case study

Quick answers to the questions we hear most often.

How big was the team that built this?

Two founders at Astral Mantra Labs led the engagement directly. Bipin ran the technical build and scoping, and KKP ran the post-launch documentation and process work. There were no account managers.

How long did it take from kickoff to production?

Two weeks of fixed-price discovery, six weeks of build, four weeks of human-in-the-loop ramp before full auto-send. End-to-end about 12 weeks from kickoff to fully autonomous.

Can the same approach work for other industries?

Yes. The pattern — one high-volume workflow with deterministic-with-judgement decisions — appears in support, sales-ops, finance-ops, internal IT, and HR. We've shipped variations in education, construction, and healthcare.

What was the build cost?

Low five figures USD, fixed-price after a discovery scope. Plus a low four figures USD/month operate-and-improve retainer for the first six months.

What did the operating cost end up being?

Roughly USD 280/month at peak email volume for model API calls, passed through transparently. The hosting cost added another ~USD 60/month. Total operating cost was a small fraction of the salary cost it replaced.

How was accuracy measured?

We held out 250 labelled emails as a regression suite. Production accuracy was tracked via human-override rate and weekly sampled review. Both metrics are still reported to the client monthly.
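A regression suite of this kind can be as small as a list of (email text, expected category) pairs and one accuracy check that runs before every model or prompt change. The sketch below is illustrative only: the function names, the toy classifier, and the choice of 0.96 as the pass mark (borrowed from the headline accuracy figure) are assumptions, not the client's actual harness.

```python
from typing import Callable, Sequence, Tuple

LabelledEmail = Tuple[str, str]  # (email text, expected category)


def accuracy(classify: Callable[[str], str], suite: Sequence[LabelledEmail]) -> float:
    """Fraction of held-out emails the classifier labels correctly."""
    if not suite:
        raise ValueError("regression suite is empty")
    correct = sum(1 for text, label in suite if classify(text) == label)
    return correct / len(suite)


def passes_regression(
    classify: Callable[[str], str],
    suite: Sequence[LabelledEmail],
    baseline: float = 0.96,  # assumed pass mark, not stated in the source
) -> bool:
    """True if accuracy on the suite is at or above the baseline."""
    return accuracy(classify, suite) >= baseline


# Toy stand-in for the real classifier, for demonstration only.
def toy_classifier(text: str) -> str:
    return "payment confirmation" if "paid" in text.lower() else "booking inquiry"


toy_suite = [
    ("Do you have a room for two on Friday?", "booking inquiry"),
    ("I paid yesterday, can you confirm?", "payment confirmation"),
    ("The shower was cold all weekend.", "complaint"),
    ("Is breakfast included?", "booking inquiry"),
]
print(accuracy(toy_classifier, toy_suite))
```

The held-out suite catches regressions before release. The override rate and weekly sampled review cover what a fixed suite cannot, such as drift in the kinds of email guests send.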