LIVE · Cloudflare Edge · Llama 3.1 8B

Cut your LLM bill by seventy-five percent.
Keep every drop of intelligence.

Semantic compression middleware for WhatsApp bots running in Hindi, Hinglish, Marathi, and Bangla. Compatible with GPT-4, Claude, Gemini — any LLM your bot already uses. One line of code. Margins expand from day one.

Hindi · 88% tokens saved
Marathi · 64% tokens saved
Bangla · 68% tokens saved
Edge latency · ~500ms avg p50
01 · What it does

A real Marathi message. Compressed in one hop.

Raw WhatsApp Message 438 chars · ~150 tokens

मला शिवाजी नगरमध्ये 2BHK फ्लॅट भाड्याने हवा आहे, बजेट 25 हजार आहे. शक्यतो फर्निश्ड असावे आणि पार्किंगची सोय पाहिजे.
(English: "I need a 2BHK flat to rent in Shivaji Nagar, budget is 25 thousand. Preferably furnished, and with parking.")

Compressed JSON → LLM 53 tokens · 64% saved
{
  "i": "rent_apartment",
  "l": "Shivaji Nagar",
  "bhk": "2BHK",
  "b": "25000",
  "f": "furnished",
  "amt": "parking"
}

Intent preserved. Entities preserved. Language bloat stripped. Your downstream LLM processes the same meaning with a fraction of the tokens.

02 · Calculate your savings

Your margin expansion, in rupees.

Conservative estimates based on GPT-4o / Claude Sonnet pricing and 70% compression.

[Interactive calculator: set your current monthly GPT-4o / Claude spend (₹100 – ₹1,00,000) to see your spend with Indic Engine (including the ₹5k retainer), net monthly savings, the days until the retainer pays for itself, and annual net savings.]

Assumes 280 input / 90 output tokens per msg · ₹0.21/₹0.85 per 1K tokens · 70% input compression
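The footnote's assumptions translate into simple arithmetic. A minimal sketch of the calculator's math; the 500,000-messages-per-month volume is illustrative, not a figure from this page:

```javascript
// Stated assumptions: 280 input / 90 output tokens per message,
// ₹0.21 / ₹0.85 per 1K input/output tokens, 70% input compression,
// ₹5,000/month retainer. Output tokens are unaffected by compression.
const IN_TOK = 280, OUT_TOK = 90;       // tokens per message
const IN_RATE = 0.21, OUT_RATE = 0.85;  // ₹ per 1K tokens
const COMPRESSION = 0.70;               // share of input tokens stripped
const RETAINER = 5000;                  // ₹ per month

function monthlyCosts(messages) {
  const before = messages * (IN_TOK * IN_RATE + OUT_TOK * OUT_RATE) / 1000;
  const after = messages *
    (IN_TOK * (1 - COMPRESSION) * IN_RATE + OUT_TOK * OUT_RATE) / 1000
    + RETAINER;
  return {
    before: Math.round(before),                   // current monthly spend, ₹
    after: Math.round(after),                     // with Indic Engine, ₹
    net: Math.round(before) - Math.round(after),  // you keep, ₹
  };
}

// e.g. 500,000 messages/month:
const { before, after, net } = monthlyCosts(500000);
```

Because compression applies to input tokens only, the overall bill reduction depends on your input/output token mix.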

Lock these savings →
03 · Integration

One line of code. Zero refactoring.

01

Change baseURL

Point your OpenAI/Anthropic SDK at Indic Engine. That's the integration.

02

We compress at edge

Cloudflare Worker extracts intent, compresses to dense JSON via Llama 3.1.

03

Your LLM, cheaper

Forwarded to your chosen LLM with your key. They bill you for compressed tokens.

// Before
const openai = new OpenAI({
  baseURL: "https://api.openai.com/v1",
  apiKey: process.env.OPENAI_KEY
});

// After — one line changed
const openai = new OpenAI({
  baseURL: "https://indic-engine.com/v1", // ← that's it
  apiKey: process.env.OPENAI_KEY
});
04 · Real numbers

Benchmarks from our edge.

View full benchmarks →
Language | Sample payload | Raw chars | Compressed tokens | Saved
Hindi | Real-estate enquiry | 518 | 20 | 88%
Bangla | Office-space search | 470 | 51 | 68%
Marathi | Rental enquiry | 438 | 53 | 64%
Hinglish | Customer complaint | intent extracted in <600ms | ~70%
05 · Pricing

One tier open. For now.

We're capping the beta at 10 agencies. Priced low because we're learning your workflows.

Beta · Open
Agency Beta
₹5,000 /month
+ ₹2 per 1,000 tokens saved over 100K
  • Up to 100K saved tokens / mo
  • All Indic languages supported
  • Concierge onboarding (2 hrs)
  • Founder-direct support
  • Vertical-tuned prompts
  • Monthly savings reports
Start with free audit
Agency Pro
₹15,000 /month
+ ₹1.5 per 1,000 tokens saved over 500K
  • Up to 500K saved tokens / mo
  • Priority Slack channel
  • Custom vertical prompts
  • 99.9% uptime SLA
  • Real-time dashboard access
Enterprise
Custom
Volume pricing · Dedicated infra
  • Unlimited tokens
  • Dedicated engineer
  • Private deployment option
  • Custom SLA
  • Routing to Sarvam/Bhashini
Contact founder
Honest boundaries

What Indic Engine isn't.

We'd rather lose a bad-fit deal than over-promise. If you're building something on this list, we'll tell you upfront.

  • Not a replacement for your LLM
    You still use GPT-4, Claude, or Gemini. We sit in front.
  • Not a translation service
    Output is structured JSON for your bot. Not translated prose.
  • Not a chatbot builder
    You already have the bot. We optimize what it already does.
  • Not ideal for long multi-turn reasoning
    Best for transactional messages — orders, queries, leads. Use raw LLM for complex reasoning flows.

Send us 100 anonymized messages.
We'll show you exactly what you're overpaying.

No meeting. No sales call. Just math on your real traffic. 24-hour turnaround.

Request your free audit
Integration Guide

Indic Engine Docs

One endpoint. One header. One line of code. Works with any OpenAI-compatible SDK (OpenAI, Anthropic Claude via proxy, Gemini-compatible clients).

1

Authentication

After onboarding, you'll receive a unique API key. Pass it in the Authorization header.

Authorization: Bearer ie_live_YOUR_API_KEY_HERE
Content-Type: application/json
2

Endpoint

Standard POST to our edge. Use the vertical query parameter to activate tuned extraction prompts.

POST https://indic-engine.com/v1/chat/completions?vertical=realestate

// Supported verticals:
// realestate · ecommerce · leadgen · default
3

Request body

{
  "input": "Bhaiya mujhe Hinjewadi phase 2 mein 3BHK chahiye, budget 1.5cr."
}

(English: "Brother, I need a 3BHK in Hinjewadi Phase 2, budget ₹1.5 crore.")
4

Response

Dense JSON. Forward data directly to your LLM's system prompt.

{
  "savings": "75%",
  "tokens": { "in": 120, "out": 30 },
  "data": "{\"i\":\"search\",\"l\":\"Hinjewadi phase 2\",\"bhk\":\"3BHK\",\"b\":\"1.5cr\"}"
}
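The "data" field arrives as a JSON string ready to drop into a system prompt. A sketch of that forwarding step, assuming the response shape shown in step 4; the `buildMessages` helper and the prompt wording are illustrative, not part of the API:

```javascript
// Build the message array that carries the compressed intent to your
// LLM. `engineResponse` follows the sample response shape above.
function buildMessages(engineResponse, userInstruction) {
  return [
    {
      role: "system",
      content: `User intent (compressed JSON): ${engineResponse.data}`,
    },
    { role: "user", content: userInstruction },
  ];
}

// Usage with the sample response:
const sample = {
  savings: "75%",
  tokens: { in: 120, out: 30 },
  data: '{"i":"search","l":"Hinjewadi phase 2","bhk":"3BHK","b":"1.5cr"}',
};
const messages = buildMessages(sample, "Suggest matching listings.");
// Pass `messages` to openai.chat.completions.create({ model, messages }).
```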
5

Full example (curl)

curl -X POST "https://indic-engine.com/v1/chat/completions?vertical=realestate" \
  -H "Authorization: Bearer ie_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{"input": "मला शिवाजी नगरमध्ये 2BHK हवे आहे"}'
# input (Marathi): "I need a 2BHK in Shivaji Nagar"
Built-in fail-safe

If the compression layer fails JSON validation for any reason, Indic Engine automatically returns the raw input untouched. Your bot never breaks. 100% uptime guaranteed — worst case, you lose the savings for that one message.
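The fail-safe amounts to a validate-or-passthrough branch. A minimal sketch of that logic, with a hypothetical helper name; this is not the production Worker code:

```javascript
// If the compressor's output is not valid JSON, pass the raw user
// message through unchanged, as described above.
function compressOrFallback(rawInput, compressedCandidate) {
  try {
    JSON.parse(compressedCandidate); // must parse as JSON
    return { payload: compressedCandidate, compressed: true };
  } catch {
    // Malformed or empty output: the bot sees exactly what the user sent.
    return { payload: rawInput, compressed: false };
  }
}
```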

LLM compatibility

Indic Engine is model-agnostic. The compressed output works with:

  • OpenAI GPT-4, GPT-4o, GPT-3.5
  • Anthropic Claude (Sonnet, Opus, Haiku)
  • Google Gemini (Pro, Flash)
  • Any OpenAI-SDK-compatible model
Edge Benchmarks

Benchmarks

Real measurements from our Cloudflare edge running Llama 3.1 8B. Every number below is reproducible — we'll run your own CSV against this same stack.

Avg edge latency
~500ms
Best compression
88%
Uptime guarantee
100%
Fallback on fail
0ms added

Per-language results

Sample real-world WhatsApp messages run through the production pipeline.

Hindi Devanagari script
88% saved
Raw
518 characters · real-estate enquiry
Compressed
20 tokens to LLM
Bangla Bengali script
68% saved
Raw
470 characters · office-space search
Compressed
51 tokens to LLM
Marathi Devanagari script
64% saved
Raw
438 characters · rental enquiry
Compressed
53 tokens to LLM
Hinglish Roman + Indic code-mixing
~70% saved
Customer complaint / rant intent extraction completed in <600ms with flawless entity preservation.

Failure behavior

If JSON validation fails (malformed output, empty response, timeout), Indic Engine instantly bypasses compression and returns your raw input. Your bot sees exactly what the end user sent. No retries. No hangs. No silent failures.

Trade-off: you lose the savings on that one message. Benefit: your client's bot never goes down because of us.

Free savings audit

Show us 100 messages. We'll show you the bill you could've had.

No payment. No commitment. We run your anonymized sample through our engine and email you the exact rupee savings — usually within 24 hours.

Opens your email client to send to founder@indic-engine.com. No data stored on this page.

What happens after you submit

  1. Founder replies within 2 hours asking for a CSV of 50–100 anonymized messages (or sample screenshots if easier).
  2. We run them through the engine and benchmark against your current LLM pricing.
  3. You receive a detailed savings report in a Google Sheet: monthly rupees saved, payback period, real compressed samples.
  4. If the numbers make sense for you, we onboard in 15 minutes. If not, you keep the report.

Welcome to Indic Engine.

Your retainer is active. Your margins just got wider.

What happens next
  1. Within 2 hours, our founder will email you from founder@indic-engine.com.
  2. You'll receive your unique API key and vertical-specific integration snippets.
  3. 15-minute Loom walkthrough of the one-line code change.
  4. First weekly savings report arrives next Monday.
Read docs while you wait →