Learnings from using a 7B local model to emit TMF-compliant JSON

Introduction

Amid all the talk about AI transforming telecom, many demonstrations rely on a cloud API, a made-up schema, and a demo that works exactly once. I wanted something different: a 7-billion-parameter model running entirely on my laptop, converting a messy natural-language broadband order into a JSON object that a real OSS backend could actually consume, aligned to the TM Forum TMF641 Service Ordering API.

What followed was Milestone 1 of my Telecom-AI Agent Framework. Here is what I learned forcing a small local model to stop generating prose and start emitting precise, schema-valid data.

The Problem/Context

When a field engineer or customer service rep raises a broadband order, the downstream provisioning systems — OSS, BSS, network controllers — require exact, structured data. There is no room for “Uh, it’s a 1Gbps fiber thing for John at 123 Main Street.”

Those systems need something like this:

{
  "order_id": "SO-2026-001",
  "order_state": "INITIAL",
  "customer": { "name": "John Doe" },
  "service_address": { "street": "123 Main St" },
  "product": { "service_type": "FTTH" },
  "requested_bandwidth": {
    "downstream_speed": 1,
    "upstream_speed": 1,
    "unit": "Gbps"
  }
}

The challenge of M1 was answering one foundational question: can a 7B local model reliably bridge that gap? And if so, what does the architecture need to look like to make it production-worthy — without cloud APIs, without GPT-4, and without crossing your fingers?

The Delta (my learnings)

Before this milestone, I understood intent-based fulfilment as a concept. I knew LLMs could extract information from natural language. What I did not understand was how to make a small local model do it reliably and measurably.

1. JSON-mode is not enough — you need a re-ask loop

Ollama’s format: json parameter tells the model to output JSON. What it doesn’t guarantee is that the JSON matches your schema. The solution is an Instructor-style coercion loop: if Pydantic validation fails, the model gets another attempt with the specific validation error injected back into the prompt.

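from pydantic import ValidationError

# The wrapper name is illustrative; ollama_client, ServiceOrder and
# inject_validation_error come from the repository.
async def extract_order(prompt: str, max_retries: int) -> ServiceOrder:
    for attempt in range(max_retries):
        response = await ollama_client.generate(prompt)
        try:
            return ServiceOrder.model_validate_json(response)
        except ValidationError as e:
            # Re-ask with the specific validation error injected into the prompt
            prompt = inject_validation_error(prompt, e)
    raise RuntimeError(f"No schema-valid output after {max_retries} attempts")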

This turns a ~70% baseline schema-validity rate into 90%+ without changing the model or the prompt.
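The error-injection step is not shown above. Here is a minimal sketch of what it might look like, assuming it receives the list of error dicts that Pydantic's `ValidationError.errors()` returns (each with `loc` and `msg` keys). The function name `append_errors_to_prompt` and the feedback wording are my assumptions for illustration, not the repository's code.

```python
def append_errors_to_prompt(prompt: str, errors: list[dict]) -> str:
    """Return the prompt plus a readable list of validation failures.

    `errors` is assumed to have the shape of pydantic's ValidationError.errors():
    dicts with at least "loc" (a tuple path) and "msg" (a description).
    """
    lines = []
    for err in errors:
        # ("requested_bandwidth", "unit") becomes "requested_bandwidth.unit"
        path = ".".join(str(part) for part in err.get("loc", ())) or "<root>"
        lines.append(f"- {path}: {err.get('msg', 'invalid value')}")
    feedback = "\n".join(lines)
    return (
        f"{prompt}\n\n"
        "Your previous answer failed schema validation:\n"
        f"{feedback}\n"
        "Return corrected JSON only."
    )
```

Inside the loop this could be called as `append_errors_to_prompt(prompt, e.errors())`. Naming the failing field path gives the model something concrete to fix rather than a generic "try again".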

2. Two good few-shot examples beat a long system prompt

The order_intake.py prompt uses exactly two few-shot examples — not five, not ten. Two, covering the most common ambiguities: a residential FTTH order and a business upgrade. A long system prompt telling the model how to think performed worse than a short one with concrete input → output pairs.
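As a rough illustration of the input → output pattern, here is a sketch of assembling a two-shot prompt. The example texts, the field values (including the `FTTP` service type), the system line, and the `build_prompt` name are all hypothetical; the real examples live in order_intake.py and are not reproduced here.

```python
import json

# Hypothetical few-shot pairs: one residential order, one business upgrade.
FEW_SHOT_EXAMPLES = [
    (
        "John Doe wants 1Gbps fibre to the home at 123 Main St.",
        {
            "customer": {"name": "John Doe"},
            "service_address": {"street": "123 Main St"},
            "product": {"service_type": "FTTH"},
            "requested_bandwidth": {"downstream_speed": 1, "upstream_speed": 1, "unit": "Gbps"},
        },
    ),
    (
        "Upgrade Acme Ltd at 5 Dock Rd to 500Mbps symmetric fibre.",
        {
            "customer": {"name": "Acme Ltd"},
            "service_address": {"street": "5 Dock Rd"},
            "product": {"service_type": "FTTP"},
            "requested_bandwidth": {"downstream_speed": 500, "upstream_speed": 500, "unit": "Mbps"},
        },
    ),
]

SYSTEM = "Convert the broadband order into a JSON object. Respond with JSON only."


def build_prompt(order_text: str) -> str:
    """Short instruction, then concrete pairs, then the new order to convert."""
    parts = [SYSTEM, ""]
    for text, expected in FEW_SHOT_EXAMPLES:
        parts.append(f"Input: {text}")
        parts.append(f"Output: {json.dumps(expected)}")
        parts.append("")
    parts.append(f"Input: {order_text}")
    parts.append("Output:")
    return "\n".join(parts)
```

Ending the prompt on a bare `Output:` mirrors the examples, so the model's most natural continuation is the JSON object itself.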

3. Build your eval harness before you tune anything

The evals/ directory contains a 10-example JSONL dataset with expected ServiceOrder outputs. The runner scores two metrics per model:

  • Schema validity — did it produce parseable JSON matching the Pydantic model?
  • Field accuracy — did it correctly capture bandwidth, address, service type?

Running this on qwen2.5:7b before touching the prompt gave me a baseline I could actually optimise against. Without the harness, I would have been tuning by gut feel.
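To make the two metrics concrete, here is a minimal scoring sketch. It assumes each JSONL line has `input` and `expected` keys, that `generate` returns the model's raw text, and that `validate` raises on a schema violation. Those names and the choice to score a schema-invalid output as 0% field accuracy are my assumptions, not necessarily how the runner in evals/ works.

```python
import json
from typing import Callable, Iterable


def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted keys so fields can be compared one by one."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, f"{path}."))
        else:
            flat[path] = value
    return flat


def score_case(raw_output: str, expected: dict,
               validate: Callable[[dict], None]) -> tuple[bool, float]:
    """Return (schema_valid, field_accuracy) for one model output."""
    try:
        parsed = json.loads(raw_output)
        if not isinstance(parsed, dict):
            raise ValueError("top-level JSON value must be an object")
        validate(parsed)  # expected to raise on a schema violation
    except Exception:
        return False, 0.0
    want, got = flatten(expected), flatten(parsed)
    if not want:
        return True, 1.0
    correct = sum(1 for key, value in want.items() if key in got and got[key] == value)
    return True, correct / len(want)


def run_eval(jsonl_lines: Iterable[str], generate: Callable[[str], str],
             validate: Callable[[dict], None]) -> dict:
    """Average both metrics over every non-empty JSONL line."""
    cases = (json.loads(line) for line in jsonl_lines if line.strip())
    scores = [score_case(generate(c["input"]), c["expected"], validate) for c in cases]
    if not scores:
        raise ValueError("empty eval set")
    n = len(scores)
    return {
        "schema_validity": sum(ok for ok, _ in scores) / n,
        "field_accuracy": sum(acc for _, acc in scores) / n,
    }
```

Passing `generate` and `validate` in as callables keeps the scorer independent of any particular model or schema, which is what lets a baseline be re-measured after every prompt change.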

Here are the actual results from the M1 eval run (make eval, 10 cases, qwen2.5:7b):

Metric            Result
Schema Validity   100% (10/10 cases)
Field Accuracy    100% (9/9 fields per case)
Latency p50       80.3 s
Latency p95       121.4 s

The accuracy exceeded the M1 acceptance criteria (≥90% schema validity, ≥75% field accuracy). The latency is the honest story: 80 seconds p50 on a laptop running a 7B model locally. This is not a production serving latency — it’s a proof-of-concept baseline. Optimising inference speed is out of scope for M1 and is a known trade-off of fully local execution with no GPU acceleration. For the purpose of this milestone, proving the schema contract holds under real inputs matters far more than response time.
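For reference, this is one common way to time each call and derive p50 and p95 from per-case latencies. It is a sketch using the nearest-rank definition; I am not claiming the eval runner computes its percentiles this way, and the `timed` and `percentile` names are illustrative.

```python
import math
import time


def timed(fn, *args):
    """Call fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]
```

Under this definition, with only 10 cases the p95 is simply the slowest run, so a single outlier moves it a lot. That is worth keeping in mind when reading latency figures from a small eval set.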

Why This Matters for Telecom AI

The repository is structured around a principle I now consider non-negotiable: isolate your schema contracts from everything else before writing a single prompt.

src/telecom_ai/
├── schemas/               # Pydantic models — the TMF641-aligned source of truth
│   ├── service_order.py
│   ├── customer.py
│   ├── product.py
│   └── common.py
├── llm/                   # LLM communication layer
│   ├── ollama_client.py   # Async Ollama wrapper with retry logic
│   └── structured.py      # Reask loop on ValidationError
├── prompts/
│   └── order_intake.py    # System prompt + 2 few-shot examples
└── cli.py                 # `nl-order` CLI entry point
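To give a feel for what lives under schemas/, here is a sketch of Pydantic models matching the example JSON from earlier in this post. It collapses everything into one file, and the constraints (positive speeds, the allowed units, free-text `service_type` and `order_state`) are my assumptions; the real models may be stricter or split differently.

```python
from typing import Literal

from pydantic import BaseModel, Field


class Customer(BaseModel):
    name: str


class ServiceAddress(BaseModel):
    street: str


class Product(BaseModel):
    service_type: str  # e.g. "FTTH"; the real model may restrict this to an enum


class RequestedBandwidth(BaseModel):
    downstream_speed: float = Field(gt=0)  # assumed: speeds must be positive
    upstream_speed: float = Field(gt=0)
    unit: Literal["Mbps", "Gbps"]  # assumed set of units


class ServiceOrder(BaseModel):
    order_id: str
    order_state: str
    customer: Customer
    service_address: ServiceAddress
    product: Product
    requested_bandwidth: RequestedBandwidth
```

With models like these, `ServiceOrder.model_validate_json(raw)` either returns a typed object or raises a `ValidationError` naming the offending field, which is exactly what the reask loop feeds back to the model.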

M1’s only job is to prove the schema contract holds. No RAG, no tool calling, no multi-agent workflows — just: can the model reliably emit TMF-compliant JSON?

This is load-bearing for every later milestone. When M2 adds vector retrieval and something breaks, I will know it is the retrieval layer — not the model’s ability to follow the schema. When M3 introduces tool calling, I start from a stable floor.

The instinct to skip this step and go straight to “agentic RAG” is strong. Resist it.

How To

To try it yourself — no API keys, no cloud:

git clone https://github.com/spereir2/telecom-ai-agent-framework
cd telecom-ai-agent-framework
make up       # starts Ollama via Docker Compose, pulls qwen2.5:7b
make smoke    # runs the canonical nl-order example
make eval     # scores all 3 models on the 10-example eval set

To adapt this pattern for your own domain:

  1. Define your Pydantic schema first: model it on an industry standard (TM Forum Open APIs, 3GPP) so it’s immediately meaningful to OSS/BSS integrations
  2. Write two few-shot examples — cover your two most common input variations; don’t over-engineer the prompt
  3. Add the reask loop — catch ValidationError, inject the error message, retry up to 2 times
  4. Build a small eval set — 10 hand-written examples is enough to establish a baseline; grow it as you discover edge cases in production
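Steps 1 and 3 can be sketched end to end without a model at all by stubbing the generator. Everything here is illustrative: `Bandwidth` is a deliberately tiny stand-in for a real schema, `make_stub` replays canned outputs in place of Ollama, and I read "retry up to 2 times" as one initial attempt plus two retries.

```python
import asyncio
from typing import Awaitable, Callable

from pydantic import BaseModel, ValidationError


class Bandwidth(BaseModel):  # tiny stand-in for your own domain schema
    downstream_speed: float
    unit: str


async def extract(prompt: str, generate: Callable[[str], Awaitable[str]],
                  max_retries: int = 2) -> Bandwidth:
    """Ask the model, validate, and re-ask with the validation error on failure."""
    last_error = None
    for _ in range(1 + max_retries):  # one initial attempt plus the retries
        raw = await generate(prompt)
        try:
            return Bandwidth.model_validate_json(raw)
        except ValidationError as exc:
            last_error = exc
            # Append the error text so the next attempt can correct itself
            prompt = f"{prompt}\n\nPrevious output was invalid:\n{exc}\nReturn corrected JSON only."
    raise RuntimeError(f"no valid output after {1 + max_retries} attempts") from last_error


def make_stub(outputs: list[str]) -> Callable[[str], Awaitable[str]]:
    """Return an async generate() that replays canned outputs instead of calling a model."""
    replies = iter(outputs)

    async def generate(prompt: str) -> str:
        return next(replies)

    return generate
```

Running `asyncio.run(extract("order text", make_stub([...])))` with one bad and one good canned reply exercises the retry path deterministically. Swapping the stub for a real client call is then the only change needed to test against a live model.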

In closing

Before you build a telecom AI agent, prove your local model can emit schema-valid, TMF-aligned JSON — with an eval harness to measure it. Everything downstream depends on this floor being solid.