In regulated industries like healthcare, the tension between artificial intelligence and deterministic business rules is not theoretical. It shows up every day in the systems that process formulary decisions, adjudicate claims, and evaluate prior authorization requests. Large language models (LLMs) are remarkably good at understanding context, parsing unstructured documents, and generating human-quality explanations. They are also prone to hallucination, inconsistent across identical inputs, and fundamentally unauditable in their reasoning. Rules engines are the opposite: perfectly consistent, fully auditable, and completely unable to handle ambiguity or novel situations.
Neither approach works alone in regulated healthcare. The hybrid architecture that combines both is emerging as the standard.
Why Pure AI Fails in Regulated Settings
Consider a prior authorization (PA) request for a specialty medication. An LLM could read the clinical notes, understand the diagnosis, check the drug's indication, and generate a coverage recommendation. It might even be right 95% of the time. But that 5% failure rate is catastrophic in a regulatory context where every decision must be explainable, consistent, and auditable.
- Consistency requirement. Two identical PA requests submitted by different providers must receive the same decision. LLMs do not guarantee this. The same prompt can produce different outputs on different runs.
- Audit trail requirement. When CMS audits a Part D plan's coverage decisions, the plan must show the specific criteria that were applied to each decision. "The AI thought this was appropriate" is not an acceptable audit response.
- Regulatory change management. When CMS updates coverage criteria, the system must be updated to reflect the new rules deterministically. You cannot retrain an LLM and hope it picks up the nuance of a regulatory change.
Why Pure Rules Engines Fail
Rules engines solve the consistency and auditability problems, but they create their own set of failures:
- Brittleness. A rules engine that evaluates prior authorization criteria can only process structured data. If the clinical justification arrives as a free-text letter from a physician, the rules engine cannot read it.
- Maintenance burden. A large formulary operation might have thousands of rules covering PA criteria, step therapy, quantity limits, and clinical policies. Maintaining these rules as policies change is a full-time job for multiple analysts.
- No contextual understanding. A rules engine can check whether a patient has tried Drug A before Drug B (step therapy). It cannot understand that a physician's note explaining why the patient cannot tolerate Drug A constitutes a valid clinical exception.
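The step-therapy example above can be made concrete. The sketch below (the `StepTherapyRule` name, field names, and simplified claim-history records are illustrative, not any real adjudication API) shows what a deterministic check can and cannot see: it evaluates structured claims data, and a physician's free-text note about intolerance is simply invisible to it.

```python
# Sketch of a deterministic step-therapy check over simplified claim-history
# records. StepTherapyRule and the record fields are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class StepTherapyRule:
    required_drug: str    # Drug A: must be tried first
    target_drug: str      # Drug B: requires a prior trial of Drug A
    min_days_supply: int  # minimum trial length to count as "tried"

def meets_step_therapy(rule, claim_history):
    """True only if structured claims show a qualifying trial of Drug A.
    A free-text note explaining why the patient cannot tolerate Drug A
    does not appear in claim_history, so this check cannot credit it."""
    return any(
        c["drug"] == rule.required_drug and c["days_supply"] >= rule.min_days_supply
        for c in claim_history
    )

rule = StepTherapyRule("metformin", "semaglutide", 30)
history = [{"drug": "metformin", "days_supply": 90}]
print(meets_step_therapy(rule, history))  # True: structured data satisfies the rule
print(meets_step_therapy(rule, []))       # False: no trial on record, whatever the note says
```

The gap is exactly where the LLM layer fits: it can turn the physician's note into a structured exception record the rules engine can evaluate.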
The Hybrid Architecture
The hybrid approach assigns each component the task it is best suited for:
LLMs handle:
- Parsing unstructured documents (clinical notes, physician letters, contract language)
- Extracting structured data from free text
- Generating human-readable explanations of decisions
- Answering natural language queries about formulary policies
- Identifying anomalies and edge cases that need human review
Rules engines handle:
- Evaluating coverage criteria against structured data
- Enforcing formulary tier logic deterministically
- Generating audit trails with specific rule citations
- Processing claims adjudication at transaction speed
- Ensuring consistency across identical inputs
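The division of labor above implies a strict data contract at the boundary between the two components. One minimal way to express it, with illustrative schema and field names (`PAExtraction`, `CoverageDecision` are assumptions, not a real library):

```python
# Sketch of the structured hand-off between the LLM layer and the rules
# engine. Class and field names are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class PAExtraction:
    """What the LLM layer must produce from unstructured documents."""
    diagnosis_codes: list
    prior_medications_tried: list
    contraindications: list
    physician_rationale: str
    confidence: float  # drives human-escalation routing

@dataclass
class CoverageDecision:
    """What the rules engine must produce for every evaluation."""
    outcome: str          # "approved" or "denied"
    rules_applied: list   # specific rule IDs, for the audit trail
    evidence: list        # the structured facts each rule matched on
    ruleset_version: str  # pins the decision to a point-in-time policy
```

The contract is what keeps the architecture auditable: the LLM never touches the decision, and the rules engine never touches free text.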
The Data Flow
In practice, the hybrid architecture works as a pipeline. The LLM sits at the input layer, converting unstructured information into structured data. The rules engine sits at the decision layer, applying deterministic logic to the structured data. The LLM then sits again at the output layer, generating human-readable explanations of the rules engine's decisions.
```python
# Hybrid pipeline: LLM (parse) -> Rules Engine (decide) -> LLM (explain)
def process_pa_request(clinical_notes, patient_data, drug_ndc):
    # Step 1: LLM extracts structured data from clinical notes
    extracted = llm.extract({
        "prompt": f"Extract the following from these clinical notes: "
                  f"diagnosis_codes, prior_medications_tried, "
                  f"contraindications, physician_rationale.\n\n"
                  f"{clinical_notes}",
        "output_schema": PAExtractionSchema,
    })

    # Step 2: Rules engine evaluates PA criteria deterministically
    decision = rules_engine.evaluate(
        drug_ndc=drug_ndc,
        patient=patient_data,
        extracted_clinical=extracted,
        ruleset="pa_criteria_v2025_q3",
    )
    # decision contains: approved/denied, rules_applied[], evidence[]

    # Step 3: LLM generates human-readable determination letter
    letter = llm.generate({
        "prompt": f"Generate a prior authorization determination "
                  f"letter for {decision.outcome}. "
                  f"Rules applied: {decision.rules_applied}. "
                  f"Clinical evidence: {decision.evidence}.",
        "tone": "professional_clinical",
        "constraints": ["cite specific criteria", "no medical advice"],
    })

    return {
        "decision": decision,
        "audit_trail": decision.full_audit_log,
        "letter": letter,
        "extraction_confidence": extracted.confidence_scores,
    }
```
Confidence Scoring and Human Escalation
A critical component of the hybrid architecture is knowing when to escalate to human review. The LLM's extraction step should include confidence scores. When confidence is low (ambiguous clinical notes, unclear diagnosis, conflicting information), the system routes the case to a human reviewer rather than letting the rules engine make a decision on uncertain inputs.
The goal is not to remove humans from the process. The goal is to route the right cases to humans. A well-designed hybrid system handles 70-80% of routine cases automatically and routes the complex 20-30% to clinical reviewers with all the context pre-assembled.
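The routing logic itself can stay deliberately simple. A minimal sketch, assuming per-field confidence scores in [0, 1] and an illustrative threshold (the `route` function and the 0.85 cutoff are assumptions, not a prescribed value):

```python
# Minimal sketch of confidence-based escalation. The threshold and field
# names are illustrative; real thresholds should be tuned per field.
ESCALATION_THRESHOLD = 0.85

def route(extraction_confidences: dict) -> str:
    """Send the case to the rules engine only when every extracted field is
    confident; otherwise escalate to a human reviewer with context attached."""
    if min(extraction_confidences.values()) < ESCALATION_THRESHOLD:
        return "human_review"
    return "rules_engine"

print(route({"diagnosis_codes": 0.97, "prior_medications_tried": 0.92}))  # rules_engine
print(route({"diagnosis_codes": 0.97, "physician_rationale": 0.60}))      # human_review
```

Gating on the minimum field confidence, rather than an average, reflects the principle above: the rules engine should never decide on uncertain inputs, even if most of the extraction was clean.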
Auditability by Design
The hybrid architecture produces better audit trails than either pure approach. For every decision, the system records: what the LLM extracted from the source documents, what confidence level the extraction had, which specific rules were evaluated, which rules triggered the decision, and what the final determination was. Regulators can trace every decision back to specific inputs, specific rules, and specific evidence.
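The per-decision record described above can be assembled in one place. A sketch, assuming the fields already introduced in this article (the `build_audit_record` name and dictionary shapes are illustrative):

```python
# Sketch of the audit record assembled for every decision. Structure and
# field names are illustrative assumptions, not a regulatory format.
import datetime

def build_audit_record(extraction, decision):
    """Combine the LLM extraction and the rules-engine decision into a
    single traceable record: inputs, confidence, rules, and outcome."""
    return {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "source_extraction": extraction["fields"],          # what the LLM extracted
        "extraction_confidence": extraction["confidence"],  # how certain it was
        "ruleset_version": decision["ruleset_version"],     # which policy snapshot
        "rules_evaluated": decision["rules_evaluated"],     # everything checked
        "rules_triggered": decision["rules_triggered"],     # what drove the outcome
        "outcome": decision["outcome"],
    }
```

Because every field traces back to either a source document or a versioned rule, an auditor can replay the decision without trusting any opaque reasoning.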
This is the architecture pattern that regulated healthcare technology is converging on. It respects the strengths and limitations of both AI and deterministic systems. For organizations building formulary management, utilization management, or clinical decision support systems, the hybrid approach is no longer experimental. It is the pragmatic standard for production systems that must be both intelligent and trustworthy.