Drug pricing and classification data is the operational backbone of every formulary management system. For many PBMs and health plans, that data comes from Medi-Span, Wolters Kluwer's comprehensive drug data platform. Getting the weekly data ingestion pipeline right from the start is critical because every downstream process depends on it: formulary updates, claims adjudication, rebate calculations, and regulatory reporting.
This article covers the architecture patterns we have found most effective for building reliable, maintainable Medi-Span data integration pipelines.
Understanding the Data
Medi-Span data feeds typically include several core file types delivered weekly:
- Master Drug Data Base (MDDB) - The comprehensive drug file containing NDCs, GPI codes (Generic Product Identifier), drug names, strengths, forms, and packaging information.
- Price Files - AWP (Average Wholesale Price), WAC (Wholesale Acquisition Cost), and other pricing benchmarks. These change weekly and are critical for claims pricing.
- Drug-Drug Interaction - Pre-computed interaction pairs with severity levels and clinical monographs.
- Drug-Disease Contraindication - Mappings between drugs and ICD-10 codes where the drug is contraindicated.
- Duplicate Therapy - Identification of drugs in the same therapeutic class for duplication alerts.
The files are typically flat-file format (fixed-width or pipe-delimited) and can be large. The full MDDB file contains hundreds of thousands of records. Weekly updates include both full replacement files and delta (change-only) files.
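Before designing any schema, it helps to see what handling a single record looks like in code. The sketch below parses one pipe-delimited record into named fields; the field layout and sample values here are illustrative placeholders, not the actual MDDB record layout, which is defined in the Medi-Span file documentation for your licensed feed.

```python
import csv
from io import StringIO

def parse_pipe_delimited(raw_text, field_names):
    """Parse pipe-delimited feed text into dicts keyed by field name."""
    reader = csv.reader(StringIO(raw_text), delimiter="|")
    return [dict(zip(field_names, row)) for row in reader]

# Hypothetical field order -- consult the vendor record layout spec.
FIELDS = ["ndc", "gpi", "drug_name", "strength", "awp_unit_price"]

sample = "00071015523|58200060100310|ATORVASTATIN|10 MG|4.2500"
records = parse_pipe_delimited(sample, FIELDS)
```

Fixed-width variants of the same files need offset-based slicing instead of a delimiter, but the downstream staging logic is identical once records are dicts.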
Pattern 1: Immutable Staging with Version Control
The most important architectural decision is to never overwrite production data directly with incoming feed data. Instead, load each weekly feed into an immutable staging area with a version identifier:
-- Schema: versioned staging tables
CREATE TABLE staging.drug_master (
    feed_version    DATE NOT NULL,  -- e.g., 2025-05-05
    ndc             VARCHAR(11) NOT NULL,
    gpi             VARCHAR(14),
    drug_name       VARCHAR(200),
    strength        VARCHAR(50),
    dosage_form     VARCHAR(50),
    route           VARCHAR(50),
    awp_unit_price  DECIMAL(12,4),
    wac_unit_price  DECIMAL(12,4),
    obsolete_date   DATE,
    loaded_at       TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (feed_version, ndc)
);

-- Index for efficient version comparison
CREATE INDEX idx_drug_master_ndc ON staging.drug_master(ndc);
This pattern gives you several critical capabilities: rollback to any previous week's data, side-by-side comparison between weeks, and a complete audit trail of every data change.
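The load step itself is append-only: every week's records are inserted under a new `feed_version`, and prior versions are never updated or deleted. A minimal loader sketch, assuming a DB-API connection such as psycopg2 and records parsed into dicts (the helper names here are illustrative):

```python
def staging_rows(feed_version, records):
    """Map parsed feed records to parameter tuples for the staging insert."""
    return [
        (feed_version, r["ndc"], r.get("gpi"),
         r.get("drug_name"), r.get("awp_unit_price"))
        for r in records
    ]

def load_feed_to_staging(conn, feed_version, records):
    """Append one week's feed to staging; prior versions are never touched."""
    sql = """
        INSERT INTO staging.drug_master
            (feed_version, ndc, gpi, drug_name, awp_unit_price)
        VALUES (%s, %s, %s, %s, %s)
    """
    with conn.cursor() as cur:
        cur.executemany(sql, staging_rows(feed_version, records))
    conn.commit()
```

Because the insert never mutates existing rows, rollback is simply pointing production views or promotion logic at an earlier `feed_version`.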
Pattern 2: Change Detection Pipeline
Once the staging data is loaded, a change detection step identifies what actually changed from the previous week. This is where most naive implementations go wrong, because they try to do a full table swap instead of understanding the delta:
def detect_changes(current_version, previous_version, db):
    """Identify new, modified, and discontinued NDCs.

    Each version is filtered in its own CTE before the FULL OUTER JOIN.
    Filtering the current version in the outer WHERE clause would
    silently drop DISCONTINUED rows, whose current-side columns are all
    NULL and therefore never match the version predicate.
    """
    query = """
        WITH curr AS (
            SELECT * FROM staging.drug_master WHERE feed_version = %s
        ), prev AS (
            SELECT * FROM staging.drug_master WHERE feed_version = %s
        )
        SELECT
            COALESCE(c.ndc, p.ndc) AS ndc,
            CASE
                WHEN p.ndc IS NULL THEN 'NEW'
                WHEN c.ndc IS NULL THEN 'DISCONTINUED'
                ELSE 'MODIFIED'
            END AS change_type,
            c.awp_unit_price AS new_price,
            p.awp_unit_price AS old_price,
            c.gpi AS new_gpi,
            p.gpi AS old_gpi
        FROM curr c
        FULL OUTER JOIN prev p ON c.ndc = p.ndc
        -- IS DISTINCT FROM treats NULL as a comparable value; a plain !=
        -- returns NULL (and so misses the change) when either side is NULL.
        WHERE p.ndc IS NULL
           OR c.ndc IS NULL
           OR c.awp_unit_price IS DISTINCT FROM p.awp_unit_price
           OR c.gpi IS DISTINCT FROM p.gpi
           OR c.drug_name IS DISTINCT FROM p.drug_name
    """
    return db.execute(query, [current_version, previous_version])
Pattern 3: Impact Assessment Layer
Raw data changes are not actionable until they are assessed against the current formulary. A price change on an NDC that is not on your formulary is noise. A GPI reclassification on a Tier 1 drug with $50M in annual claims is a critical event. The impact assessment layer connects data changes to business context:
def assess_impact(changes, formulary, claims_data):
    """Score each change by business impact."""
    assessments = []
    for change in changes:
        ndc = change["ndc"]
        formulary_entry = formulary.get(ndc)
        if not formulary_entry:
            continue  # Not on our formulary
        annual_spend = claims_data.get_annual_spend(ndc)
        member_count = claims_data.get_unique_members(ndc)
        impact_score = 0
        alerts = []
        if change["change_type"] == "DISCONTINUED":
            impact_score = 100  # Always critical
            alerts.append("NDC discontinued - replacement needed")
        elif change["change_type"] == "MODIFIED":
            # Falsy check also guards against dividing by a zero old price.
            if change.get("new_price") and change.get("old_price"):
                price_delta = (
                    (change["new_price"] - change["old_price"])
                    / change["old_price"]
                )
                if abs(price_delta) > 0.10:
                    impact_score += 50
                    alerts.append(f"Price changed {price_delta:.1%}")
            if change.get("new_gpi") != change.get("old_gpi"):
                impact_score += 40
                alerts.append("GPI reclassification")
        assessments.append({
            "ndc": ndc,
            "tier": formulary_entry["tier"],
            "annual_spend": annual_spend,
            "member_count": member_count,
            "impact_score": impact_score,
            "alerts": alerts,
        })
    return sorted(assessments, key=lambda x: x["impact_score"], reverse=True)
Pattern 4: Automated Validation Gates
Before any data moves from staging to production, automated validation gates must pass. These catch data quality issues that would otherwise propagate through the entire system:
- Record count validation: The new feed should be within 2% of the previous feed's record count. A 50% drop indicates a truncated file.
- Price reasonableness: Flag any NDC with a price change greater than 200% for manual review.
- Referential integrity: All GPI codes in the drug file should exist in the GPI reference table.
- Completeness checks: Required fields (NDC, drug name, GPI) should never be null.
If any gate fails, the pipeline halts and alerts the operations team. The previous week's data remains in production until the issue is resolved. This is why the immutable staging pattern is essential: you always have a clean fallback.
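Most of these gates reduce to simple checks over the staged records. A sketch of the first three, assuming records are dicts with the staging columns and using the thresholds from the list above (2% count drift, 200% price swing); the referential-integrity gate is omitted here because it needs the GPI reference table:

```python
def run_validation_gates(current, previous):
    """Return a list of gate-failure messages; empty means all gates passed."""
    failures = []

    # Record count validation: within 2% of last week's count.
    if previous:
        drift = abs(len(current) - len(previous)) / len(previous)
        if drift > 0.02:
            failures.append(f"record count drifted {drift:.1%}")

    # Completeness: required fields must never be null/empty.
    for rec in current:
        for field in ("ndc", "drug_name", "gpi"):
            if not rec.get(field):
                failures.append(f"missing {field} on NDC {rec.get('ndc')}")
                break

    # Price reasonableness: flag swings greater than 200% week-over-week.
    prev_prices = {r["ndc"]: r.get("awp_unit_price") for r in previous}
    for rec in current:
        old = prev_prices.get(rec["ndc"])
        new = rec.get("awp_unit_price")
        if old and new and abs(new - old) / old > 2.0:
            failures.append(f"price change >200% on NDC {rec['ndc']}")

    return failures
```

Returning messages rather than raising lets the pipeline report every failing gate in a single alert instead of stopping at the first one.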
Scheduling and Monitoring
Medi-Span data typically arrives on a predictable weekly schedule (often Monday or Tuesday). The ingestion pipeline should run automatically on a schedule with enough buffer time to handle delivery delays. A typical schedule looks like:
- Monday 6 AM: Check for new data files in the SFTP drop location
- Monday 6:15 AM: Load into staging tables with version tag
- Monday 6:30 AM: Run validation gates
- Monday 7:00 AM: Run change detection and impact assessment
- Monday 7:30 AM: Send impact report to formulary team
- Monday 9:00 AM: Promote to production (automatically if all gates pass, or after manual approval for high-impact changes)
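The steps above can be sketched as a sequential runner that halts on the first failure, leaving the previous week's data in production. This is a minimal orchestration sketch, not a replacement for a real scheduler such as cron or Airflow; the step names are illustrative:

```python
import logging

def run_weekly_pipeline(steps):
    """Run (name, callable) steps in order; each returns True on success.

    Halting on the first failure is what makes the validation gates
    effective: nothing downstream of a failed gate ever runs.
    """
    for name, step in steps:
        logging.info("running step: %s", name)
        if not step():
            logging.error("step failed, halting pipeline: %s", name)
            return False
    return True
```

Wiring the load, validate, detect, assess, and promote functions into this list keeps the schedule logic in one place and the halt-on-failure behavior uniform.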
Building this pipeline correctly on the first pass avoids months of technical debt. The patterns here are not complex, but they require discipline: immutable staging, change detection, impact assessment, and automated validation. Skip any one of these and the pipeline becomes a source of operational risk rather than operational confidence.