Insights/Clinician Learning Brief

AI Is Breaking the Assessments CME Still Uses

Topics: AI oversight, Learning design, Workflow-based education
Coverage April 28–May 4, 2025

Abstract

Generative AI lets learners produce competent-looking answers without demonstrating the judgment that assessments are meant to measure; CME teams must redesign assessments around verifiable critical-appraisal behaviors.

Key Takeaways

  • AI-assisted answers can look competent while hiding whether the learner did the reasoning the assessment was meant to measure.
  • Evidence curation is also under scrutiny as CME professionals plan for possible PubMed disruption without assuming collapse.
  • The shared implication for providers is verification of learner judgment, of source quality, and of the workflows that connect them.

Health-professions educators are warning that generative AI can let learners produce polished answers without demonstrating the judgment behind them. The public signal is narrow (a single institutional podcast rather than broad clinician corroboration), but it names a practical assessment problem CME teams cannot ignore.

Correct-looking outputs are not proof of competence

In a University of Louisville Health Sciences Center Faculty Feed episode, educators described novice learners treating generative AI output as authoritative and using it to bypass the uncomfortable work of learning. Their concern was not that AI is useless. It was that AI means something different for an expert than for a novice: an expert can judge the output; a novice may simply accept it.

That distinction matters for CME assessment design. A post-test, written reflection, case summary, or care-plan exercise can now measure prompt quality, editing skill, or tool access as much as clinical reasoning. The risk is not academic dishonesty alone. It is invalid inference: the provider may believe an activity captured understanding when it captured a fluent artifact.

This connects to a broader trust thread we saw in an earlier brief on hidden funding and weak evaluation: CME credibility depends on whether the field can show how conclusions were reached, not just whether the final product looks acceptable. In an AI-saturated environment, that same standard applies to learners.

The concrete implication: audit assessments for places where AI can complete the visible task without exposing the learner’s reasoning. If the activity is meant to measure judgment, require learners to critique an AI-generated answer, identify uncertainty, explain why an option is unsafe, or compare the output against evidence and expert reasoning.

Evidence access is becoming an operational dependency

A separate provider-owned Write Medicine episode focused on PubMed as infrastructure for CME and medical writing. The discussion did not claim PubMed is collapsing. It described anxiety about funding, indexing, and topic availability, then moved quickly into practical contingencies: Europe PMC, Ovid MEDLINE, database help guides, downloaded records, gray literature, association reports, and source-appraisal frameworks.

Because this is a CME-provider podcast rather than independent clinician conversation, it should be read as an operations watch item, not a field-wide consensus. Still, the implication is concrete. CME teams often treat evidence access as a background utility. If that utility becomes less stable, or even just less trusted, content teams need documented ways to confirm currency, completeness, and source quality.

The operational question is whether evidence curation is resilient enough to survive a disruption in a preferred source. A defensible workflow should name backup databases, clarify when gray literature is acceptable, define how currency is checked, and make source-quality review visible in content audits. The point is not to abandon PubMed; it is to avoid building accredited education on a single point of failure.
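For teams that track content operations in code or structured data, the workflow elements named above could be recorded in a small checklist structure. The following is only a minimal sketch under stated assumptions: the class name, field names, and the five-year currency threshold are hypothetical illustrations, not taken from the podcast or from any accreditation standard.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class EvidencePlaybook:
    """Hypothetical playbook record; every field name here is illustrative."""
    primary_source: str
    backup_sources: list[str]
    gray_literature_rule: str      # plain-language rule for when gray literature is acceptable
    max_source_age_years: int      # assumed currency threshold chosen by the content team

    def gaps(self) -> list[str]:
        """Return the playbook elements that are still undocumented."""
        missing = []
        if not self.backup_sources:
            missing.append("backup databases")
        if not self.gray_literature_rule.strip():
            missing.append("gray-literature rule")
        if self.max_source_age_years <= 0:
            missing.append("currency check")
        return missing

    def is_current(self, publication_year: int, as_of: date) -> bool:
        """True if the source falls within the team's chosen age threshold."""
        return as_of.year - publication_year <= self.max_source_age_years


# Example values are placeholders, not recommendations.
playbook = EvidencePlaybook(
    primary_source="PubMed",
    backup_sources=["Europe PMC", "Ovid MEDLINE"],
    gray_literature_rule="",       # left blank on purpose to show a detected gap
    max_source_age_years=5,
)

print(playbook.gaps())
print(playbook.is_current(2018, as_of=date(2025, 5, 4)))
```

In this sketch, an empty gray-literature rule is reported as a gap, which is the kind of item a content audit would flag. A spreadsheet or shared document could serve the same purpose; the point is that each element is written down and checkable.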

What CME Providers Should Do Now

  • Run one high-stakes assessment through an AI tool and compare the output with what the activity claims to measure.
  • Revise at least one assessment so learners must explain, critique, or defend reasoning rather than submit only a final answer.
  • Create a short evidence-source playbook that lists backup databases, gray-literature rules, and currency checks for content teams.
  • Add questions on source diversity and on AI leakage in assessments to the next annual content or outcomes audit.

What to reconsider

The week’s useful signal is not that AI will ruin assessment or that PubMed is about to fail. It is that CME providers are being forced to verify what used to be assumed: that a learner’s answer reflects their judgment, and that an evidence search draws on a stable, sufficiently complete knowledge base.

Sources

  1. Podcast

    Rethinking Assessment in the Age of AI: Challenges, Risks, and the Future of Learning with the HSC Office of Professional & Educational Development

    Faculty Feed · cited segment 1:59-4:10

    Health professions educators describe novices accepting AI output as 'gospel,' bypassing active learning, and note that experts evaluate AI via deep domain judgment.

  2. Podcast

    Beyond PubMed: CME's Hidden Treasure Map

    Write Medicine · cited segment 13:56-16:20

    CME professionals and medical writers describe proactive archive downloads, European PMC/Ovid alternatives, and CRAAP-style appraisal frameworks as immediate safeguards.
