Unauthenticated Bulk Extraction of Patient Records From a Research Data Portal

Reported May 13, 2026

Severity Critical

Platform Web

Vulnerability Class Authentication Bypass (CWE-287) / PHI Disclosure

Target Type Rehabilitation Hospital Research Portal

Impact Tens of thousands of patient records, each with 149 clinical variables, exposed without authentication

The Risk

Anyone on the internet, with no login and no special tools, could pull individually identifiable health records for tens of thousands of patients out of a national clinical-outcomes research database. Each record contained more than 80 sensitive fields, including mental-health history, prior psychiatric hospitalisation, household income, exact age, and insurance type. A short script extracted a full patient profile in minutes. This is a confirmed healthcare data breach that exceeds the regulatory notification threshold many times over and affects every hospital that contributes to the registry.

The Vulnerability

The portal hosted multiple interactive data-explorer applications built on a reactive web app framework, sitting behind a reverse proxy. Other apps on the same server enforced a session validation step before returning any data: a server-side guard rejected requests without a valid session token.

Two of the data-explorer instances did not call that guard. When the client initialised the data table widget by sending the framework's normal init message with the data-table hidden flag set to false, the server returned an empty errors object and computed the data table without checking any session identifier. The widget then published a per-session data-object URL containing a server-generated nonce. That nonce was not an authentication token: the session that produced it had been opened with no credentials, no cookie, and no header.
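The missing check can be illustrated with a minimal sketch. It assumes a request is a plain mapping and that sessions live in an in-memory set. The names has_valid_session, guarded, and init_data_table, the session_token key, and the 401 response are all hypothetical; the portal's actual framework hooks and rejection response are not known from this report.

```python
import hmac
from typing import Callable, Mapping, Optional

# Hypothetical in-memory session store, for illustration only.
VALID_SESSIONS = {"example-session-token"}


def has_valid_session(token: Optional[str]) -> bool:
    """Return True only if the token matches a known session."""
    if not token:
        return False
    # Constant-time comparison avoids leaking match length through timing.
    return any(
        hmac.compare_digest(token.encode(), known.encode())
        for known in VALID_SESSIONS
    )


def guarded(handler: Callable[[Mapping[str, str]], dict]) -> Callable[[Mapping[str, str]], dict]:
    """Wrap a data handler so it only runs for requests with a valid session."""
    def wrapper(request: Mapping[str, str]) -> dict:
        if not has_valid_session(request.get("session_token")):
            # Assumed rejection shape; no data is computed for this request.
            return {"status": 401, "body": None}
        return handler(request)
    return wrapper


@guarded
def init_data_table(request: Mapping[str, str]) -> dict:
    # Placeholder for the real table computation.
    return {"status": 200, "body": "table-data"}
```

The point of the wrapper is that the table computation is never reached unless the guard passes, which is the property the two affected apps lacked.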

Three structural facts compounded the bypass:

The transport endpoint reported cookie_needed:false and origins:["*:*"], allowing drive-by extraction from any malicious website.
The reverse proxy added no authentication layer in front of the affected paths.
The data-table widget exposed copy, CSV, and Excel export buttons in the UI, giving any visitor full-database download tooling.
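For contrast with the wildcard origin policy, the following is a minimal sketch of an exact-match origin allowlist. The allowlisted origin and the function name is_origin_allowed are assumptions made for illustration, not values taken from the portal.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical allowlist; the portal's real origin is not given in this report.
ALLOWED_ORIGINS = {"https://portal.example.org"}


def is_origin_allowed(origin_header: Optional[str]) -> bool:
    """Accept only exact, HTTPS origins that appear on the allowlist."""
    if not origin_header:
        return False
    parts = urlsplit(origin_header)
    if parts.scheme != "https" or not parts.hostname:
        return False
    # Compare the normalised scheme and host (plus port, if present) exactly.
    normalized = f"{parts.scheme}://{parts.netloc.lower()}"
    return normalized in ALLOWED_ORIGINS
```

An origin check only constrains browsers. Non-browser clients can send any Origin value they like, so it complements authentication rather than replacing it.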

The Attack

Path A: browser proof of public exposure

An incognito tab navigated directly to the two affected data-explorer URLs. Both pages rendered fully with no login prompt and no authentication cookie sent. Curl confirmed identical behaviour with no credentials:

$ curl -sI https://<portal>/<app-path>/explorer-a/
HTTP/2 200
content-type: text/html

$ curl -s https://<portal>/<app-path>/explorer-a/<transport-info>
{"websocket":true,"origins":["*:*"],"cookie_needed":false}
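The same check can be automated for regression testing after a fix. The sketch below parses a transport info body like the one above and lists the risky settings it finds. The function name transport_findings and the wording of the findings are illustrative assumptions; the function takes the response body as a string, so it makes no network request itself.

```python
import json
from typing import List


def transport_findings(info_body: str) -> List[str]:
    """Return a list of risky settings found in a transport info response."""
    info = json.loads(info_body)
    findings = []
    # A wildcard entry means any website may open a connection.
    if any(origin in ("*", "*:*") for origin in info.get("origins", [])):
        findings.append("wildcard origin accepted")
    # cookie_needed false means the transport does not depend on a session cookie.
    if info.get("cookie_needed") is False:
        findings.append("no session cookie required")
    return findings


# Body copied from the response shown in this report.
observed = '{"websocket":true,"origins":["*:*"],"cookie_needed":false}'
print(transport_findings(observed))
```

A hardened endpoint would be expected to yield an empty list.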

Path B: scripted full-profile extraction

A small Python script ran three stages:

Unauthenticated HTTP probe. Both apps returned HTTP 200, and the transport reported no cookie or origin restriction.
Column enumeration. Scraping the public HTML of each app surfaced the full list of sensitive variable names; a single curl piped through grep is enough to count them. Any variable visible in the dropdowns can be requested.
Single-patient profile extraction. The script cycled through every column on one app, applied a server-side filter for a single record identifier (the registry's own per-subject unique key) to each, and merged the responses into a complete patient profile. End-to-end runtime: about 5 minutes.

Only one real patient profile was extracted to prove the depth of the breach. The script did not pull the full database. The extracted record contained internally consistent values across functional-outcome scores, education, occupation, earnings band, clinical severity indicators, and the acute-plus-rehabilitation treatment timeline. The data was real, not synthetic.

The Impact

Each record contained an enumerable per-subject unique identifier. The registry's own data dictionary documents it as "uniquely identifies the rows, one row per subject". The HIPAA Safe Harbor de-identification standard at 45 CFR 164.514(b)(2)(i)(R) requires that any unique identifying number, characteristic, or code be removed before data can be treated as de-identified. The data also exposed more than 80 quasi-identifiers (exact age, race, sex, occupation, earnings band, insurance type, hospital length of stay) that can re-identify patients with rare profiles when cross-referenced against news and court records.
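The re-identification risk from quasi-identifiers can be illustrated with a small uniqueness count. The rows below are synthetic and invented for this sketch, and the function name unique_profiles is hypothetical. It shows how combining fields that are individually common isolates single individuals.

```python
from collections import Counter
from typing import Dict, List, Sequence


def unique_profiles(rows: List[Dict], quasi_identifiers: Sequence[str]) -> List[Dict]:
    """Return rows whose combination of quasi-identifier values occurs exactly once."""
    def key(row: Dict) -> tuple:
        return tuple(row[q] for q in quasi_identifiers)

    counts = Counter(key(row) for row in rows)
    return [row for row in rows if counts[key(row)] == 1]


# Synthetic rows, not drawn from any real dataset.
rows = [
    {"age": 34, "sex": "F", "occupation": "teacher"},
    {"age": 34, "sex": "F", "occupation": "nurse"},
    {"age": 34, "sex": "M", "occupation": "teacher"},
    {"age": 34, "sex": "M", "occupation": "teacher"},
]

print(len(unique_profiles(rows, ["age", "sex"])))                # 0
print(len(unique_profiles(rows, ["age", "sex", "occupation"])))  # 2
```

With two fields no row is unique; adding a third isolates two of the four rows. With more than 80 such fields per record, unique combinations become far more likely.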

Confirmed sensitive fields available per patient included mental-health history; psychiatric-hospitalisation history; depression, anxiety, and post-traumatic-stress scores; acute and post-acute insurance type; household income band; exact integer age; race; sex; marital status; occupation; military service; substance use; and clinical severity indicators, plus more than 130 additional clinical variables.

The exposure exceeds the HIPAA breach-notification threshold of 500 affected individuals by orders of magnitude. The registry is run by an organisation acting as a business associate to multiple contributing centres nationwide; the breach exposes patients from every contributing centre.

Remediation

Enforce the session-validation guard already used by other apps on the same server. When the affected apps are invoked without a valid session, they should return the same silent-validation response their sibling apps already produce.
Add an authentication layer at the reverse proxy for all paths under the research registry: basic auth, mutual TLS, single sign-on, or IP allowlist. The app-level fix and the proxy-level fix should both be in place.
Fix the transport CORS policy. Do not reflect arbitrary Origin values. Given the data sensitivity, removing cross-origin support entirely is the safer default.
Remove the copy, CSV, and Excel export buttons from any data-explorer app, or gate them behind authenticated download tokens.
Strip the per-subject unique identifier from any data-table response intended for aggregate-only consumption. Individual-level access should require an authenticated session plus centre-level authorisation.
Initiate the regulatory breach-notification process for the affected records within the statutory window.
Remove framework version headers from responses.
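The identifier-stripping item above can be sketched as a response filter. This is a minimal illustration under stated assumptions: the field name subject_id and both function names are hypothetical, and rows are modelled as plain dictionaries rather than the portal's actual data-table objects.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical name for the registry's per-subject unique key.
SUBJECT_ID_FIELD = "subject_id"


def strip_identifier(rows: List[Dict]) -> List[Dict]:
    """Return copies of the rows with the per-subject identifier removed."""
    return [
        {k: v for k, v in row.items() if k != SUBJECT_ID_FIELD}
        for row in rows
    ]


def counts_by_field(rows: List[Dict], field: str) -> Dict:
    """Return aggregate counts for one field instead of row-level data."""
    return dict(Counter(row[field] for row in rows))
```

Removing the key prevents filtering to a single subject by identifier, but the remaining quasi-identifiers still carry re-identification risk, so this step works alongside the authentication fixes rather than in place of them.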