2. Introduction

MGRID XFM streams HL7v3 or compatible messages into an MGRID-enabled healthcare data lake. It is designed to scale horizontally and to sustain high message rates.

XFM uses the HL7v3 Reference Information Model (RIM) as its common model, but can ingest multiple source formats. It is extensible and provides components for:

  • message validation,
  • common model translation,
  • XML to SQL transformation,
  • message aggregation,
  • message constraint checking,
  • data pre-processing,
  • database loading.

It integrates components from the MGRID Messaging SDK for full support of MGRID HDL and HDM, and stores conforming messages in the data lake without data loss.

This manual describes the principles of operation, and how to install, run and extend XFM.

2.1. Overview

XFM performs three steps between message ingress and upload to the data lake:

  1. Ingest - perform message preparation such as source message validation and translation to the common (HL7v3 RIM) model, if needed.
  2. Transform - parse source (XML) messages and convert them to SQL using MGRID Messaging.
  3. Load - aggregate messages, insert them into data ponds, check constraints and preprocess the data before uploading it to the data lake.
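The three steps can be pictured as a chain of plain functions. This is a minimal sketch, not XFM's implementation: the `Message` class, the function names and the placeholder SQL statement (including the `act` table) are hypothetical, and real validation, translation and transformation are replaced by stubs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Message:
    """Hypothetical message envelope used only for this sketch."""
    payload: str                 # source XML
    fmt: str                     # source format, e.g. "CDA R2" or "FHIR"
    sql: List[str] = field(default_factory=list)


def ingest(msg: Message) -> Message:
    """Step 1: validate, then translate to the common model if needed."""
    if not msg.payload.lstrip().startswith("<"):
        raise ValueError("source message is not XML")
    if msg.fmt != "RIM":
        msg.fmt = "RIM"          # stand-in for a real translation to the RIM
    return msg


def transform(msg: Message) -> Message:
    """Step 2: turn RIM XML into SQL (a fixed placeholder statement here)."""
    msg.sql.append("INSERT INTO act DEFAULT VALUES;")  # hypothetical table
    return msg


def load(batch: List[Message], lake: List[str]) -> int:
    """Step 3: aggregate a batch and append all of its SQL to the 'lake'."""
    statements = [stmt for m in batch for stmt in m.sql]
    lake.extend(statements)
    return len(statements)


def run(messages: List[Message], lake: List[str]) -> int:
    """Ingest and transform each message, then load them as one batch."""
    return load([transform(ingest(m)) for m in messages], lake)


if __name__ == "__main__":
    lake: List[str] = []
    n = run([Message("<ClinicalDocument/>", "CDA R2"), Message("<Patient/>", "FHIR")], lake)
    print(n)  # 2
```

Keeping each step a separate function mirrors the idea that the steps are independent components that can be deployed and scaled separately.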

These steps can be executed in a highly distributed fashion, which allows for high message rates. The figure below depicts the high-level architecture.


XFM implements a message flow across processing steps that prepare incoming data before it is added to a Healthcare Data Lake. The processing steps are as follows:

Source messages are sent through a gateway, which forwards them to an internal message broker. The message broker forms the backbone of XFM: it routes messages between the components before they enter the data lake.
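The routing role of the broker can be illustrated with an in-process sketch built on Python's `queue.Queue`. This is an in-memory stand-in, not the broker XFM actually uses: the `Broker` class, its `publish`/`forward` methods and the component names are hypothetical.

```python
import queue
from typing import Dict, List


class Broker:
    """Toy in-process stand-in for XFM's message broker (hypothetical API).

    Every component has an inbound queue; `routes` maps a component to the
    component that should receive its output next.
    """

    def __init__(self, routes: Dict[str, str]) -> None:
        self.routes = routes
        self.queues: Dict[str, queue.Queue] = {}
        for src, dst in routes.items():
            self.queues.setdefault(src, queue.Queue())
            self.queues.setdefault(dst, queue.Queue())

    def publish(self, component: str, msg: str) -> None:
        """Put a message on a component's inbound queue."""
        self.queues[component].put(msg)

    def forward(self, component: str, msg: str) -> str:
        """Route a component's output to the next hop; return that hop's name."""
        next_hop = self.routes[component]
        self.queues[next_hop].put(msg)
        return next_hop


def trace(broker: Broker, start: str) -> List[str]:
    """Move one message from `start` along the routes, recording each hop."""
    hops = [start]
    while hops[-1] in broker.routes:
        msg = broker.queues[hops[-1]].get_nowait()
        hops.append(broker.forward(hops[-1], msg))
    return hops


if __name__ == "__main__":
    broker = Broker({"gateway": "ingest", "ingest": "transform", "transform": "load"})
    broker.publish("gateway", "<ClinicalDocument/>")
    print(trace(broker, "gateway"))  # ['gateway', 'ingest', 'transform', 'load']
```

Because components only talk to the broker, a component can be replaced or duplicated without the others knowing about it.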

The ingest step performs message validation (XML schema) and, if needed, translation to the common model (HL7v3 RIM). Subsequent processing steps expect messages to adhere to the common model, so messages in other formats are translated first. In the default XFM configuration, the supported message types are CDA R2 and a small subset of FHIR [1]. Additional message types can be supported through custom ingesters.
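A minimal sketch of how an ingester could recognise the two default message types by their root element and dispatch to a registered ingester, using Python's standard `xml.etree.ElementTree`. The registry, the `ingester` decorator and the `translated` wrapper element are hypothetical; the namespaces are the standard ones for CDA R2 (`urn:hl7-org:v3`) and FHIR XML (`http://hl7.org/fhir`). Note that `ElementTree` only checks well-formedness; XML Schema validation would need a validating parser such as `lxml.etree.XMLSchema`.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Tuple

HL7V3_NS = "urn:hl7-org:v3"          # namespace of CDA R2 documents
FHIR_NS = "http://hl7.org/fhir"      # namespace of FHIR XML resources
FHIR_RESOURCES = {"Organization", "Patient", "Practitioner"}

# Hypothetical registry: source format name -> ingester function.
INGESTERS: Dict[str, Callable[[ET.Element], ET.Element]] = {}


def ingester(fmt: str):
    """Decorator that registers a (custom) ingester for a source format."""
    def register(fn):
        INGESTERS[fmt] = fn
        return fn
    return register


def split_tag(tag: str) -> Tuple[str, str]:
    """Split '{namespace}local' into (namespace, local)."""
    if tag.startswith("{"):
        ns, _, local = tag[1:].partition("}")
        return ns, local
    return "", tag


def detect_format(root: ET.Element) -> str:
    """Classify a message by the namespace and name of its root element."""
    ns, local = split_tag(root.tag)
    if ns == HL7V3_NS and local == "ClinicalDocument":
        return "CDA R2"
    if ns == FHIR_NS and local in FHIR_RESOURCES:
        return "FHIR"
    raise ValueError(f"unsupported message type: {root.tag}")


@ingester("CDA R2")
def ingest_cda(root: ET.Element) -> ET.Element:
    return root  # CDA R2 is RIM-derived; passed on unchanged in this sketch


@ingester("FHIR")
def ingest_fhir(root: ET.Element) -> ET.Element:
    # Placeholder for a FHIR -> RIM translation: only wraps the resource
    # in a hypothetical <translated> element in the HL7v3 namespace.
    wrapper = ET.Element(f"{{{HL7V3_NS}}}translated")
    wrapper.append(root)
    return wrapper


def ingest(xml_text: str) -> Tuple[str, ET.Element]:
    root = ET.fromstring(xml_text)   # checks well-formedness only, not the schema
    fmt = detect_format(root)
    return fmt, INGESTERS[fmt](root)
```

In this sketch, supporting another message type means registering one more function with `@ingester(...)` and extending `detect_format`.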

In an optional step, custom processors can be added to the flow. Example uses include interacting with a terminology service or an Enterprise Master Patient Index (EMPI) to check or enrich message contents.
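One way to picture optional processors is as functions that each take a message and return a (possibly enriched) message, applied in order. In this sketch the flattened dictionary message, the `patient_id`/`code`/`empi_id` keys and the dictionaries standing in for the EMPI and the terminology service are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Set

Message = Dict[str, str]                 # simplified, flattened message (hypothetical)
Processor = Callable[[Message], Message]


def make_empi_enricher(empi: Dict[str, str]) -> Processor:
    """Add an enterprise patient id, looked up in a (stubbed) EMPI."""
    def enrich(msg: Message) -> Message:
        local_id = msg.get("patient_id")
        if local_id in empi:
            return {**msg, "empi_id": empi[local_id]}
        return msg
    return enrich


def make_code_checker(value_set: Set[str]) -> Processor:
    """Reject messages whose code is not in a (stubbed) terminology value set."""
    def check(msg: Message) -> Message:
        code = msg.get("code")
        if code is not None and code not in value_set:
            raise ValueError(f"unknown code: {code}")
        return msg
    return check


def run_processors(msg: Message, processors: Iterable[Processor]) -> Message:
    """Apply the optional processors in order; each returns a (new) message."""
    for process in processors:
        msg = process(msg)
    return msg
```

Returning a new dictionary instead of mutating the input keeps each processor side-effect free, which makes processors easy to reorder or remove.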

The transform step takes the ingested messages and transforms them to SQL or JSON. Because of the large number of HL7v3 message types, the many HL7v3 normative editions, and the ease with which new messages and types can be defined, these message processors are built using the MGRID Messaging SDK: a repository of converters that support multiple message types and versions and can easily be extended for new HL7v3 XML messages or fragments (CMETs).
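To illustrate what transforming XML to SQL involves, here is a toy converter for a single HL7v3 `observation` element that emits a parameterised `INSERT`. It is not the MGRID Messaging SDK API: the converter registry, the `observation` table and its columns are hypothetical, and real converters cover complete message types and CMETs.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Optional, Tuple

NS = {"hl7": "urn:hl7-org:v3"}
Statement = Tuple[str, Tuple[Optional[str], ...]]

# Hypothetical registry: local element name -> converter producing SQL.
CONVERTERS: Dict[str, Callable[[ET.Element], List[Statement]]] = {}


def converter(local_name: str):
    """Register a converter for elements with the given local name."""
    def register(fn):
        CONVERTERS[local_name] = fn
        return fn
    return register


def attr(el: Optional[ET.Element], name: str) -> Optional[str]:
    """Read an attribute from an optional child element."""
    return el.get(name) if el is not None else None


@converter("observation")
def convert_observation(el: ET.Element) -> List[Statement]:
    code = el.find("hl7:code", NS)
    value = el.find("hl7:value", NS)
    sql = ("INSERT INTO observation (class_code, mood_code, code, code_system, value, unit) "
           "VALUES (?, ?, ?, ?, ?, ?)")
    return [(sql, (el.get("classCode"), el.get("moodCode"), attr(code, "code"),
                   attr(code, "codeSystem"), attr(value, "value"), attr(value, "unit")))]


def to_sql(xml_text: str) -> List[Statement]:
    """Walk the XML tree and collect SQL from every element with a converter."""
    statements: List[Statement] = []
    for el in ET.fromstring(xml_text).iter():
        local_name = el.tag.rsplit("}", 1)[-1]
        if local_name in CONVERTERS:
            statements.extend(CONVERTERS[local_name](el))
    return statements


if __name__ == "__main__":
    msg = """<observation xmlns="urn:hl7-org:v3" classCode="OBS" moodCode="EVN">
      <code code="8480-6" codeSystem="2.16.840.1.113883.6.1"/>
      <value value="120" unit="mm[Hg]"/>
    </observation>"""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE observation (class_code, mood_code, code, code_system, value, unit)")
    for sql, params in to_sql(msg):
        db.execute(sql, params)
    print(db.execute("SELECT code, value, unit FROM observation").fetchall())
    # [('8480-6', '120', 'mm[Hg]')]
```

Keying converters by element name reflects the idea of a converter repository: support for a new fragment is added by registering another converter rather than changing the traversal.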

The load step loads transformed messages into a data pond, performs pre-processing and uploads the data to the data lake. Data ponds are small databases that contain a RIM database model. Pre-processing on a data pond includes code system checking, context conduction and data de-normalization, among other (possibly custom) processing.
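The data pond idea can be approximated with SQLite: each batch goes into a throw-away in-memory pond, is checked against a list of known code systems, and is then copied into the lake with the code system name de-normalized into each row. This sketch covers only code system checking and a trivial de-normalization; it does not model context conduction or the RIM schema. The table layout and the `CODE_SYSTEMS` mapping are hypothetical, although the OIDs are the registered ones for LOINC and SNOMED CT.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical list of accepted code systems (OID -> name).
CODE_SYSTEMS = {
    "2.16.840.1.113883.6.1": "LOINC",
    "2.16.840.1.113883.6.96": "SNOMED CT",
}
POND_DDL = "CREATE TABLE observation (code TEXT, code_system TEXT, value TEXT)"
LAKE_DDL = ("CREATE TABLE observation "
            "(code TEXT, code_system TEXT, code_system_name TEXT, value TEXT)")

Row = Tuple[str, str, str]   # (code, code_system, value)


def load_pond(rows: List[Row]) -> sqlite3.Connection:
    """Create a small, temporary database holding one batch."""
    pond = sqlite3.connect(":memory:")
    pond.execute(POND_DDL)
    pond.executemany("INSERT INTO observation VALUES (?, ?, ?)", rows)
    return pond


def unknown_code_systems(pond: sqlite3.Connection) -> List[str]:
    """Code system check: list code systems not in CODE_SYSTEMS."""
    return [cs for (cs,) in pond.execute("SELECT DISTINCT code_system FROM observation")
            if cs not in CODE_SYSTEMS]


def upload(pond: sqlite3.Connection, lake: sqlite3.Connection) -> int:
    """Copy rows to the lake, de-normalizing the code system name into each row."""
    rows = [(code, cs, CODE_SYSTEMS[cs], value)
            for code, cs, value in pond.execute("SELECT code, code_system, value FROM observation")]
    with lake:  # commits on success, rolls back on error
        lake.executemany("INSERT INTO observation VALUES (?, ?, ?, ?)", rows)
    return len(rows)


def process_batch(rows: List[Row], lake: sqlite3.Connection) -> int:
    """Load a batch into a pond, check it, and upload it only if it passes."""
    pond = load_pond(rows)
    try:
        unknown = unknown_code_systems(pond)
        if unknown:
            raise ValueError(f"unknown code systems: {sorted(unknown)}")
        return upload(pond, lake)
    finally:
        pond.close()
```

Because checks run on the pond, a batch that fails is discarded before anything reaches the lake.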

Each XFM component (ingester, transformer, loader) can scale horizontally: additional instances can be started as needed to increase message throughput.
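Horizontal scaling of this kind follows the competing-consumers pattern: several identical instances read from the same queue, and each message is handled by exactly one of them. The sketch shows the pattern with Python threads and `queue.Queue`; the `transformer-N` names and the `upper()` call standing in for real work are illustrative only. Because of CPython's global interpreter lock, these threads do not give CPU parallelism; in a real deployment the instances would more plausibly be separate processes or machines.

```python
import queue
import threading
from typing import List, Optional, Tuple


def worker(name: str, inbox: "queue.Queue[Optional[str]]",
           results: List[Tuple[str, str]], lock: threading.Lock) -> None:
    """One component instance: take messages until a None sentinel arrives."""
    while True:
        msg = inbox.get()
        if msg is None:              # shutdown sentinel
            break
        processed = msg.upper()      # stand-in for real transformation work
        with lock:
            results.append((name, processed))


def run(messages: List[str], n_instances: int) -> List[Tuple[str, str]]:
    """Start n identical instances that compete for messages on one queue."""
    inbox: "queue.Queue[Optional[str]]" = queue.Queue()
    results: List[Tuple[str, str]] = []
    lock = threading.Lock()
    instances = [threading.Thread(target=worker,
                                  args=(f"transformer-{i}", inbox, results, lock))
                 for i in range(n_instances)]
    for t in instances:
        t.start()
    for m in messages:
        inbox.put(m)
    for _ in instances:
        inbox.put(None)              # one sentinel per instance, after all messages
    for t in instances:
        t.join()
    return results
```

Adding capacity in this pattern only means starting more consumers on the same queue; producers are unaffected.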


[1] Currently supported FHIR resources: Organization, Patient, Practitioner.