3. User Guide

This guide describes general topics about the HL7v3 and SQL integration. We assume general knowledge about HL7v3 and SQL. No particular Unix or programming experience is required.

3.1. General Information

The Healthcare Datatypes is a specification of datatypes drafted for the purpose of information exchange, with XML as the target information medium. To be able to use data expressed in HL7v3 format as close as possible in a relational database system, MGRID has created the HDL that consists of a set of user defined types (UDT) and user defined functions (UDF).

The following aspects of the type implementation require special attention:

  1. The object oriented type hierarchy, see Type hierarchy.
  2. The NullFlavors, see NullFlavor Handling.
  3. The different type categories; simple and complex. See Complex and Simple types.

3.1.1. Complex and Simple types

  1. All Healthcare Datatypes have a complex type implementation. Complex types have a JSON literal form. Complex types can be recognized by their uppercase typename, such as CD, PQ and ANY. Complex types exist to persist input transformed from XML HL7v3 messages.

    To date two different versions of the Healthcare Datatypes have been defined by HL7; datatypes R1 and R2. Starting with Normative Edition 2012, the RIM has switched from the R1 types to the R2 types. The HDL implementations are bound to schema r1 and r2. Users can select the release they want by adding the appropriate schema to their search_path.

    See Complex Types for more information and a complete list.

    SELECT cvinval(code := 'DOCBODY', "codeSystem" := '2.16.840.1.113883.5.6');
                                       cvinval
    ------------------------------------------------------------------------------
     {"code": "DOCBODY", "dataType": "CV", "codeSystem": "2.16.840.1.113883.5.6"}
    (1 row)
    
  2. Some Healthcare Datatypes have a simple type implementation. Simple types, that have a lowercase typename, are scalar types that resemble the built-in types of the database server, such as integer, text, and date. See Simple Types for more information and a complete list. These types are bound to the schema hl7.

    The simple types exist to allow in-database computations and additional indexing on Healthcare Datatypes, such as physical quantity conversions and partial ordering, concept descriptor subtree search, and sets of interval of time containment and overlap queries.

    For types that have a simple type implementation, CASTs are implemented that allow conversion from complex to simple form, to enable computations.

    SELECT cvinval(code := 'DOCBODY', "codeSystem" := '2.16.840.1.113883.5.6')::cv;
                cvinval
    -------------------------------
     DOCBODY:2.16.840.1.113883.5.6
    (1 row)
    

    ... to enable computations.

    SELECT cvinval(code := 'DOCBODY', "codeSystem" := '2.16.840.1.113883.5.6')::cv
              << 'CONTAINER'::cv('ActClass');
     ?column?
    ----------
     t
    (1 row)
    

    There is no R1 or R2 distinction for simple types: it is possible to cast both the R1 and the R2 version of a complex type to the simple type.

3.1.2. Schemas (namespaces)

The Healthcare Datatypes are created in the schemas

  1. hl7 for the simple types
  2. r1 for the complex R1 types
  3. r2 for the complex R2 types
  4. hdl for functions and support types

It is necessary to add these schemas to the database its default search_path, by using the ALTER DATABASE command. See ERROR: type “<name>” does not exist.

A few of this HDL named types conflict with the reserved words of PostgreSQL. These are any, and, or, not and precision. For these names it is necessary to always prefix the typename with hl7.:

show search_path;
             search_path
-------------------------------------
 "",public,hl7,hdl,r1

Since any is a reserved word, the type any cannot be named directly:

create table ta (a any);
ERROR:  syntax error at or near "any"
LINE 1: create table ta (a any);

Since any also conflicts with PostgreSQLs own any, adding quotes does not work either:

create table ta (a "any");
ERROR:  column "a" has pseudo-type "any"

The correct way to name HL7s any is to prefix it with the schema name:

create table ta (a hl7.any);
CREATE TABLE

3.1.3. Type hierarchy

The Healthcare Datatypes are specified using a type hierarchy. The simple types, R1 complex types and R2 complex types each have their own hierarchy. The root of the simple type hierarchy is the any type. All simple types can be cast to any. The root type of the complex types are ANY and HXIT, for the R1 and R2 types respectively.

If a type is a subtype of another type, conversion from one type to the other is implemented with a type cast. With the type cast, it is possible to insert e.g. a PQ value in a database column with type ANY. With the type cast, it is also possible to convert the ANY datavalue back to its original type PQ. Examples where ANY values are cast back to their original type can be found in Appending operators to select paths and Does complex data value contain a value?.

It is also possible to switch from the complex type hierarchy to the simple type hierarchy with casts. For instance, it is possible to cast the complex type PQ to the simple type pq.

3.1.4. NullFlavor Handling

NullFlavors, such as NullFlavor.UNK, extend the normal domain of each healthcare datatype. Almost every datavalue can be a NullFlavor instead of a normal value. In every aspect of working with the Healthcare Datatypes, NullFlavor handling plays a role. NullFlavors are defined in NullFlavors and Interaction between database NULL and NullFlavors describes the interaction between the database servers NULL and the NullFlavors.

3.1.5. BL returning functions vs operators

For each BL returning property of the Healthcare Datatypes such as equal and lessthan, two implementation forms exist:

  1. A BL-returning user defined function (UDF) with the name of the property, such as equal. This function may be used in SELECT lists. It is not endorsed to use these functions in WHERE clauses, where operators are preferred instead.
  2. An operator, such as =. In WHERE clauses the operator form of properties is preferred, since operators enable the use of indexes. Though operators may be used in SELECT lists, it is not recommended to do so, since the operators do not return NullFlavors. In SELECT lists, the function form of the property is recommended.

3.1.5.1. Proper use of functions and operators in select lists and where clauses

SELECT equal('1 l'::pq, '1 dm3')  -- use functions in select lists
WHERE '2 l'::pq > '0.1 ml';      -- use operators in where clause

 equal
-------
 true

3.1.6. Using indexes

The default behaviour of the CREATE INDEX command is to create an index with the default btree operator class for that datatype. While the btree indexes are useful for primary key and total ordering purposes, the most useful operators on Healthcare Datatypes can be indexed by gist indexes. This section shows how to create gist indexes. Section Index support lists all the index types and operators. An example subset of that table is repeated below.

 indexed_type  | index_method |     opclass_name      | is_default |        operators
---------------+--------------+-----------------------+------------+--------------------------
 cv            | gist         | gist_cv_ops           | t          | = >> <<
 ii            | gist         | gist_ii_ops           | t          | =
 ii[]          | gin          | gin__ii_ops           | t          | && <@ @> =
 ivl_pq        | gist         | gist_ivl_pqops        | t          | && = ~ @
 ivl_ts        | gist         | gist_ivl_tsops        | t          | && = ~ @
 ivl_ts        | gist         | gist_ivl_tsunionops   | f          | && = ~ @
 pq            | btree        | pq_ops_identical      | f          | ==
 pq            | gist         | gist_pq_ops           | t          | < <= = >= >
 pq[]          | gin          | gin__pq_ops           | t          | && <@ @> =

For operator classes that are not default for the index type and index method, it is necessary to specify the operator class name when creating the index, as is shown in the example below.

3.1.6.1. Use an index range scan on pq

This example shows how to create an index for a table with a pq column, and how to check if a query uses the index for a range scan with <.

lake=# CREATE TABLE mytable (mycol pq);
CREATE TABLE
lake=# INSERT INTO mytable SELECT a::text || ' [yd_us]' FROM generate_series(1,10000) AS a(a);
INSERT 0 10000
lake=# INSERT INTO mytable SELECT a::text || ' [ft_us]' FROM generate_series(1,10000) AS a(a);
INSERT 0 10000
lake=# CREATE INDEX myindex ON mytable USING GIST(mycol gist_pq_ops);
CREATE INDEX
lake=# ANALYZE mytable;
ANALYZE
lake=# EXPLAIN SELECT * FROM mytable WHERE mycol < '10 [ft_us]';
                              QUERY PLAN
-----------------------------------------------------------------------
 Bitmap Heap Scan on mytable  (cost=4.31..68.69 rows=20 width=86)
   Recheck Cond: (mycol < '10 [ft_us]'::pq)
   ->  Bitmap Index Scan on myindex  (cost=0.00..4.30 rows=20 width=0)
         Index Cond: (mycol < '10 [ft_us]'::pq)
(4 rows)

lake=# SELECT * FROM mytable WHERE mycol < '4 [ft_us]' ORDER BY mycol;
   mycol
-----------
 1 [ft_us]
 2 [ft_us]
 3 [ft_us]
 1 [yd_us]
(4 rows)

3.1.6.2. Create a multicolumn gist index on ii and cv

This example shows how to create a multicolumn index. Because the operator class names are not specified, the default operator classes gist_ii_ops and gist_cv_ops are used.

lake=# CREATE TABLE mytable (id ii, code cv);
CREATE TABLE
lake=# CREATE INDEX myindex ON mytable USING gist(id, code);
CREATE INDEX
lake=# INSERT INTO mytable VALUES ('38e0055e-3e3d-11de-ba73-f72d0933defc','274589008'::cv('SNOMED-CT'));
INSERT 0 1
lake=# SET enable_seqscan = false;
SET
lake=# EXPLAIN ANALYZE SELECT * FROM mytable
  WHERE id = '38e0055e-3e3d-11de-ba73-f72d0933defc'
  AND code << '74627003'::cv('SNOMED-CT');
                                                                                  QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using myindex on mytable  (cost=2.64..10.66 rows=1 width=64) (actual time=0.010..0.010 rows=0 loops=1)
   Index Cond: ((id = '38e0055e-3e3d-11de-ba73-f72d0933defc'::ii) AND (code << '74627003:2.16.840.1.113883.6.96@20140131:2.16.840.1.113883.6.96.1@20140131'::cv('SNOMED-CT')))
 Planning time: 0.172 ms
 Execution time: 0.038 ms
(4 rows)

3.2. Working with code systems

This section describe how to perform common actions when working with codesystems in the database.

3.2.1. Installing a codesystem

Before a codesystem can be used to instantiate CV coded values, the codesystem must be loaded into the tables created by the HL7 base table module.

Installing the codesystem files on the host operating system is described in Installing the software. Once the files are installed, a codesystem can be loaded in a database by loading the corresponding extension.

3.2.1.1. Installing a codesystem in a database

CREATE EXTENSION snomedctvocab_20110731;
CREATE EXTENSION

Once the codesystem is loaded, it can be used by instantiating CV data values that use the codesystem, in the same database.

Codesystems are scoped within a single database. If the codesystem needs to be used in another database, the codesystem must be loaded in the other database as well. If different versions of the same codesystem must be used in parallel, this can be achieved by creating a separate database for each codesystem version that must be loaded.

3.2.2. Exploring a codesystem

Use the function codesystem to query contents of HL7v3 codesystems.

3.2.2.1. Display a tree of the full codesystem

With the codesystem function a complete codesystem is shown as text. This is useful relatively small codesystems, like most of HL7v3s codesystems. The codesystem name is matched case sensitive. When no codesystem is found, no results and no error is shown.

select codesystem('EntityClass');
              codesystem
--------------------------------------
 entity (ENT)
  health chart entity (HCE)
  living subject (LIV)
   non-person living subject (NLIV)
    animal (ANM)
    microorganism (MIC)
    plant (PLNT)
   person (PSN)
  material (MAT)
   chemical substance (CHEM)
   food (FOOD)
   manufactured material (MMAT)
    container (CONT)
     holder (HOLD)
    device (DEV)
     certificate representation (CER)
     imaging modality (MODDV)
  organization (ORG)
   public institution (PUB)
   state (STATE)
    Nation (NAT)
  place (PLC)
   city or town (CITY)
   country (COUNTRY)
   county or parish (COUNTY)
   state or province (PROVINCE)
  group (RGRP)
(27 rows)

3.2.2.2. Search a codesystem for a text

It is also possible to query a codesystem for codes that have a certain text as part of their description:

SELECT codesystem('ActClass','obs');
                codesystem
-------------------------------------------
 genomic observation (GEN)
 observation (OBS)
 correlated observation sequences (OBSCOR)
 observation series (OBSSER)
 specimen observation (SPCOBS)
(5 rows)

Instead of using a substring search on the designation, it is also possible to query for subtrees of hierarchical codesystems ‘above’ or ‘under’ a given coded value. This is shown in examples Show the ancestor hierarchy of a coded value and Show the descendants of a coded value.

3.2.3. Working with coded values

MGRID HDL contains CV implemented as a base type, that together with its functions and operators provide robust in-database vocabulary support. Functions on ‘cv’ lists the functions and operators on CV.

3.2.3.1. Creating coded values

Constructor methods for ‘cv’ describes how coded values can be instantiated with the standard literal form, as well as using a type modifier to specify the conceptdomain. A CV constrained to a particular conceptdomain, such as CV('ActClass'), can be used to create tables:

3.2.3.1.1. Create a table column for codes from a specific conceptdomain

This example requires that a HL7v3 vocabulary extension is loaded.

lake=# CREATE TABLE examplecv (a cv('ActClass'));
CREATE TABLE
lake=# INSERT INTO examplecv VALUES ('GEN'),('OBS'),('PCPR'),('SBADM');
INSERT 0 4
lake=# SELECT * from examplecv;
                                       a
--------------------------------------------------------------------------------
 GEN:2.16.840.1.113883.5.6@2010-11-10:2.16.840.1.113883.1.11.11527@2010-11-10
 OBS:2.16.840.1.113883.5.6@2010-11-10:2.16.840.1.113883.1.11.11527@2010-11-10
 PCPR:2.16.840.1.113883.5.6@2010-11-10:2.16.840.1.113883.1.11.11527@2010-11-10
 SBADM:2.16.840.1.113883.5.6@2010-11-10:2.16.840.1.113883.1.11.11527@2010-11-10
(4 rows)

3.2.3.2. Lookup information on a coded value

The example below shows how to lookup the displayname of a code. Note that this example requires that a SNOMED-CT vocabulary extension is loaded.

3.2.3.2.1. Lookup information on a coded value
select displayname('409586006'::cv('SNOMED-CT'));
 displayname
-------------
 Complaint
3.2.3.2.2. Lookup information on coded values in a table

The example below shows how to get code and codesystem information from a CV stored in the table. Note: this table is created by example Create a table column for codes from a specific conceptdomain. See Functions on ‘cv’ for a complete list of functions that can be used.

SELECT displayname(a), codesystemname(a),
code(a), codesystem(a), implies(a,'OBS'::cv('ActClass'))
FROM examplecv;
       displayname        | codesystemname | code  |      codesystem       | implies
--------------------------+----------------+-------+-----------------------+---------
 genomic observation      | ActClass       | GEN   | 2.16.840.1.113883.5.6 | true
 observation              | ActClass       | OBS   | 2.16.840.1.113883.5.6 | true
 care provision           | ActClass       | PCPR  | 2.16.840.1.113883.5.6 | false
 substance administration | ActClass       | SBADM | 2.16.840.1.113883.5.6 | false
(4 rows)

For codesystems with an IsA hierarchy, it is possible to view the ancestor hierachy with the function ancestors. This function is provided for browse functionality only, and is not optimized to be used in other queries.

3.2.3.2.3. Show the ancestor hierarchy of a coded value
select ancestors('409586006'::cv('SNOMED-CT'));
                        ancestors
----------------------------------------------------------
 SNOMED CT Concept (138875005)
  Clinical finding (404684003)
   Clinical history and observation findings (250171008)
    General finding of observation of patient (118222006)
     General problem AND/OR complaint (105721009)
      Complaint (409586006)
(6 rows)

For codesystems with a polyhierarchy, where concepts can have more than one parent, the ancestor function will show a best effort in displaying the hierarchy, as is shown in the example below. In this example, some of the displaynames are indented more than one position compared to the preceding line. This non-uniform indentation is an indication of the existence of more than one parent, as the number of spaces for each code is the equal to the number of ancestors. All concepts shown are ancestors of the input code, ‘fracture of femur’ in the example below. Since the indenting does not show a straight line, it is clear that one or more concepts in the ancestor hierarchy have more than one parent. The polyhierarchy is branched off under ‘clinical finding’ from ‘finding by site’ and ‘disease’.

select ancestors('71620000'::cv('SNOMED-CT'));
                              ancestors
---------------------------------------------------------------------
 SNOMED CT Concept (138875005)
  Clinical finding (404684003)
   Finding by site (118234003)
    Finding of body region (301857004)
     Finding of limb structure (302293008)
   Disease (64572001)
     Disorder by body site (123946008)
      Disorder of body system (362965005)
        Disorder of extremity (128605003)
      Disorder of connective tissue (105969002)
    Musculoskeletal finding (106028002)
        Disorder of musculoskeletal system (928000)
     Bone finding (118953000)
          Musculoskeletal and connective tissue disorder (312225001)
         Disorder of skeletal system (88230002)
             Disorder of bone (76069003)
    Traumatic AND/OR non-traumatic injury (417163006)
     Traumatic injury (417746004)
      Finding of lower limb (116312005)
       Injury of anatomical site (123397009)
            Injury of musculoskeletal system (105606008)
          Disorder of lower extremity (118937003)
       Finding of thigh (419003001)
          Injury of connective tissue (385424001)
              Injury of lower extremity (127279002)
            Disorder of thigh (128135006)
                 Injury of thigh (7523003)
                   Bone injury (284003005)
                    Fracture of bone (125605004)
                           Fracture of lower limb (46866001)
                               Fracture of femur (71620000)
(31 rows)
3.2.3.2.4. Show the descendants of a coded value

Use the descendants function to view the subtree under the given coded value. Note that for a large codesystem as SNOMED-CT, when a general code such as clinical finding is queried, the function will take a lot of memory and time to complete. Non-uniform indentation indicates the presence of concepts with more than one parent.

lake=# select descendants('ORG'::cv('EntityClass'));
        descendants
---------------------------
 organization (ORG)
  public institution (PUB)
  state (STATE)
   Nation (NAT)
(4 rows)

lake=# select descendants('54441004'::cv('SNOMED-CT'));
                     descendants
------------------------------------------------------
 Fracture of shaft of femur (54441004)
        Closed fracture of shaft of femur (26442006)
            Open fracture of shaft of femur (6628008)
(3 rows)

3.3. Importing xml messages

This section covers import and export of information to and from a RIM database. As working with the HL7v3 requires an exact match with the datatype specification and message type at hand, it is recommended to use messages as an interface between the MGRID HDM RIM and an application using it, instead of manually writing software that populates and queries a RIM database. By adopting the method of generating message parsers and exporters from models, the system architecture benefits from a loosly coupled architecture.

3.3.1. Create a message parser

How to create a converter that takes a XML message and transforms it into SQL is described in the documentation of the MGRID Messaging SDK, available from the documentation page on www.mgrid.net.

3.3.2. Using a message parser to populate a RIM database

The message parsers that are generated with the MGRID messaging suite, take a HL7v3 XML file and translate that to a set of SQL statements bundled in an anonymous block, or a JSON document. The translator can be used as a part of a bigger application, but also run from the command line, as is shown below.

3.3.2.1. Load a CDA R2 xml instance in a RIM database, shredded

Convert a message to a anonymous block.

$ cd mgrid-messaging/cda_r2
$ python convert_CDA_R2.py tests/source/mgrid.1/patientID-PS.1.2.3.xml

This will show output like the following.

DO
$$DECLARE

document0 bigint;
organization1 bigint;
person2 bigint;

/* ... (rows omitted here) */

participation147 := Participation_insert(_mif := 'POCD_MT000040', _clonename := 'Performer1', "act" := act65, "functionCode" ...

participation148 := Participation_insert(_mif := 'POCD_MT000040', _clonename := 'RecordTarget', "act" := document0, ...

END$$;

To load this anonymous block in the database, pipe the output to psql:

$ cd mgrid-messaging/cda_r2
$ python convert_CDA_R2.py tests/source/mgrid.1/patientID-PS.1.2.3.xml | psql <database>

3.4. Database configuration options with HDL and HDM

The administration section Creating a database describes a basic example database, which is only one of several different kinds of deployments. This section describes more ways in which the HDL and HDM postgres extensions can be used. Please refer to Creating a database for specific instructions on how to install the software on your system with RPM packages, and how to subsequently execute database commands to load extensions in a database.

3.4.1. HDL only

Use this setup if you require certain datatypes, such as Physical Quantity (pq) or Point in Time (ts), but not the full Reference Information Model. The HDL datatypes are not restricted to only RIM tables; they can be used in any database table. An example usecase is collecting loading scientific data with values and units into a table with physical quantities. This kind of setup is used throughout this documentation, where features of individual data types are illustrated.

Extensions required are:

  • hl7basetable
  • ucum
  • hl7

3.4.2. HDL with vocabulary

Use this setup if you want to perform operations involving codesystem, or vocabulary checks or lookup with the Coded Value (cv) datatype. This kind of setup is used in this documentation for examples of the Coded Value (cv) datatype. Additional extensions required are a specific vocabulary for your database, for instance:

  • snomedctvocab_20140131
  • loinc_250

3.4.3. HDM with RIM, table-per-RIM class

This setup is an extension of the previous setup. Use this setup if you want to load HL7v3 messages, such as CDA R2 XML, into a database. The XML document must be parsed and transformed to SQL with the MGRID Messaging SDK, as described in section Translation to SQL.

Additional extensions required are:

  • the HL7v3 vocabulary matching the normative edition, e.g. hl7v3vocab_edition2011
  • the HL7v3 complex types R1 and R2: hl7v3datatypes
  • a HL7v3 RIM, e.g. hl7v3rim_edition2011
  • the CRUD functions that match the normative edition, e.g. hl7v3crud_edition2011

3.4.4. HDM with locally customized RIM, table-per-RIM class

This setup is an alternative of the previous setup. The hl7v3rim_edition<x> extensions that are provided by MGRID are translations straight from the MIF definitions. Specific applications or database optimizations (indexes) can require additional columns in the RIM tables. In cases like this, do not load the RIM extension with the CREATE EXTENSION command, but make a copy of the extension SQL script, and edit that for your specific needs. The SQL script can be found in the PostgreSQL share directory, such as /usr/pgsql-9.4/share/extension/hl7v3rim_edition2011--2.0.sql. This is the method used for the pond databases that are used by XFM as temporary databases to collect transformed XML messages in, which are subsequently dumped in microbatches to the final lake database.

3.4.5. HDL with JSONB ‘document store’

Since version 9.4 PostgreSQL has excellent support to store JSON objects in a binary, indexable form, with the JSONB datatype. This creates the possibility of an additional method to persist HL7v3 messages, namely as complete JSON objects. Since version 3, the MGRID Messaging SDK has the option to translate to JSON. This is described in section Translation to JSON, which also describes an example program that bulk loads converted messages with the COPY command to a PostgreSQL database. The JSON object that is created with the Messaging SDK, contains datatypes snippets that are compatible with the HDL JSON complex types, and can thus be cast from jsonb to ANY, and further to simple types, to enable computations, comparisons and conversions.

Extensions required for a JSONB document store, that can also transform individual datavalues to healthcare datatypes to perform calculations or selections, are the following:

  1. hl7basetable
  2. ucum
  3. hl7
  4. the HL7v3 vocabulary matching the normative edition, e.g. hl7v3vocab_edition2011
  5. the HL7v3 complex types R1 and R2: hl7v3datatypes
  6. optional but useful: the jsquery extension jsquery

3.4.6. HDM NoSQL dual head setup

This configuration is a combination of the ‘HDM with locally customized RIM, table-per-RIM class’ and ‘HDL with JSONB Document Store’ databases. The dual head setup combines the best of both worlds in a single database. The benefits of each method are summarized in the table below:

HDM locally customized table-per-RIM class JSONB document store
Easy to get data from a single RIM class Extremely scalable, to dozens of TB
RIM database model is like a star schema Does not require context conduction

For use cases up to 200GB, the table-per-RIM class method of persisting healthcare data is sufficient. Beyond 200GB, the post-processing on the RIM tables, such as context conduction and some form of entity resolution will get slower, and not keep up with the speed of the other data loading steps. For use cases where many TBs of healthcare data must be stored, the JSONB document store is better suited, since this method has almost a constant processing (converting to JSONB) and data loading time.

The HDM NoSQL dual head setup was created to support both ways of persistence in a single database. The idea is to configure XFM in such a way that:

  • messages containing dimensional or entity like data are routed to the table-per-RIM class model
  • data like messages that contain factual data are routed to the JSONB document store

The table-per-RIM class has as big benefit that ‘dimensional’, or ‘entity’ like data is readily available for querying through the tables that are childs of the Entity table hierarchy: it is very convenient to have Patient, Organization and Person tables available for query. Since the amount of dimensional data is much less than the factual data in all big databases, if only entity like data such as Patient, Organizational or Practitioner updates is stored in the table-per-RIM class table, the amount if data will be small enough to not suffer from decreased data loading performance, even on multi TB databases.

The PostgreSQL JSONB document store has as big advantage that it is a very flexible, and storage resource friendly way to persist JSON structured data. Benchmarking has shown that most persisted HL7v3 documents are stored in compressed form. Typical physical storage requirement for the JSONB storage is 25% to 33% of the original data size. The success of using the JSONB document store depends on the ease with which we can query for instance observational data from the documents.

3.4.7. HDM with OMOP CDM

OMOP Common Data Model is a datamodel that allows for the systematic analysis of disparate observational databases. To enable this the disparate databases are translated to the OMOP common model, and can then be analysed together using systemic analyses via a library of standard analytic routines that work on the OMOP common model.

The efforts around OMOP are coordinated by OHDSI. OHDSI provide

  1. Vocabularies
  2. Example ETLs from source systems towards the OMOP CDM
  3. Example (research grade) libraries of analyses on data in OMOP CDM

The OMOP CDM v5 is available as a MGRID HDM extension. This extension defines the OMOP CDM database tables, and allows a MGRID HDM installation to be used as a PostgreSQL OMOP CDM repository. Note that while OHDSI vocabularies, example ETL and analytics can be used on top of the standard OMOP CDM tables defined by this extension, they are not supplied with this extension.

The extension does not interact with other HDM extensions. It can be installed standalone, or on the same database where other HDM extensions are present.

Some example queries on a HDM omap schema loaded with the default OHDSI Athena vocabularies follow.

  1. Find a concept given a concept id

    SET search_path=omop_cdm_5_0,public;
    SELECT  C.concept_id,
            C.concept_name,
            C.concept_code,
            C.concept_class_id,
            C.vocabulary_id,
            V.vocabulary_name
    FROM    concept    C,
            vocabulary V
    WHERE   C.concept_id = 44820735
    AND     C.vocabulary_id = V.vocabulary_id;
    
     concept_id | concept_name | concept_code |  concept_class_id  | vocabulary_id |                                            vocabulary_name
    ------------+--------------+--------------+--------------------+---------------+--------------------------------------------------------------------------------------------------------
       44820735 | Tics         | 307.2        | 4-dig nonbill code | ICD9CM        | International Classification of Diseases, Ninth Revision, Clinical Modification, Volume 1 and 2 (NCHS)
    (1 row)
    
  2. Find the ingredients of a drug:

    SELECT D.Concept_Id              drug_concept_id,
           D.Concept_Name            drug_name,
           D.Concept_Code            drug_concept_code,
           D.Concept_Class_id        drug_concept_class,
           A.Concept_Id              ingredient_concept_id,
           A.Concept_Name            ingredient_name,
           A.Concept_Code            ingredient_concept_code,
           A.Concept_Class_id        ingredient_concept_class
    FROM   concept_ancestor  CA,
           concept           A,
           concept           D
    WHERE  CA.descendant_concept_id IN (939355, 19102189, 19033566)
    AND    CA.descendant_concept_id = D.concept_id
    AND    CA.ancestor_concept_id   = A.concept_id
    AND    LOWER(A.concept_class_id)   = 'ingredient';
     drug_concept_id |                  drug_name                   | drug_concept_code | drug_concept_class | ingredient_concept_id | ingredient_name | ingredient_concept_code | ingredient_concept_class
    -----------------+----------------------------------------------+-------------------+--------------------+-----------------------+-----------------+-------------------------+--------------------------
            19102189 | Omeprazole 20 MG Delayed Release Oral Tablet | 402014            | Clinical Drug      |                923645 | Omeprazole      | 7646                    | Ingredient
              939355 | Budesonide 0.05 MG/ACTUAT Inhalant Solution  | 249853            | Clinical Drug      |                939259 | Budesonide      | 19831                   | Ingredient
            19033566 | Lorazepam 0.5 MG Oral Tablet [Ativan]        | 206821            | Branded Drug       |                791967 | Lorazepam       | 6470                    | Ingredient
    (3 rows)
    

3.5. Querying RIM databases

The MGRID HDM is a database implementation of the HL7v3 RIM. With the HDL and HDM, it is possible to query HL7v3 information in a relational database with SQL. This section covers topics that are of importance when querying the RIM.

3.5.1. Safe querying a RIM database

If a database user queries the Observation table, without taking into account the attributes that carry meta information, such as the negation indicators, nullFlavors or uncertainties, information can be misinterpreted. For instance, an Observation with a code for headache and value negation indicator true, carries the meaning that the patient had no headache. But when queried without the negation indicator in the WHERE clause or the SELECT list, the meaning is exactly the opposite. Clearly this is an error. A query that ignores the negation indicator is correct only, when there are no negated acts in the database. As this cannot be assumed for all RIM databases, such a query is considered ‘unsafe’. The topic how to query individual components from the RIM, without losing context or meaning, is called ‘Safe querying of the RIM’.

The following table lists possible places where necessary context can be found, that cannot be omitted without changing meaning. Queries on the RIM that do not take into account all of these attributes will misinterpret information, if there is information in the RIM database where a non-queried attribute is set to a non-default value.

3.5.1.1. Context Attributes

Attribute
Act.nullFlavor
Act.moodCode
Act.actionNegationInd
Act.negationInd
Act.statusCode
Act.uncertaintyCode
Observation.valueNegationInd
ActRelationship.inversionInd
ActRelationship.negationInd
ActRelationship.uncertaintyCode
Participation.negationInd

Table: Context Attributes

3.5.1.2. Context conduction

Note that Context Attributes did not list context conduction codes. Context conduction takes care of propagating and overriding participations through relationships of acts. With active context conduction in the database, there is no need to traverse the RIM graph to find, for instance, a record target participation. Instead this record target participation, if specified with a conducting code, will be associated with each act in the context, as follows from the conducting act relationships.

3.5.1.3. Querying with templates

Another way to limit the effect of misinterpretation is by querying using template ids. If for instance a query selects only data from the RIM from C-CDA templates, it is known a priori that no Act will have a non default value for Act.uncertaintyCode, so that attribute can be ignored.