A Question of Time
The ability to understand time and date is important for many applications, but it is especially important for medical applications. Medical professionals depend on knowledge of the correct temporal order of patient encounters and clinical observations to make treatment decisions. Clinical research is based upon observations and treatments, and temporal ordering of these events is essential to drawing conclusions from this information. Though generic database servers support times and dates, we describe a use case that is hard to solve in a generic database but easy to implement in MGRID.
Consider the following use case: For a research query, we would like to determine the time intervals during which patients did not receive medication. The list of patients and intervals at which they had been administered the drug are provided.
We would like to demonstrate our solution to this use case by using our HL7v3-based temporal datatypes [1,2] that are quick to write, easy to understand and perform well even when datasets increase.
MGRID Healthcare Datatypes
Time information can be (roughly) grouped into three categories: timestamps, intervals and collections of intervals. Note that we need collections of intervals when combining intervals that do not overlap. These three categories have the following HL7v3 namesakes[4]:
- TS (PointInTime), defined as a quantity on the axis of natural time. The literal form specifies date, time, time zone and a precision. Examples: "2004" for the start of the year 2004, and "20010203040506.07" for Feb. 3, 2001 04:05 a.m. and 06.07 seconds.
- IVL_TS (Interval of PointInTime) defined as a set of consecutive timestamps. Example: May 12, 1987 from 8 to 9:30 p.m. is "[198705122000;198705122130]".
- QSET_TS (Continuous Set of PointInTime) defined as a set built from time intervals. Example: “[2004;2005[;[2006;2007[” is the union of the whole of 2004 and the whole of 2006.
These datatypes allow for concise and indexable temporal expressions:
SELECT * FROM observation WHERE effectiveTime ~ '198705122000';
SELECT * FROM observation WHERE effectiveTime ~ '[198705122000;198705122130]';
The first query returns all observations with clinically or operationally relevant time intervals that contain the timestamp May 12, 1987 8:00 p.m. The second returns all observations with a time interval that contains the time interval May 12, 1987 from 8 to 9:30 p.m.
The MGRID Healthcare Datatypes have a number of functions and operators that allow direct manipulation of temporal data and make clinical data integration on intervals easier. For brevity, we only show the interval operators that are needed to solve the use case. The following example shows an addition and subtraction of time intervals, both of which return QSET_TS; a set of time intervals:
SELECT ivl_ts '[2004;2005]' + ivl_ts '[2006;2007]' AS plus,
ivl_ts '[2002;2010]' - ivl_ts '[2004;2005]' AS minus;
plus | minus
-------------------------+-------------------------
[2004;2005];[2006;2007] | [2002;2004[;]2005;2010]
(1 row)
Solving the use case
We have an application-specific table in the database containing patients identified by name and the intervals during which they had medication administered to them:
CREATE TABLE medication (name text, effectivetime ivl_ts);
INSERT INTO medication VALUES ('Pete', '[20100316;20100514]');
INSERT INTO medication VALUES ('Pete', '[20100420;20100701]');
INSERT INTO medication VALUES ('Pete', '[20101220;20110119]');
INSERT INTO medication VALUES ('John', '[20100516;20100614]');
INSERT INTO medication VALUES ('John', '[20100620;20100801]');
INSERT INTO medication VALUES ('John', '[20101220;20110119]');
Answering the question, "At what intervals were our patients not medicated in the year 2010?" then translates into the following SQL:
SELECT name,
'2010' - SUM(effectivetime) AS nomeds
FROM medication
GROUP BY name;
name | nomeds
------+-------------------------------------------------------------
Pete | [20100101;20100316[;]20100701;20101220[
John | [20100101;20100516[;]20100614;20100620[;]20100801;20101220[
(2 rows)
Note that the SUM aggregate uses the previously shown plus operator. The result is then a QSET_TS, consisting of multiple intervals of time that are separated by the QSET union operator ';'. The first rule should be read as "Pete did not receive medication from January 1 until March 16 + from July 1 until December 20."
Without the MGRID Healthcare Datatypes, the medication table must use either the standard date or timestamp datatype. Solving the use case with only those datatypes requires that the software developer implement time interval methods like overlap, union and differences in SQL or software in the application layer (if unable to do so in SQL). Though not impossible to implement, the conventional approach to this use case takes longer to write, understand and maintain.
Although comparing time and date is supported by most programming languages, our HL7v3 standard temporal datatypes make concise temporal calculations and queries possible. Implementing these datatypes inside the database yields one more advantage—the database server can use indexes on time intervals to speed up queries that filter or order on time. In other words, almost all queries.
[1] Havinga, Dijkstra, de Keijzer. Adding HL7 version 3 data types to PostgreSQL, Mar. 2010
[2] HL7v3 Data Types: Abstract Specification, Release 2, May 2011 ballot.
