Semantics, Ontology, and Syntax in Health Informatics: Conceptual Distinctions and Practical Implications for Hospital EHR Implementation
Abstract
Implementing a modern Electronic Health Record (EHR) in a hospital is fundamentally an exercise in representing, exchanging, and reusing clinical information safely across time, teams, and systems. This requires clarity about three related but distinct notions: syntax (how information is structured and encoded), semantics (what that structured information means), and ontology (a formal, explicit specification of the domain concepts and relations that underpin shared meaning and enable computational reasoning). This essay distinguishes these concepts and demonstrates how their differences translate into concrete design, integration, governance, and patient-safety considerations in real-world hospital EHR implementations.
1. Introduction
Hospitals depend on EHRs not only to store a longitudinal patient record, but to coordinate care across departments (e.g., laboratory, pharmacy, radiology, admissions), across professions, and often across organizational boundaries. Achieving this requires more than “digital documents”: it requires that data can be exchanged between systems and used reliably - by humans (physicians, nurses) in a real-life clinical setting and increasingly by software (alerts, order checks, registries, quality reporting, analytics). Interoperability therefore hinges on both the form of information and the meaning attached to that form. Health information standards explicitly separate these concerns: data exchange depends on both syntax (structure) and semantics (meaning).
2. Conceptual Foundations
2.1 Syntax: the grammar of representation
Syntax refers to the formal structure and encoding rules that make data parsable and technically valid. In practice, syntax determines whether a receiving system can correctly read fields, segments, elements, datatypes, and message boundaries.
In healthcare exchange standards, syntax is embodied by message and document formats (e.g., HL7 v2 message structure, or HL7 FHIR serializations such as XML/JSON). HL7 v2, for example, uses message profiles that specify the order and structure of content and the delimiters separating components.
HL7 (Health Level Seven) refers to a set of international standards for exchanging, integrating, sharing, and retrieving electronic health information between different healthcare systems, enabling communication between disparate software applications like EMRs, lab systems, and billing software.
Key point: syntactic correctness does not guarantee clinical correctness. A message can be perfectly well-formed yet still clinically ambiguous or misleading if meaning is not shared.
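To make the key point concrete, the following minimal Python sketch reads the delimiters that an HL7 v2 message declares at the start of its MSH segment and uses them to split a result segment. The sample message, the local code "NA", and the function names are hypothetical and chosen only for illustration; this is not a full HL7 v2 parser (it ignores escape sequences and repetitions).

```python
def read_delimiters(message: str) -> dict[str, str]:
    """Read the delimiters declared at the start of the MSH segment."""
    if not message.startswith("MSH") or len(message) < 8:
        raise ValueError("an HL7 v2 message must start with an MSH segment")
    field_sep = message[3]  # MSH-1: the field separator itself
    # MSH-2: the four encoding characters, in this fixed order
    component, repetition, escape, subcomponent = message[4:8]
    return {
        "field": field_sep,
        "component": component,
        "repetition": repetition,
        "escape": escape,
        "subcomponent": subcomponent,
    }


def parse_segment(segment: str, delims: dict[str, str]) -> list[list[str]]:
    """Split one segment into fields, and each field into components."""
    return [f.split(delims["component"]) for f in segment.split(delims["field"])]


# Hypothetical sample: segments are separated by carriage returns.
SAMPLE = "\r".join([
    "MSH|^~\\&|LAB|HOSP|EHR|HOSP|202401011200||ORU^R01|MSG0001|P|2.5.1",
    "OBX|1|NM|NA^Sodium^L||140||||||F",  # local code, no units
])

delims = read_delimiters(SAMPLE)
obx = parse_segment(SAMPLE.split("\r")[1], delims)
print(obx[3])  # observation identifier (OBX-3)
print(obx[5])  # observation value (OBX-5)
print(obx[6])  # units (OBX-6), empty in this sample
```

The OBX segment parses without error, yet it identifies the test only by a local code and carries no units. The syntax is fine; the meaning is left to whatever the two sites happen to have agreed on.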
2.2 Semantics: the meaning of data elements in context
Semantics concerns what the structured symbols mean - clinically and operationally. In interoperability terms, semantic interoperability exists when two systems (or two clinicians) interpret exchanged data the same way, including units, context, and intent. A widely used healthcare standards overview describes semantic interoperability as ensuring that systems “understand the meaning of data in the same way and use and interpret the data that is exchanged,” typically by relying on a common terminology.
In EHR practice, semantics is carried by:
- Clinical terminologies and code systems (e.g., SNOMED CT for clinical concepts; LOINC for lab test identifiers; UCUM for units), and
- Information models that define what a data element is intended to represent (e.g., an “Observation” vs. an “Order”).
FHIR explicitly separates structure (resources) from the semantic clarification provided by terminology use throughout the standard.
2.3 Ontology: an explicit, formal domain model enabling shared meaning and reasoning
An ontology is commonly defined in computer science as an “explicit specification of a conceptualization,” i.e., a formally defined set of concepts and relations used to model a domain.
What distinguishes ontology from “semantics” in general is formalization: ontologies specify meanings and constraints in a machine-processable way (often with logic-based axioms), enabling consistency checking and automated inference.
In healthcare, ontological approaches appear in:
- Logic-based terminologies (notably SNOMED CT’s logic-based concept definitions and hierarchies), and
- Knowledge representations used for decision support, phenotyping, and semantic integration (e.g., knowledge graphs).
SNOMED CT is widely described as having “formal logic-based definitions organized into hierarchies,” and its maintenance relies on description logics and reasoning services.
Key point: semantics is the goal (shared meaning); ontology is one of the most rigorous tools to achieve shared, computable semantics at scale.
3. Why These Distinctions Matter in a Hospital EHR Implementation
3.1 Integration and interfaces: syntax is necessary, not sufficient
Hospital EHRs must exchange data with many systems: laboratory information systems (LIS), radiology systems, pharmacy systems (which typically rely on drug terminologies such as WHO ATC or RxNorm), admission/discharge/transfer (ADT) systems, external registries, and regional/national exchanges. These connections often begin with syntactic integration - ensuring messages conform to expected formats and can be validated.
HL7 v2 illustrates this: message profiles specify structure and delimiters; segments like MSH/PID/OBR/OBX provide predictable syntactic containers for events and results.
A common pattern in an ORU (unsolicited observation result) message is:
- MSH (Message Header): the required first segment of (non-batch) HL7 v2 messages. It defines the message’s intent, source, destination, and key syntax parameters, notably the separators/encoding characters that determine how the rest of the message is parsed. In short: who/what/where, parsing rules, and message type.
- PID (Patient Identification): carries patient identifying and demographic information and is widely used as the primary way to communicate patient identity between systems. It typically includes identifiers (e.g., MRN), name, date of birth, sex, and address. In short: which patient.
- OBR (Observation Request): transmits information specific to an order/request for a diagnostic study or observation (e.g., a lab order, imaging order, assessment). It often provides the “header” context for a set of results. In short: which order/test.
- OBX (Observation/Result): carries a single observation/result (or a fragment), often described as the smallest indivisible unit of a report. Multiple OBX segments are commonly sent under a given OBR to report panels or multi-component results. In short: the (repeating) result(s) for that order/test.
However, HL7 v2 implementations can be heavily customized and rely on local agreements, which is feasible within a single organization but becomes increasingly impractical across many partners.
Implementation implication: interface engines and integration teams must treat syntactic conformance as the baseline; clinical safety depends on semantic alignment beyond mere message validity.
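The MSH/PID/OBR/OBX pattern described above can be sketched in Python as a small grouping routine that attaches each OBX to the most recent OBR. This is an illustrative sketch under simplifying assumptions (one patient per message, no escape handling, no optional segments such as NTE); the class names, identifiers, and sample message are hypothetical, and any code other than the sodium LOINC code cited in this essay should be verified against LOINC before reuse.

```python
from dataclasses import dataclass, field


@dataclass
class OrderGroup:
    obr: list[str]                                   # fields of the OBR segment
    obx: list[list[str]] = field(default_factory=list)  # its OBX results


@dataclass
class OruMessage:
    msh: list[str]
    pid: list[str]
    orders: list[OrderGroup]


def parse_oru(message: str) -> OruMessage:
    """Group the segments of a simple ORU message: MSH, PID, then OBR/OBX."""
    segments = [s for s in message.split("\r") if s]
    if not segments or not segments[0].startswith("MSH") or len(segments[0]) < 4:
        raise ValueError("message must start with an MSH segment")
    field_sep = segments[0][3]  # MSH-1 declares the field separator
    msh, pid, orders = None, None, []
    for seg in segments:
        fields = seg.split(field_sep)
        seg_id = fields[0]
        if seg_id == "MSH":
            msh = fields
        elif seg_id == "PID":
            pid = fields
        elif seg_id == "OBR":
            orders.append(OrderGroup(obr=fields))
        elif seg_id == "OBX":
            if not orders:
                raise ValueError("OBX segment without a preceding OBR")
            orders[-1].obx.append(fields)
        # other segment types are ignored in this sketch
    if msh is None or pid is None:
        raise ValueError("message must contain MSH and PID segments")
    return OruMessage(msh=msh, pid=pid, orders=orders)


SAMPLE_ORU = "\r".join([
    "MSH|^~\\&|LAB|HOSP|EHR|HOSP|202401011200||ORU^R01|MSG0001|P|2.5.1",
    "PID|1||123456^^^HOSP^MR||DOE^JANE||19800101|F",
    "OBR|1|ORD001|FIL001|ELEC^Electrolytes^L",
    "OBX|1|NM|2951-2^Sodium^LN||140|mmol/L|136-145|N|||F",
    "OBX|2|NM|2823-3^Potassium^LN||4.1|mmol/L|3.5-5.1|N|||F",
])

oru = parse_oru(SAMPLE_ORU)
for order in oru.orders:
    for obx in order.obx:
        # For non-MSH segments, list index n corresponds to field n (OBX-3, OBX-5, OBX-6).
        print(obx[3], obx[5], obx[6])
```

Passing this kind of structural grouping is what "syntactic conformance as the baseline" amounts to: it says nothing yet about whether the identifiers and units are ones the receiver interprets the same way.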
3.2 Lab results as a concrete example: semantics prevents clinical misinterpretation
Consider a lab sodium result transmitted from a laboratory system to the EHR. HL7 v2 can convey a syntactically valid OBX segment. Yet the clinical meaning hinges on using a shared identifier for what was measured and how it is expressed.
The NLM’s health data standards tutorial provides an HL7 v2 example where the sodium test is identified using a LOINC code (e.g., 2951-2) and the result includes units (e.g., mmol/L). It also explicitly notes that using unique identifiers from agreed terminology standards reduces ambiguity about what is requested and what is being reported.
Operational consequence in a hospital: without semantic rigor, results may display incorrectly, trend graphs may merge incomparable observations, and decision support thresholds may trigger erroneously.
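One way to picture the trending risk is a grouping rule that only places results in the same series when code system, code, and unit all match. The sketch below is a simplified illustration, not a description of how any particular EHR builds trend graphs; the `LabResult` type, the local code "NA", and the sample values are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class LabResult(NamedTuple):
    system: str   # code system URI or local namespace
    code: str     # test identifier within that system
    value: float
    unit: str     # unit as received; empty if missing


def group_for_trending(results: list[LabResult]) -> dict[tuple[str, str, str], list[float]]:
    """Group values into series keyed by (system, code, unit).

    Results that differ in any part of the key are kept apart,
    so incomparable observations are never plotted on one line.
    """
    series: dict[tuple[str, str, str], list[float]] = defaultdict(list)
    for r in results:
        series[(r.system, r.code, r.unit)].append(r.value)
    return dict(series)


results = [
    LabResult("http://loinc.org", "2951-2", 140.0, "mmol/L"),
    LabResult("http://loinc.org", "2951-2", 138.0, "mmol/L"),
    # Same analyte in the sender's mind, but a local code and no unit:
    LabResult("urn:local:lab-a", "NA", 141.0, ""),
]

series = group_for_trending(results)
for key, values in series.items():
    print(key, values)
```

The third result ends up in its own series. Whether it can be merged with the LOINC-coded series is a terminology-mapping decision that needs governance, not something the interface can infer from the message alone.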
3.3 FHIR as an illustration of separating syntax, structure, and semantics
FHIR (Fast Healthcare Interoperability Resources) is designed as both a data structure (“Resources”) and a method for sharing that data, supporting multiple representation formats (XML, JSON, RDF/Turtle).
Critically, FHIR’s approach highlights layered concerns:
- Syntax/serialization: JSON vs XML vs RDF encodings;
- Structural model: resources and datatypes;
- Semantics: terminology use throughout FHIR to “clearly define concepts,” including code systems and value sets.
Implementation implication: hospital projects adopting FHIR still need governance for semantic choices (which code systems, which value sets, which profiles) or they risk producing syntactically valid but semantically incompatible FHIR.
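The difference between structurally valid and semantically compatible FHIR can be sketched with two checks on an Observation in its JSON form. The structural check looks only at elements the base resource requires; the semantic check encodes a hypothetical local policy (LOINC for the observation code, UCUM for quantity units) of the kind a hospital profile might impose. The function names, the policy itself, and the example.org code system are assumptions for illustration; a real project would validate against published profiles with a FHIR validator.

```python
import json

LOINC = "http://loinc.org"
UCUM = "http://unitsofmeasure.org"


def structural_problems(obs: dict) -> list[str]:
    """Check a few structural requirements of an Observation resource."""
    problems = []
    if obs.get("resourceType") != "Observation":
        problems.append("resourceType must be 'Observation'")
    for required in ("status", "code"):
        if required not in obs:
            problems.append(f"missing required element '{required}'")
    return problems


def semantic_problems(obs: dict) -> list[str]:
    """Check a hypothetical local policy: LOINC codes and UCUM units."""
    problems = []
    codings = obs.get("code", {}).get("coding", [])
    if not any(c.get("system") == LOINC for c in codings):
        problems.append("no LOINC coding for the observation code")
    quantity = obs.get("valueQuantity")
    if quantity is not None and quantity.get("system") != UCUM:
        problems.append("quantity unit is not declared as a UCUM code")
    return problems


standard_obs = json.loads("""{
  "resourceType": "Observation",
  "status": "final",
  "code": {"coding": [{"system": "http://loinc.org", "code": "2951-2"}]},
  "valueQuantity": {"value": 140, "unit": "mmol/L",
                    "system": "http://unitsofmeasure.org", "code": "mmol/L"}
}""")

local_obs = json.loads("""{
  "resourceType": "Observation",
  "status": "final",
  "code": {"coding": [{"system": "http://example.org/local-lab", "code": "NA"}]},
  "valueQuantity": {"value": 140, "unit": "mmol/l"}
}""")

for name, obs in (("standard", standard_obs), ("local", local_obs)):
    print(name, structural_problems(obs), semantic_problems(obs))
```

Both resources pass the structural check, but only the first satisfies the semantic policy, which is the "syntactically valid but semantically incompatible" situation described above.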
3.4 Ontology in practice: beyond coding, toward computable meaning and reasoning
Hospitals increasingly want (and arguably should want) EHR data to support:
- Medication/allergy checking, problem–medication interactions, and other decision and process support;
- Cohort identification and research phenotyping (e.g., for randomized controlled trials (RCTs) and real-world evidence (RWE) studies);
- Quality indicators and longitudinal analytics across sites or vendors.
These uses depend on consistent semantics, and ontology becomes valuable when the system must reason over concepts and relationships, not just store codes. SNOMED CT provides a prominent example of ontology-like formalization: it is grounded in description logic, a formalism for representing semantic knowledge that enables formal reasoning based on axioms, and SNOMED CT concept definitions can be represented in OWL (Web Ontology Language) functional syntax for machine interpretation and integrity checking.
OWL, the Web Ontology Language, is a World Wide Web Consortium (W3C) standard language used to write ontologies - that is, formal domain models - so that computers can interpret them with formally defined meaning (i.e., precise semantics). The current standard family is OWL 2. OWL matters when you want computable meaning - e.g., consistent concept classification, subsumption-based queries (“all descendants of ‘diabetes mellitus’”), or robust rule definitions for analytics and decision support that depend on formal concept definitions rather than ad hoc lists.
The SNOMED CT Expression Constraint Language (ECL) is a formal, declarative query language for defining precise, intensional subsets of SNOMED CT clinical concepts by applying constraints on concepts and their relationships. Within EHRs it is used to define the valid values for a data element, to define reference sets, to query content, and to support quality assurance. ECL can express complex rules, such as finding all subtypes of a condition (e.g., all fractures) or concepts with specific attribute values (e.g., a drug with a certain strength), providing executable logic for clinical terminologies.
Hospital relevance: ontology-based relationships enable meaning-based retrieval (e.g., retrieving all patients with a concept and its descendants), improve consistency of terminological updates, and support more robust decision logic than brittle local lists.
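Meaning-based retrieval of "a concept and its descendants" can be sketched as a transitive traversal of is-a relationships, which is similar in spirit to the ECL descendant-or-self operator (`<<`). The hierarchy below is a toy example with made-up labels, not actual SNOMED CT content, and the patient records are hypothetical; a production system would delegate this to a terminology server rather than an in-memory dictionary.

```python
from collections import defaultdict, deque

# Toy is-a hierarchy: child concept -> set of parent concepts (hypothetical labels).
IS_A = {
    "type 1 diabetes mellitus": {"diabetes mellitus"},
    "type 2 diabetes mellitus": {"diabetes mellitus"},
    "diabetes mellitus": {"disorder of glucose metabolism"},
    "fracture of femur": {"fracture of bone"},
}


def descendants_or_self(concept: str, is_a: dict[str, set[str]]) -> set[str]:
    """Return the concept plus everything reachable below it via is-a links."""
    children: dict[str, set[str]] = defaultdict(set)
    for child, parents in is_a.items():
        for parent in parents:
            children[parent].add(child)
    found = {concept}
    queue = deque([concept])
    while queue:  # breadth-first walk down the hierarchy
        current = queue.popleft()
        for child in children[current]:
            if child not in found:
                found.add(child)
                queue.append(child)
    return found


# Hypothetical problem-list entries: patient id -> recorded concept.
problem_list = {
    "p1": "type 2 diabetes mellitus",
    "p2": "fracture of femur",
    "p3": "diabetes mellitus",
}

diabetes_concepts = descendants_or_self("diabetes mellitus", IS_A)
cohort = sorted(pid for pid, concept in problem_list.items()
                if concept in diabetes_concepts)
print(cohort)
```

A hand-maintained list of "diabetes codes" would need editing every time a subtype is added; the subsumption query picks up new descendants automatically once the hierarchy is updated, which is the maintainability argument made above.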
4. Governance and Implementation Recommendations Mapped to the Three Concepts
4.1 Governance for syntax
- Establish interface specifications and conformance testing (message validation, schema validation).
- Standardize transport and exchange patterns (e.g., profiles/implementation guides for partners).
4.2 Governance for semantics
- Operate or procure a high-quality terminology service to manage code systems, value sets, mappings, and versioning efficiently and effectively.
- Specify unit standards and measurement conventions; enforce them at data capture and at ingestion.
- Define clinical data dictionaries so that “the same field” means the same thing across modules and departments.
4.3 Governance for ontology
- Where advanced primary and secondary reuse is required (decision support, process support, analytics, cross-site integration, randomized controlled trials (RCTs), real-world evidence (RWE)), adopt or align with formal ontologies/logic-based terminologies and ensure governance for concept modeling choices.
- Plan for reasoning/inference use cases explicitly (e.g., subsumption queries, constraint checking, consistent phenotype definitions), and validate them against clinical expectations.
5. Closing remarks
In hospital EHR work, syntax ensures information can be transmitted and parsed; semantics ensures the parsed information is interpreted consistently and safely; and ontology provides an explicit, formal foundation for shared meaning that supports computation, integrity checking, and reasoning at scale. Treating these as interchangeable leads to a common failure mode: “interoperable” interfaces that pass technical validation while still producing clinically inconsistent or unusable data. Conversely, designing the EHR ecosystem as a layered stack - syntactic conformance, semantic standardization, and ontology-informed computability - directly improves care coordination, decision support reliability, secondary use of data, and long-term maintainability.
References (selection)
Health Information and Quality Authority (HIQA). Overview of Healthcare Interoperability Standards (2013).
U.S. National Library of Medicine (NLM). Health Data Standards and Terminologies: A Tutorial (HL7 v2 and FHIR modules).
HL7 (Health Level Seven). Standards.
HL7. Fast Healthcare Interoperability Resources (FHIR) specification (resources; exchange approaches; datatypes).