Implementing Secondary Use of Healthcare Data Through High-Quality Primary Use: A Primary-Care-Pathway First Strategy
Abstract
Secondary use of healthcare data for research, population health management, quality improvement, policy, and artificial intelligence cannot be implemented successfully as a purely technical data-platform initiative. Its reliability depends on the quality, provenance, semantic consistency, and clinical relevance of data generated during care. The central argument of this essay is that healthcare organizations should implement secondary data use through a “primary-use first” strategy: data must first help clinicians, nurses, patients, and multidisciplinary teams make better decisions in real care pathways. Only then can the same data become a trustworthy foundation for research, management, innovation, and AI. This requires clinical governance, workflow redesign, semantic standardization, continuous data-quality measurement, trustworthy access governance, and feedback loops that return value to care delivery. The learning health system provides the most appropriate conceptual model because it treats data capture, analysis, improvement, and clinical practice as one continuous cycle rather than as separate operational and research activities.
Keywords: secondary use of health data; primary use; learning health system; electronic health records; data quality; clinical workflows; FAIR data; health data governance; artificial intelligence.
1. Introduction
Healthcare organizations increasingly seek to reuse routinely collected clinical data for research, quality improvement, policy development, population health management, operational planning, and artificial intelligence. This ambition is legitimate: electronic health records (EHRs), registries, administrative systems, laboratory systems, imaging archives, and patient-reported data can create large-scale longitudinal evidence that conventional trials or manual audits cannot easily provide. Reviews of clinical data reuse emphasize that EHR data can support multiple secondary purposes, including clinical research, surveillance, quality improvement, and decision support, but also show that such reuse is constrained by data quality, heterogeneity, missingness, and unclear provenance.
The critical implementation mistake is to treat secondary use as an independent objective. In reality, secondary use is downstream from primary use. Data recorded in clinical systems are not created under laboratory conditions; they are produced within time-pressured care pathways, by clinicians and nurses whose primary responsibility is diagnosis, treatment, monitoring, communication, coordination, and patient safety. The meaning of a data element depends on who recorded it, why it was recorded, in which workflow, under what constraints, and with which clinical intention. Johnson and colleagues’ work on EHR data provenance is important precisely because it shows that understanding local workflow and data-generating processes is essential for interpreting EHR data used in research.
Therefore, the foundational implementation principle is this: healthcare data should be optimized first for clinical usefulness, safety, and workflow integration; secondary use should then be built as a governed extension of that clinically meaningful data ecosystem. This aligns with the learning health system model, in which data generated during care are continuously transformed into knowledge and then returned to practice through improvement, decision support, and system learning. The Institute of Medicine described this vision as one in which best care practices and real-time knowledge generation are integrated through digital infrastructure, while also warning that governance, (international) standards, workflow, usability, privacy, and burden must be addressed together.
2. Conceptual foundation: secondary use as a by-product of effective primary use
A primary-use first strategy begins by distinguishing between data capture and data use. Capturing data is not the same as creating usable knowledge. A hospital can possess millions of records while still lacking reliable evidence if those records are incomplete, inconsistent, unstandardized, or disconnected from the clinical context in which they were created. Weiskopf and Weng’s widely cited review identified key EHR data-quality dimensions for reuse, including completeness, correctness, concordance, plausibility, and currency. Kahn and colleagues subsequently harmonized data-quality terminology for secondary use into major categories such as conformance, completeness, and plausibility.
These frameworks imply that data quality is not an abstract property of a database; it is fitness for a specific use, beginning with daily clinical use. A medication list may be good enough for billing, insufficient for medication reconciliation, and dangerous for AI-based prescribing support if discontinued medicines are not consistently represented. A diagnosis code may be adequate for reimbursement but too imprecise for phenotype discovery. A nursing assessment may be clinically rich in free text but unusable for population dashboards unless structured concepts and metadata are available. For this reason, secondary-use projects should not begin by asking, “What data do we have?” They should ask, “Which clinical decisions, care processes, patient outcomes, and learning questions should the data support, and what primary-use workflows must generate those data reliably?”
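The purpose-dependence of data quality can be made concrete with a small sketch. The field names, the two purposes, and the sample medication list below are illustrative assumptions, not a recommended data model; the point is only that the same records score differently against different requirements.

```python
# Hypothetical purpose-specific requirements; field names are illustrative only.
REQUIREMENTS = {
    "billing": {"drug_code"},
    "medication_reconciliation": {"drug_code", "dose", "status"},
}


def is_fit(record, purpose):
    """A record is fit for a purpose when every required field is filled in."""
    return all(record.get(name) is not None for name in REQUIREMENTS[purpose])


def fitness_rate(records, purpose):
    """Share of records that are fit for the given purpose (0.0 for no records)."""
    if not records:
        return 0.0
    return sum(is_fit(r, purpose) for r in records) / len(records)


# Invented sample data: every entry has a drug code, but dose and status are patchy.
medication_list = [
    {"drug_code": "A", "dose": "5 mg", "status": "active"},
    {"drug_code": "B", "dose": None, "status": "active"},
    {"drug_code": "C", "dose": "10 mg", "status": None},
    {"drug_code": "D", "dose": None, "status": None},
]
```

With this sample, the list is fully usable for billing while only one entry in four meets the reconciliation requirements, which is the situation the paragraph above describes.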
The learning health system provides the most coherent answer. It rejects the separation between care and learning. The Institute of Medicine’s digital infrastructure work framed clinical data as a core resource for continuous improvement and emphasized “collect once, use for multiple purposes,” while also noting the need to limit data-collection burden to what is important for patient care and knowledge generation. In this model, secondary use is not an extraction layer imposed on clinicians; it is the consequence of well-designed care processes that produce computable, meaningful, and trustworthy data.
Note: Fixing clinical data early is among the most cost-effective and reliable ways to protect patients from medical errors; it is a "shift left" strategy applied to healthcare. When data are verified at the point of entry, errors are caught before they lead to incorrect diagnoses, faulty medication alerts, or improper treatments.
Note: In software, "elephant paths" (also known as "desire paths") are the unofficial workarounds or shortcuts that users and developers create to bypass intended product designs. They reveal friction in the software and show where actual user behavior diverges from the intended experience. In clinical documentation, such workarounds indicate where the EHR does not fit the care workflow.
3. Governance: make data a clinical asset before it becomes a research asset
Successful implementation begins with governance that gives clinicians, nurses, patients, informaticians, data stewards, privacy officers, researchers, and executives shared accountability. Governance must not be limited to approving data-access requests. It must define what data are important, how they are captured, how quality is measured, how errors are corrected, how access is granted, how patients are informed, and how insights return to practice. Governance frameworks for secondary use of routine health data emphasize stakeholder involvement, transparent decision-making, and clear responsibilities across practice, policy, and research.
A primary-use first organization should therefore establish clinical data governance councils that are organized around care pathways, such as stroke, oncology, heart failure, maternity care, chronic kidney disease, emergency care, or medication safety, and that have the expertise to decide on clinical data. These councils should be chaired or co-chaired by clinical leaders and include nursing leadership, allied health professionals, operational managers, patient representatives, data architects, terminology specialists, legal/privacy experts, and researchers. Their mandate should include approving clinical data definitions, reducing unnecessary documentation, prioritizing structured data capture where it improves care, and overseeing secondary-use readiness.
The European Health Data Space (EHDS) illustrates the policy direction in Europe: it explicitly links primary use, secondary use, EHR-system interoperability, secure reuse, patient rights, opt-out mechanisms, health data access bodies, secure processing environments, and quality requirements. The European Commission states that the EHDS is intended to support both access/control for individuals in healthcare delivery and secure reuse for research, innovation, policy-making, and regulatory activities. This reinforces the organizational lesson: secondary use requires a trustworthy governance architecture, not only a data warehouse.
4. Redesign clinical documentation around care pathways
The most important operational intervention is to redesign documentation so that it supports care before it serves reporting. Clinicians and nurses often experience EHRs as burdensome because documentation is fragmented across billing, legal, regulatory, quality, and clinical requirements. Studies and reviews link EHR burden to documentation workload, usability problems, workflow disruption, stress, and burnout among clinicians and nurses.
A primary-use first implementation should therefore start with care-pathway mapping. For each pathway, the organization should identify the core clinical decisions, handovers, safety risks, patient outcomes, and coordination points. Data requirements should then be derived from those clinical needs. For example, in a heart-failure pathway, structured capture of ejection fraction, New York Heart Association (NYHA) class, medications, renal function, weight trends, decompensation episodes, patient education, and follow-up plans is valuable because these data support clinical decisions and care coordination. Their secondary value for research and quality measurement follows from their primary value.
The guiding rule should be: do not ask clinicians to enter data solely for distant secondary purposes unless the organization can demonstrate either immediate clinical usefulness or a justified public-interest requirement with minimal burden. Where secondary-use requirements exist, they should be embedded into clinical workflow through automation, defaults, device integration, interoperability, natural language processing with validation, and team-based documentation rather than additional manual forms. Documentation should be designed around “clinician cognition” and patient-care flow, not around database tables.
5. Standardize clinical content and preserve clinical meaning
Secondary use fails when identical clinical concepts are recorded differently across departments, sites, professions, and systems. A blood pressure may be recorded as structured vital-sign data, free text, device output, nursing observation, or PDF attachment. A diagnosis may be represented as WHO ICD, SNOMED CT, local code, billing category, or narrative impression. A procedure may be captured differently by operating theatres, billing systems (WHO ICHI), registries, and discharge summaries. Without semantic governance, data lakes become “data swamps.”
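A minimal sketch of semantic normalization illustrates the problem. The source-system names, local codes, and canonical concept name below are all hypothetical placeholders; a real deployment would bind local codes to standard terminologies through a terminology service rather than an in-memory table.

```python
# Hypothetical mapping from (source system, local code) to one canonical concept.
LOCAL_TO_CANONICAL = {
    ("ward_system", "BP_SYS"): "systolic_blood_pressure",
    ("icu_monitor", "NIBP-S"): "systolic_blood_pressure",
    ("gp_system", "sys_bp"): "systolic_blood_pressure",
}


def normalize(observations):
    """Split observations into mapped (with a canonical concept) and unmapped."""
    mapped, unmapped = [], []
    for obs in observations:
        concept = LOCAL_TO_CANONICAL.get((obs["source"], obs["local_code"]))
        if concept is None:
            # Keep unmapped items visible for data stewards instead of dropping them.
            unmapped.append(obs)
        else:
            mapped.append({**obs, "concept": concept})
    return mapped, unmapped


# Invented observations: two known local codes and one unknown one.
observations = [
    {"source": "ward_system", "local_code": "BP_SYS", "value": 132},
    {"source": "icu_monitor", "local_code": "NIBP-S", "value": 128},
    {"source": "pdf_import", "local_code": "bloodpressure", "value": 140},
]
```

Returning the unmapped observations separately is a deliberate choice in this sketch: silently discarding them would hide exactly the semantic gaps that governance needs to see.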
Implementation requires a controlled clinical information model. This includes (international) standard terminologies, common data definitions, metadata, value sets, and binding rules. International interoperability efforts such as FHIR, core datasets, and certification-oriented data classes reflect the same principle: standardized data classes and interoperability specifications are necessary for secure exchange and reuse. ONC (Office of the National Coordinator for Health IT) describes USCDI (United States Core Data for Interoperability) as a standardized set of health data classes and elements for interoperable exchange, and its certification work emphasizes functionality, security, and interoperability.
However, standards alone are insufficient. Healthcare organizations must also preserve the clinical context of data. This means recording provenance metadata: source system, author role, timestamp, encounter type, care setting, device origin, method of measurement, and whether data were patient-reported, clinician-entered, imported, inferred, or algorithmically generated. EHR provenance research shows that secondary users need to understand how local workflow shapes data values before drawing conclusions from them.
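The provenance attributes listed above could be attached to each value roughly as follows. This is a sketch under stated assumptions: the class and field names are invented for illustration and do not reproduce any particular standard's provenance model.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class Origin(Enum):
    PATIENT_REPORTED = "patient-reported"
    CLINICIAN_ENTERED = "clinician-entered"
    IMPORTED = "imported"
    INFERRED = "inferred"
    ALGORITHMIC = "algorithmically generated"


@dataclass(frozen=True)
class Provenance:
    source_system: str
    author_role: str
    recorded_at: datetime
    encounter_type: str
    care_setting: str
    origin: Origin
    device: Optional[str] = None   # device origin, when a device produced the value
    method: Optional[str] = None   # method of measurement, when known


@dataclass(frozen=True)
class Observation:
    concept: str
    value: float
    unit: str
    provenance: Provenance


def exclude_derived(observations):
    """Keep only values that were not inferred or algorithmically generated."""
    derived = {Origin.INFERRED, Origin.ALGORITHMIC}
    return [o for o in observations if o.provenance.origin not in derived]
```

A secondary user could then, for example, restrict an analysis to clinician-entered or device-measured values, or compare results with and without derived values.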
Note: openEHR (pronounced "open-air") is an open, vendor-neutral technology and standard specification for modeling, storing, and managing health data within electronic health records (EHRs).
Note: The building blocks of healthcare data transfer are the core foundational components, technical profiles, and regulatory frameworks required to securely and seamlessly move medical information across separate IT ecosystems. In modern digital frameworks - such as the European Health Data Space (EHDS) initiative and international clinical informatics models - these blocks ensure that transferred data remain intact, standardized, and legally secure.
6. Build data quality into the care process, not after extraction
Many organizations discover data-quality problems only after extraction, when analysts find missing values, impossible dates, inconsistent units, or unexplainable variation across sites. At that point, correction is expensive and often impossible. A primary-use first strategy builds quality assurance into the clinical data lifecycle.
This requires three layers of data-quality control.
- First, front-end quality should be improved through usable forms, structured fields where appropriate, value-range checks, unit standardization, mandatory fields only when clinically justified, and immediate feedback to the user.
- Second, workflow quality should be monitored by measuring whether documentation occurs at the correct point in care, whether handovers use the intended data, and whether clinicians trust the information.
- Third, secondary-use quality should be assessed through systematic validation using accepted dimensions such as completeness, correctness, concordance, plausibility, currency, conformance, and validity.
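The first layer, front-end quality, can be sketched as a check that runs at the moment of entry. The example uses body weight; the plausibility bounds are assumed values chosen for illustration and are not clinical guidance, and the function name is hypothetical.

```python
LB_TO_KG = 0.45359237  # exact definition of the international avoirdupois pound

# Assumed plausibility bounds for adult body weight in kg; illustrative only.
PLAUSIBLE_KG = (20.0, 400.0)


def validate_weight(value, unit):
    """Return (weight_kg, message); weight_kg is None when the entry is rejected."""
    if unit == "kg":
        kg = float(value)
    elif unit == "lb":
        kg = float(value) * LB_TO_KG  # unit standardization: always store kilograms
    else:
        return None, f"Unknown unit '{unit}'; expected 'kg' or 'lb'."
    low, high = PLAUSIBLE_KG
    if not low <= kg <= high:
        # Immediate feedback to the user instead of a silent downstream error.
        return None, f"{kg:.1f} kg is outside the plausible range {low}-{high} kg; please re-check."
    return round(kg, 1), "ok"
```

The message is returned to the person entering the value, so a typing error such as 820 instead of 82 can be corrected while the patient is still present, rather than being discovered months later in an extract.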
Data-quality dashboards should be returned to clinical teams, not hidden in analytics departments. If a diabetes mellitus (DM) registry has missing HbA1c values, the responsible care team should know whether the issue reflects clinical omission, laboratory-interface failure, coding variation, delayed documentation, or inappropriate cohort logic. Feedback should be non-punitive and improvement-oriented. The purpose is to make better care easier, not to blame professionals for defects created by poor systems.
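One way to help a care team distinguish these causes is to slice the same completeness measure along different dimensions. The sketch below assumes a hypothetical registry extract with invented column names (`team`, `lab_interface`, `hba1c`) and invented values.

```python
from collections import defaultdict


def completeness_by(registry, key, field_name="hba1c"):
    """Share of rows with a non-missing value, grouped by the given key."""
    totals = defaultdict(int)
    present = defaultdict(int)
    for row in registry:
        totals[row[key]] += 1
        if row.get(field_name) is not None:
            present[row[key]] += 1
    return {group: present[group] / totals[group] for group in totals}


# Invented extract: HbA1c values in mmol/mol, None where the value is missing.
registry = [
    {"patient": 1, "team": "A", "lab_interface": "lab1", "hba1c": 48},
    {"patient": 2, "team": "A", "lab_interface": "lab2", "hba1c": None},
    {"patient": 3, "team": "B", "lab_interface": "lab1", "hba1c": 53},
    {"patient": 4, "team": "B", "lab_interface": "lab2", "hba1c": None},
]
```

In this invented example both teams show the same completeness, while grouping by laboratory interface shows that every missing value arrives through one interface. That pattern points to a technical fault rather than a clinical omission, which is why the feedback should be improvement-oriented rather than punitive.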
7. Implement FAIR and interoperable infrastructure without losing clinical accountability
The FAIR principles - Findable, Accessible, Interoperable, and Reusable - are important for secondary use because they emphasize machine-actionable metadata, standardization, and reusable digital assets. Wilkinson and colleagues’ FAIR principles were designed to improve the findability, accessibility, interoperability, and reuse of data and related workflows.
In healthcare, FAIR implementation should not mean uncontrolled openness. Health data are sensitive, relational, and potentially re-identifiable. FAIR must therefore be implemented through privacy-preserving catalogues, role-based access, data-use agreements, secure processing environments, audit trails, pseudonymization or anonymization where appropriate, and transparent governance. The EHDS similarly combines data findability and reuse with health data access bodies, authorization, minimization, secure processing environments, and opt-out rights.
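Three of these safeguards, pseudonymization, role-based access, and an audit trail, can be sketched in a few lines. The roles, purposes, and policy table are hypothetical, and key management is out of scope here. Keyed hashing of this kind is pseudonymization, not anonymization: whoever holds the key can re-link the records.

```python
import hashlib
import hmac


def pseudonymize(patient_id: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym with HMAC-SHA256; the key must be kept separately."""
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical policy: which role may access data for which purpose.
PERMITTED = {
    ("researcher", "scientific_research"),
    ("quality_officer", "quality_improvement"),
}

audit_log = []


def request_access(user, role, purpose):
    """Decide on a request and record the decision, whether granted or refused."""
    granted = (role, purpose) in PERMITTED
    audit_log.append({"user": user, "role": role, "purpose": purpose, "granted": granted})
    return granted
```

Logging refused requests as well as granted ones is intentional in this sketch, since transparent governance depends on being able to show what was asked for, not only what was approved.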
The infrastructure should include a governed data catalogue, metadata repository, terminology server, master patient index, consent/authorization service, data-quality service, secure analytics environment, and reproducible pipelines. But the infrastructure must remain connected to clinical ownership. A technically elegant data platform will still fail if clinical teams do not trust the source data or if data stewards cannot explain why values appear as they do.
8. Create competent, trustworthy legal, ethical, and social governance
Secondary use depends on trust. Patients must trust that their data will be used for legitimate purposes, protected from misuse, and governed transparently. Clinicians must trust that secondary-use outputs will not misrepresent their work or create unfair performance judgments. Researchers and policymakers must trust that datasets are valid for the questions asked. Literature on trust-based governance of health data research emphasizes that responsible data reuse requires more than legal compliance; it requires trustworthy institutions, transparent procedures, and public confidence.
A healthcare organization should therefore publish clear information about which data are reused, for which purposes, under which safeguards, with which oversight, and how patients can exercise rights where applicable. It should distinguish between internal quality improvement, public-health reporting, scientific research, commercial innovation, and AI development, because these uses have different ethical expectations and risk profiles. The governance model should also include appropriate sanctions for misuse, independent review for high-risk projects, public and patient involvement, and transparent reporting of approved secondary-use projects.
For European organizations, implementation must be aligned with GDPR and the phased EHDS environment. The EHDS entered into force in March 2025 and will apply progressively, with major primary-use and secondary-use provisions applying later according to data category and implementation phase. This means organizations should prepare now by improving EHR interoperability, data-quality documentation, access-governance procedures, patient-transparency mechanisms, and secure processing capacity.
9. Use AI only after establishing data provenance, quality, and clinical validation
Artificial intelligence (AI) intensifies the importance of primary-use quality. AI systems trained on biased, incomplete, poorly labeled, or workflow-distorted data can reproduce inequities, generate unsafe predictions, and lose clinician trust. WHO’s ethics and governance guidance for AI in health identifies ethical risks and proposes principles to ensure that AI works for public benefit, while good machine learning practice guidance emphasizes safe, effective, and high-quality AI/ML development across the product lifecycle.
Therefore, organizations should not begin with “AI use cases” in isolation. They should begin with clinically important problems where high-quality primary data already exist or can be improved. AI governance should require documented provenance, data-quality assessment, bias analysis, external validation, clinical workflow evaluation, human oversight, monitoring after deployment, and clear accountability. Reporting frameworks such as TRIPOD+AI demonstrate the broader movement toward transparent reporting of prediction models using regression or machine-learning methods.
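The governance requirements above can be treated as a gate that a project must pass before deployment. The following sketch assumes a hypothetical project dossier represented as a dictionary; the item names simply mirror the list in the preceding paragraph.

```python
# Evidence items taken from the governance requirements in the text above.
REQUIRED_EVIDENCE = (
    "provenance",
    "data_quality_assessment",
    "bias_analysis",
    "external_validation",
    "workflow_evaluation",
    "human_oversight",
    "post_deployment_monitoring",
    "accountability",
)


def missing_evidence(dossier):
    """List the required items that are absent or empty in the project dossier."""
    return [item for item in REQUIRED_EVIDENCE if not dossier.get(item)]


def ready_for_deployment(dossier):
    """A project passes the gate only when no required evidence is missing."""
    return not missing_evidence(dossier)
```

A gate of this kind does not judge the quality of the evidence; it only makes gaps explicit so that a review body can decide whether a project may proceed.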
Most importantly, AI outputs must return value to care delivery in the form of greater efficiency, greater effectiveness, or better outcomes. If AI increases alerts, documentation, or cognitive burden without improving decisions, it will damage both primary use and secondary trust. AI should be implemented as part of a learning system: data generate models, models support decisions, decisions are monitored, outcomes are evaluated, and feedback improves both the model and the underlying care process.
Note: TRIPOD+AI is a 27-item reporting checklist designed to ensure the transparency, accuracy, and reproducibility of clinical prediction models that use artificial intelligence (AI) and machine learning (ML). It is the updated successor to the original 2015 TRIPOD statement.
10. A practical implementation roadmap
A primary-use first secondary-data strategy can be implemented in five phases.
Phase 1: Establish clinical-data governance and select priority pathways. The organization should choose a small number of high-value pathways where better data will improve patient outcomes, reduce risk, or support strategic objectives. Governance bodies should define clinical ownership, patient involvement, data-steward roles, privacy responsibilities, and escalation mechanisms.
Phase 2: Map workflows and define clinically valuable data. Competent multidisciplinary teams should map how care is delivered, where decisions occur, what information is needed, what data are already captured, where duplication exists, and which documentation creates no clinical value. The output should be a pathway data specification: data elements, definitions, source systems, responsible roles, timing, standards, and intended primary and secondary uses.
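A pathway data specification of this kind can be kept in a simple machine-readable form. The sketch below uses the heart-failure example; the class name, the attribute values, and the third element are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DataElement:
    name: str
    definition: str
    source_system: str
    responsible_role: str
    timing: str
    standard: str
    primary_uses: list = field(default_factory=list)
    secondary_uses: list = field(default_factory=list)


def lacking_primary_use(spec):
    """Names of elements that are documented only for secondary purposes."""
    return [element.name for element in spec if not element.primary_uses]


# Invented example specification for a heart-failure pathway.
heart_failure_spec = [
    DataElement("nyha_class", "New York Heart Association functional class",
                "EHR", "treating clinician", "each outpatient visit",
                "local value set I-IV (assumed)",
                primary_uses=["treatment decisions", "handover"],
                secondary_uses=["quality registry"]),
    DataElement("ejection_fraction", "Left ventricular ejection fraction in percent",
                "echocardiography system", "cardiologist", "after each echocardiogram",
                "unit: percent (assumed)",
                primary_uses=["therapy selection"],
                secondary_uses=["research cohort definition"]),
    DataElement("external_report_flag", "Hypothetical item requested for an external report",
                "EHR", "ward nurse", "at discharge", "none",
                secondary_uses=["external report"]),
]
```

Elements returned by `lacking_primary_use` are candidates for automation, justification, or removal, in line with the rule that clinicians should not enter data solely for distant secondary purposes.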
Phase 3: Redesign documentation and interoperability. EHR templates, order sets, nursing documentation, device interfaces, patient-reported outcome tools, and handover forms should be redesigned to be ergonomic and usable, so that data are captured once and used many times. This redesign should reduce duplicate entry and align data capture with care delivery. Integration with terminology services and interoperability standards should be implemented at this stage.
Phase 4: Implement data-quality monitoring and feedback. Appropriate data-quality rules should be defined before secondary-use analytics begin. These rules should test completeness, conformance, plausibility, timeliness, consistency, duplication, and provenance. Results should be reported to clinical teams and governance councils, with improvement actions tracked through quality-improvement cycles.
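Defining rules before analytics begin is easier when the rules are declarative and separate from the data. The sketch below covers three of the listed dimensions; the field names, the ejection-fraction bounds, and the 24-hour threshold are assumptions for illustration, not recommended values.

```python
from datetime import datetime, timedelta


def complete(field_name):
    return lambda rec: rec.get(field_name) is not None


def in_range(field_name, low, high):
    # Missing values are left to the completeness rule, so they pass here.
    return lambda rec: rec.get(field_name) is None or low <= rec[field_name] <= high


def recorded_within(max_delay):
    return lambda rec: rec["recorded_at"] - rec["measured_at"] <= max_delay


# Assumed rule set; thresholds are illustrative only.
RULES = {
    "completeness": complete("ejection_fraction"),
    "plausibility": in_range("ejection_fraction", 5, 90),
    "timeliness": recorded_within(timedelta(hours=24)),
}


def count_failures(records, rules):
    """Number of records failing each rule, keyed by rule name."""
    return {name: sum(1 for rec in records if not check(rec))
            for name, check in rules.items()}


# Invented records: one clean, one missing and late, one implausible.
records = [
    {"patient": 1, "ejection_fraction": 55,
     "measured_at": datetime(2024, 1, 1, 8), "recorded_at": datetime(2024, 1, 1, 9)},
    {"patient": 2, "ejection_fraction": None,
     "measured_at": datetime(2024, 1, 1, 8), "recorded_at": datetime(2024, 1, 3, 8)},
    {"patient": 3, "ejection_fraction": 140,
     "measured_at": datetime(2024, 1, 1, 8), "recorded_at": datetime(2024, 1, 1, 9)},
]
```

Because each rule has a name, the failure counts can be reported to clinical teams and governance councils in the same terms in which the rules were agreed, and tracked across quality-improvement cycles.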
Phase 5: Scale secondary use through secure, FAIR, and governed analytics. Once primary-use data are clinically trusted, the organization can safely expand to research, dashboards, population health, policy reporting, and AI. Data catalogues, secure processing environments, access controls, audit logs, reproducible pipelines, and transparent project registers should support this scale-up.
This roadmap deliberately places analytics after care-pathway optimization. It does not delay secondary use indefinitely; rather, it prevents premature secondary use of unreliable data. In the long term, it is faster to build research and AI capabilities on trustworthy primary data than to repeatedly clean, reinterpret, and defend weak data extracted from poorly designed workflows.
11. Evaluation: what success should look like
Implementation should be evaluated at four levels.
- At the clinical level, success means that clinicians and nurses experience data as useful for decisions, handovers, safety, coordination, and patient outcomes.
- At the data level, success means measurable improvement in completeness, correctness, plausibility, timeliness, conformance, and provenance.
- At the governance level, success means transparent access decisions, patient trust, ethical oversight, and clear accountability.
- At the secondary-use level, success means that research, management dashboards, policy analyses, quality-improvement programs, and AI tools produce valid, explainable, and clinically accepted outputs.
These outcomes should be measured together. A secondary-use program that produces many dashboards but increases documentation burden is not successful. A research data warehouse that supports publications but is distrusted by clinicians is fragile. An AI program that performs well retrospectively but fails in workflow is not a learning health system. The proper test is whether data improve care and learning simultaneously.
Conclusion
Secondary use of healthcare data will succeed only when built on high-quality primary use. The core implementation challenge is not merely technical extraction, storage, or analytics; it is the redesign of clinical data work so that information recorded during care is meaningful, standardized, trusted, and useful at the point of care. Healthcare organizations should therefore begin with clinical pathways, multidisciplinary governance, documentation redesign, semantic standardization, data-quality monitoring, patient trust, and feedback loops. Research, management, policy, quality improvement, and artificial intelligence should be treated as downstream capabilities of a learning health system.
The practical implication is clear: organizations should stop asking clinicians and nurses to document for invisible secondary purposes and instead make data work visibly for patient care. When data help clinicians and patients first, they become more complete, accurate, timely, and meaningful. Secondary use is then not an additional burden imposed on care; it becomes the natural and trustworthy extension of good care.
Bibliography
Bernardi, Filipe Andrade, et al. "Data quality in health research: integrative literature review." Journal of medical Internet research 25 (2023): e41446.
Brown, Jeffrey S., Michael Kahn, and Sengwee Toh. "Data quality assessment for comparative effectiveness research in distributed data networks." Medical care 51 (2013): S22-S29.
Enticott, Joanne, Alison Johnson, and Helena Teede. "Learning health systems using data to drive healthcare improvement and impact: a systematic review." BMC health services research 21.1 (2021): 200.
European Commission. (2025–2026). European Health Data Space Regulation. Directorate-General for Health and Food Safety.
McGinnis, J. Michael, Brian Powers, and Claudia Grossmann, eds. "Digital infrastructure for the learning health system: the foundation for continuous improvement in health and health care: workshop series summary." (2011).
Sanders, Julia, Brian Powers, and Claudia Grossmann, eds. Digital data improvement priorities for continuous learning in health and health care: workshop summary. National Academies Press, 2013.
Johnson, Karin E., et al. "How the provenance of electronic health record data matters for research: a case example using system mapping." EGEMS 2.1 (2014): 1058.
Kahn, Michael G., et al. "A harmonized data quality assessment terminology and framework for the secondary use of electronic health record data." EGEMS 4.1 (2016): 1244.
Meystre, Stephane M., et al. "Clinical data reuse or secondary use: current status and potential future progress." Yearbook of medical informatics 26.01 (2017): 38-52.
Murad, M. Hassan, et al. "Measuring documentation burden in healthcare." Journal of general internal medicine 39.14 (2024): 2837.
Office of the National Coordinator for Health Information Technology. (2026). Certification of Health IT; interoperability and USCDI resources. U.S. Department of Health and Human Services.
Weiskopf, Nicole Gray, and Chunhua Weng. "Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research." Journal of the American Medical Informatics Association 20.1 (2013): 144-151.
Wilkinson, Mark D., et al. "The FAIR Guiding Principles for scientific data management and stewardship." Scientific data 3.1 (2016): 1-9.
World Health Organization. (2021). Global Strategy on Digital Health 2020–2025. WHO.
World Health Organization. (2021). Ethics and governance of artificial intelligence for health. WHO.
Collins, Gary S., et al. "TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods." BMJ 385 (2024): e078378. (update: BMJ 385 (2024): q902, https://www.bmj.com/content/385/bmj.q902)