Updated: Oct 20, 2022
Streamlining data annotation for multi-site research and commercial collaborations with the Rhino Health Platform
At Rhino Health, we are focused on removing obstacles for users who seek to take advantage of health data from multiple sites.
The Problem: Annotation Slows Down Great Machine Learning
Nearly every machine learning (ML) effort using health data requires a ‘human-in-the-loop’ annotation step, where subject matter experts, often clinicians, manually review each case and attach one or more labels to it. The quality of labeling has a tremendous impact on every project, as labels are used at every step from basic data quality control, through curation of datasets for training, testing, and validation, all the way to model training and refinement.
Until now, users who wanted to collaborate on health data from multiple sites had to compromise on annotation and choose one of a few sub-optimal options:
Centralize the data and annotation - Costly and time-consuming, this requires the leading site to manage a data repository for each project.
Rely on local annotators at each site - Introduces biases and leads to inconsistent labels across sites.
Provide VPN access to the data for external annotators - Not secure and jeopardizes patient privacy.
Developers who draw on health data from multiple sites are taking a step in the right direction toward combating model bias and improving performance and generalizability. Compromising on the critical step of data annotation, however, is highly consequential and often leads to inadequate model performance. Common pitfalls include:
Many sites (e.g., hospitals, clinics) do not have internal capacity to contribute to data annotation.
Levels of expertise can vary widely between annotators, especially across sites.
Guidelines for specific diagnoses may vary across different hospitals and geographies.
Guidelines for pixel-level annotations (e.g., tumor margins) can vary greatly among staff with different levels of expertise and different specializations.
Rhino Health’s Solution: The ‘Secure Access’ Feature Family
The Rhino Health Platform enables users to run computation on distributed data sets. With our newly released Secure Access feature, users can now leverage a secure, zero-footprint annotation tool on distributed data, with no patient-level data persisting outside of the local site’s firewall.
The Secure Access feature provides the best of all worlds:
Data is only ever persisted locally behind the data custodians’ firewalls, thus ensuring privacy and security concerns are met.
Annotations are orchestrated via a unified framework, enabling consistent, high-quality labels.
Annotations are stored along with the original data, in the data custodian’s repository.
How Does ‘Secure Access’ Deliver Value To Our Users?
Secure Access is enabled by integrating zero-footprint visualization technology. This technology allows users collaborating on diverse health data to request access to and view specific data points (e.g., tabular data, imaging studies) from collaborating sites without the data being persisted outside of its local environment. The Rhino Health Platform supports interactions across multiple sites with a single framework whereby each site provides a “Secure Access List” of data points, culminating in a unified, site-agnostic list for review, feedback, and annotation.
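As an illustration only, the idea of joining per-site lists into one site-agnostic list can be sketched in a few lines of Python. This is not the Rhino Health Platform's API: the `DataPointRef` class, the `build_site_agnostic_list` function, the site names, and the IDs are all hypothetical, and the shuffle is an assumption made for this sketch about how one might keep reviewers from working through one site at a time. The point it shows is that only references are combined, while the underlying data stays at each site.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPointRef:
    """Opaque pointer to a data point; the data itself stays at the site."""
    site: str
    local_id: str


def build_site_agnostic_list(site_lists, seed=0):
    """Join per-site lists of local IDs into one shuffled review list.

    Only references are combined here; no patient-level data is copied.
    """
    refs = [
        DataPointRef(site, local_id)
        for site, local_ids in sorted(site_lists.items())
        for local_id in local_ids
    ]
    # Seeded shuffle: reproducible order that interleaves the sites.
    random.Random(seed).shuffle(refs)
    return refs


# Hypothetical input: each site contributes its own list of local IDs.
site_lists = {
    "site_a": ["a-001", "a-002"],
    "site_b": ["b-001", "b-002", "b-003"],
}
worklist = build_site_agnostic_list(site_lists)
```

A reviewer would then step through `worklist` in order, with each reference resolved at the owning site when the item is opened.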
In practice, this capability allows users to perform tasks such as:
Review data from their own local cohorts within the platform, including via a secure in-platform image viewer.
Add feedback, labels, and annotations to local data cohorts and cohorts of collaborators at remote sites.
Provide collaborators with secure, temporary, and auditable access to specified data points for efficient, consistent feedback and annotation workflows.
Facilitate clinical reads and second reads by clinicians.
Shown: Secure in-platform image viewer with labels
Use-cases We Have Found To Benefit From ‘Secure Access’
In support of both research and commercial customers, the Secure Access feature was developed to streamline the burdensome annotation process for distributed data. This capability serves any user seeking to annotate or interact with health data of any kind at distributed sites. Site-specific annotation lists can be “virtualized” and joined to produce a “site-agnostic annotation list”, reducing the risk of site-specific bias. Use-cases that benefit from this include:
Peer-to-peer and institutional consortia research
Commercial development of AI/ML and analytic applications
Validation and refinement of algorithms on diverse data sets
Deployment of and continuous learning for AI/ML solutions
By allowing users to perform annotation and run computations on distributed data within a unified platform, many more healthcare ecosystem members are able to participate in innovative research and development efforts that advance the impact of healthcare technology. Eliminating the need to maintain complicated infrastructure, coordinate annotation across institutions, and wade through the complexities of data “ownership” empowers:
Clinical sites with limited or no local data science expertise
Clinical sites whose clinicians are stretched thin and unable to allocate precious time to annotation
Professional societies and organizations with multiple member sites
Data remains where it should be: protected behind the hospital firewall. With Rhino Health’s Secure Access feature, annotations can now be securely performed across sites, enabling the creation and curation of the kind of high-quality data sets needed to fuel the development of robust, generalizable AI/ML and analytics solutions.