Navigating the NIH Data Management Mandate with the Rhino Federated Computing Platform
The Challenge of the NIH Data Management Mandate
The National Institutes of Health (NIH) Data Management and Sharing (DMS) Policy, which took effect in January 2023, introduced new requirements for managing and sharing data in NIH-funded research. This mandate requires researchers to submit detailed plans outlining how scientific data generated from their projects will be managed, preserved, and shared. The DMS Policy aims to promote transparency, enhance reproducibility, and accelerate scientific discovery by encouraging broader data sharing, while also ensuring that privacy, security, and ethical standards are upheld.
The NIH DMS Policy has several noteworthy benefits:
- Broader Access to Data: By encouraging data sharing across institutions, the mandate facilitates greater access to diverse datasets. This democratization of data enables more researchers to participate in studies, fostering a more collaborative research environment and outcomes that are applicable to a broader population.
- Faster Data Discovery: With improved access to shared datasets, researchers can accelerate the process of data discovery and analysis, leading to quicker identification of patterns, trends, and insights that drive scientific progress.
- Enhanced Reproducibility: Open data practices mandated by the NIH ensure that datasets are more transparent and accessible, enabling better verification of research findings. This increases the credibility of scientific work and helps to avoid the pitfalls of irreproducible results.
- Encouragement of Innovative Solutions: The policy pushes the scientific community to develop novel data management strategies, such as the use of Federated Learning and Edge Computing, to overcome the inherent challenges of data sharing while preserving privacy and security.
Despite these benefits, the DMS Policy also presents several challenges, particularly when it comes to managing the complexities of data privacy, security, and governance. Traditional data centralization approaches, where all data is pooled into a single repository, exacerbate these issues:
- Security and Privacy Risks: Centralizing sensitive data such as Omics, Imaging, and Clinical Notes heightens the risk of data breaches, non-compliance, and misuse.
- Governance Issues: Institutions often resist sharing data due to concerns over control and management, especially when required to centralize their data.
- Analysis Readiness: Data in centralized repositories may not always be in the appropriate format or may need extensive transformation, further complicating its use.
- Access and Engagement Barriers: Engaging a wider research community around shared datasets can be challenging, and provisioning common tools for data transformation and analytics adds complexity.
A Streamlined Solution: Rhino Federated Computing Platform
The Rhino Federated Computing Platform (Rhino FCP) provides an end-to-end solution for complying with the DMS Policy. Research teams can harmonize their data to one common data model, provide access to a combined dataset while the underlying data remains at rest behind each organization’s firewall, perform analysis on this combined dataset, and then easily provision access to those datasets for the global research community while monitoring their use.
Rhino FCP integrates with enterprise IT and existing workflows. Users can securely deploy custom code or software on the data made available to them via Rhino FCP’s Federated Trusted Research Environment (fTRE), providing unparalleled flexibility to support each user’s preferred research workflow. Rhino also makes available containers featuring frequently used software packages such as Python, RStudio, SPSS, and MATLAB. Rhino FCP can be installed on all major cloud providers or on-premises, and supports SSO integration and other enterprise-friendly features.
Rhino FCP offers researchers several options for a Federated Data Management Solution: fully federated with totally decentralized data residing at contributing sites, centralized with secure collaboration managed via Rhino FCP, or a hybrid approach with row-level data remaining at the contributing site but centralized metadata (see figure below).
Figure: Permutations of Data Management Solutions available via Rhino FCP
How Does Rhino FCP Work?
Rhino FCP’s use of Edge Computing and Federated Learning provides a foundation for achieving the DMS Policy's objectives:
- Edge Computing: Edge computing brings computation and data storage closer to the data source, reducing latency and enhancing data privacy. By processing data locally at each participating institution, Rhino FCP minimizes the risks associated with data centralization while improving the speed and efficiency of data analysis.
- Federated Learning: Federated learning allows machine learning models to be trained across multiple decentralized devices or servers without transferring raw data to a central location. Rhino FCP leverages this method to enable collaboration without compromising data security, allowing institutions to retain control over their data while contributing to collective research efforts.
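The federated learning approach described above can be sketched in miniature. The example below is an illustrative federated-averaging (FedAvg) toy, not Rhino FCP’s actual API: each hypothetical site runs gradient descent on its own private data, and only the resulting model weights are averaged centrally.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One site's local training step: plain gradient descent on a
    least-squares objective. The raw data (X, y) never leaves the site."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_average(site_weights, site_sizes):
    """Central aggregation step: weighted mean of the model weights,
    proportional to each site's sample count (FedAvg)."""
    total = sum(site_sizes)
    return sum(w * (n / total) for w, n in zip(site_weights, site_sizes))

# Two hypothetical sites, each holding private data drawn from the
# same underlying relationship (true_w).
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
sites = []
for n in (100, 200):
    X = rng.normal(size=(n, 2))
    y = X @ true_w
    sites.append((X, y))

global_w = np.zeros(2)
for _ in range(20):  # communication rounds
    updates = [local_update(global_w, X, y) for X, y in sites]
    global_w = federated_average(updates, [len(y) for _, y in sites])
```

Because only the weight vectors cross site boundaries, the raw arrays stay behind each institution’s firewall — the property a federated platform relies on.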
Figure: Rhino Federated Computing Platform DMS Capabilities
By combining Edge Computing and Federated Learning, Rhino FCP enables researchers to collaborate securely and efficiently without the need to centralize sensitive data. Here’s how the platform addresses key challenges while leveraging the benefits of the DMS Policy:
- Enhanced Security and Privacy Controls:
- Rhino FCP ensures that all data remains encrypted at rest, in transit, and during processing. Data never leaves the data custodian’s environment; only aggregate statistics or model weights are shared, preserving confidentiality.
- The platform supports robust governance through role-based access controls, project-specific permissions, and secure model parameters, aligning with the NIH’s requirements for strong data management and governance practices.
- Rhino FCP also features differential privacy and k-anonymization for added measures of privacy protection.
- Rhino FCP also enables teams to whitelist specific code, ensuring that only trusted workflows run on their data.
- Rhino is ISO 27001 and SOC 2 certified.
- Improved Analysis Readiness:
- Rhino FCP’s Harmonization Copilot leverages generative AI for efficient data transformation, ensuring that datasets are always analysis-ready. It offers healthcare-specific data harmonization tools and workflows that facilitate ongoing updates directly from data sources, addressing the DMS Policy’s requirement for dynamic and compliant data management.
- Access Management:
- Rhino FCP enables seamless data discovery and feasibility testing across multiple sites using its Federated Datasets and Trusted Research Environment (TRE) applications. This promotes broader access to data while maintaining privacy and security, crucial for meeting NIH goals.
- By providing managed development and execution environments for analytics and AI, the platform ensures compliance while fostering collaboration among a diverse research community.
- Improved Cost Efficiency:
- Rhino FCP reduces compute and storage expenses relative to alternatives: it does not require duplicative copies of data to be made, and it scales compute resources up or down with demand rather than requiring a persistent GPU-enabled machine.
- Rhino FCP allows for federated joins across datasets, where different features for the same data subject are stored at different organizations. This capability enables a hybrid data structure, allowing teams to optimize data storage locations based on cost, use metadata to identify cases of interest, and only then process those cases specifically.
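The hybrid metadata pattern above can be sketched as a toy example. This is an illustration of the general technique, not Rhino FCP’s actual interfaces; the site and patient identifiers are made up. A central metadata index selects cases of interest, each site evaluates the selection against its own row-level records, and only an aggregate count leaves the sites.

```python
# Centralized metadata: lightweight descriptors only, no row-level data.
central_metadata = [
    {"patient_id": "p1", "site": "hospital_a", "modality": "MRI"},
    {"patient_id": "p2", "site": "hospital_b", "modality": "CT"},
    {"patient_id": "p3", "site": "hospital_a", "modality": "MRI"},
]

# Row-level data never leaves its site in a real deployment; these
# dicts stand in for each site's private store.
site_records = {
    "hospital_a": {"p1": {"age": 54}, "p3": {"age": 61}},
    "hospital_b": {"p2": {"age": 47}},
}

def federated_cohort_size(metadata, modality):
    """Use central metadata to pick cases, then ask each site to count
    its matching rows locally; only the aggregate count is returned."""
    wanted = {}  # site -> set of patient ids
    for row in metadata:
        if row["modality"] == modality:
            wanted.setdefault(row["site"], set()).add(row["patient_id"])
    # Each site evaluates against its own records and returns a count.
    return sum(
        len(ids & site_records.get(site, {}).keys())
        for site, ids in wanted.items()
    )

mri_count = federated_cohort_size(central_metadata, "MRI")
```

The metadata index is cheap to centralize and query, while the expensive (and sensitive) row-level processing happens only at the sites holding the matching cases.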
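The differential privacy protection mentioned above can be illustrated with the classic Laplace mechanism. This is a generic sketch of the technique, not Rhino FCP’s implementation; the data values and parameter names are hypothetical.

```python
import numpy as np

def laplace_count(values, threshold, epsilon, rng=None):
    """Release a differentially private count. A counting query has
    sensitivity 1 (adding or removing one record changes the count by
    at most 1), so Laplace noise with scale 1/epsilon gives
    epsilon-differential privacy."""
    rng = rng or np.random.default_rng()
    true_count = sum(v > threshold for v in values)
    return true_count + rng.laplace(scale=1.0 / epsilon)

ages = [34, 51, 29, 62, 47, 58, 33, 70]  # hypothetical site-local data
noisy_count = laplace_count(ages, threshold=50, epsilon=1.0)
```

A smaller epsilon adds more noise and gives stronger privacy; the site releases only the noisy count, never the underlying values.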
Rhino Federated Computing Platform: A Sustainable Path Forward
By combining Edge Computing and Federated Learning, Rhino FCP provides a secure, efficient, and privacy-preserving solution to the NIH data management mandate. The platform eliminates the need for data centralization, mitigates security risks, and enhances reproducibility, thus aligning with the NIH's vision of advancing open science. The Rhino Federated Computing Platform enables broader data access, faster discovery, and improved reproducibility—transforming the way scientific research is conducted.
For more information on how Rhino Health can help with your data management plans, reach out to us at [email protected].