What is the purpose of having a HiMAT data infrastructure?
What kinds of support and services can I expect from each of the different data coordinators?
What methods are in place to ensure my data are safe?
Objectives
Learn about the overall architecture of the HiMAT infrastructure
Introduction
A core feature of the HiMAT project is the construction and use of data sharing tools to foster efficient collaboration, reproducible research and enhanced stakeholder engagement related to water resources. Our cloud-based data infrastructure aims to address several of the challenges that often limit effective collaboration in such large projects:
cross-team collaboration necessitates the sharing of preliminary data products which are not yet fully validated and may not be ready to share with the public. Existing data infrastructures primarily store completed datasets on public-facing servers. This leaves a considerable gap in the provisioning of privately accessible, cloud-based computational tools for this kind of research.
datasets to be generated by HiMAT, such as high-resolution satellite imagery, are particularly voluminous. Many existing data centers are not set up to handle data of this size, and even where they are, downloading datasets this large to local machines is impractical. We therefore need methods to co-locate our processing and analysis with the location at which the data are stored.
the development of collaborative tools calls for some degree of customization in our computational infrastructure if we are to integrate our products and provide decision support to the region. Investigators therefore need full access to both front-end and back-end computational components, without having to submit requests to third-party agencies.
Overall Design
To address these challenges we are designing a multi-tiered approach to data handling, one that considers the type of data (raster, vector, time series) as well as its maturity/readiness for distribution. Where each researcher goes to store, access and process their data therefore depends on these initial considerations.
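The idea that a dataset's destination follows from its type and maturity can be sketched as a small lookup. This is only an illustration: the `DataType` and `Maturity` names, the `storage_tier` function and the returned labels are hypothetical and do not correspond to actual HiMAT storage locations. The one rule taken from the text is that preliminary data stay private while validated data may be public.

```python
from enum import Enum


class DataType(Enum):
    RASTER = "raster"
    VECTOR = "vector"
    TIME_SERIES = "time_series"


class Maturity(Enum):
    PRELIMINARY = "preliminary"  # not yet validated, internal use only
    VALIDATED = "validated"      # ready for wider distribution


def storage_tier(data_type, maturity):
    """Return a hypothetical tier label for a dataset.

    Maturity decides visibility (private vs. public) and the data
    type decides the sub-area. The labels are placeholders, not
    real HiMAT paths or services.
    """
    visibility = "private" if maturity is Maturity.PRELIMINARY else "public"
    return f"{visibility}/{data_type.value}"


if __name__ == "__main__":
    print(storage_tier(DataType.RASTER, Maturity.PRELIMINARY))
```

A real implementation would replace the placeholder labels with whatever storage areas the data coordinators assign.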
Available Resources
The following resources are provided to support adoption and use of the HiMAT computational resources:
NASA provides online support for specific questions related to ADAPT (support@nccs.nasa.gov)
PI Arendt’s team (Fatland, Setiawan, Shean) provides assistance with data sharing using Amazon Web Services/GeoServer, and with access to NGA data on ADAPT
NSIDC (Raup) provides assistance with data ingest and sharing on the GLIMS database
Data security
Many groups will produce preliminary datasets that may not be ready for distribution outside of HiMAT. The following practices will help us ensure data products are secure:
all datasets on ADAPT shared directories can only be read by those who have passed through the stringent NASA security protocols. Only administrators of ADAPT can write to these directories. Each individual user also has a home directory that only they can access.
the commercial cloud (e.g. Amazon Web Services) offers strong security, including HIPAA-aligned technologies. UW-IT is committed to implementing these technologies in its provisioning of data resources for HiMAT
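The directory permissions described for ADAPT (shared directories readable but not writable by ordinary users, home directories private to their owner) can be checked from a user's own session. The following is a minimal sketch, assuming a POSIX filesystem; the helper names `describe_access` and `is_private` are hypothetical, and the path used in the example is the current user's home directory rather than an actual ADAPT path.

```python
import os
import stat


def describe_access(path):
    """Report whether the current user can read and write `path`."""
    return {
        "read": os.access(path, os.R_OK),
        "write": os.access(path, os.W_OK),
    }


def is_private(path):
    """Return True if neither group nor others have any permission on `path`."""
    mode = os.stat(path).st_mode
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


if __name__ == "__main__":
    home = os.path.expanduser("~")
    print(home, describe_access(home), "private:", is_private(home))
```

On a shared data directory a regular user would expect `read` to be true and `write` to be false. A home directory that only its owner can access would report `private: True`.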