Welcome to PEARC20!
PEARC20’s theme is “Catch the Wave.” This year’s theme embodies the spirit of the community’s drive to stay on pace and in front of all the new waves in technology, analytics, and a globally connected and diverse workforce. We look forward to this year’s PEARC20 virtual meeting, where we can share scientific discovery and craft the future infrastructure.

The conference is held in Pacific Time (PT); all times listed below are in PT.

The connection information for all PEARC20 workshops, tutorials, plenaries, track presentations, BOFs, posters, the Visualization Showcase, and other affiliated events is in the PEARC20 virtual conference platform, Brella. If you have issues joining Brella, please email pearcinfo@googlegroups.com.
Application Software, Support, and Outcomes
Tuesday, July 28
 

10:45am PDT

VisQueue: An Analytical Dashboard for Science Exploration on HPC Systems
Monitoring a supercomputer's performance is vital for high performance computing (HPC) support and development, valuable to its users, and even appreciated by the general public. Identifying the effectiveness of a system is key to its host institution's confidence and credibility, and to accounting for necessary operational funding. Tracking the research topics and sciences that dominate a system's time and CPU can help guide decisions regarding design and support for both current and future system resources. This information is also useful for those whose task is to inform users, collaborators, and the general public about a system's utilization. Applications that monitor and display system performance, often in substantial detail, are essential. Yet mining the components and quantity of performance statistics can be a challenge. Additionally, the data is usually not linked to potentially useful information from a system's user database, such as project abstracts and principal investigators (PIs). This paper introduces VisQueue, an interactive dashboard for exploring HPC systems. Its users initially see an array of the included supercomputers, each represented by an interactive sunburst chart and a correlated data table. Each system's currently active projects are listed with science domain, running jobs, and utilized resources. Users can explore the projects at a deeper level and also reference a system's specifications. These pages include graphs of available metrics, a table for exploring all projects that have utilized the system, and a map showing user locations and outreach. Our motivation was to create something not only useful for HPC maintainers but also clear enough for the general public to grasp the impact and importance of HPC resources. Careful consideration was taken to include only the necessary information and not overwhelm the user with too many choices or details.
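VisQueue itself is not reproduced here, but its central visual idiom, an interactive sunburst of a system's active projects grouped by science domain and sized by consumed resources, can be sketched in a few lines with Plotly. The column names and sample values below are illustrative assumptions, not VisQueue's actual data schema.

```python
# Minimal sketch of a VisQueue-style sunburst: science domain -> project,
# sized by node-hours. Column names and sample data are assumptions.
import pandas as pd
import plotly.express as px

jobs = pd.DataFrame({
    "domain":     ["Astrophysics", "Astrophysics", "Biology", "Climate"],
    "project":    ["GalaxySim", "PulsarSearch", "ProteinFold", "OceanModel"],
    "node_hours": [1200, 450, 900, 300],
})

fig = px.sunburst(jobs, path=["domain", "project"], values="node_hours",
                  title="Active projects by science domain (sample data)")
fig.show()
```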


Tuesday July 28, 2020 10:45am - 11:05am PDT
Brella

12:00pm PDT

Building Science Gateways for Humanities
Building science gateways for the humanities poses new challenges for the science gateway community. Compared to science gateways devoted to scientific content, humanities-related projects usually require 1) processing data in various formats, such as text, image, and video, 2) constant public access by a large audience, and 3) reliable security upgrades and low maintenance. Most traditional science gateways are monolithic in design, which makes them easy to write, but they can be computationally inefficient when integrated with numerous scientific packages for data capture and pipeline processing. Since these applications tend to be single-threaded or nonmodular, they can create traffic bottlenecks when processing large numbers of requests. Moreover, such science gateways are usually difficult to back up because of the long gaps between funding periods and the age of the applications. In this paper, we study the problem of building science gateways for humanities projects by developing service-based architectures, and we present two such science gateways: Moving Image Research Collections (MIRC), a science gateway focusing on image analysis for digital surrogates of historical motion picture film, and SnowVision, a science gateway for studying pottery fragments in southeastern North America. For each science gateway, we present an overview of the background, some unique challenges, and the design and implementation. These two science gateways are deployed on XSEDE's Jetstream academic cloud computing resource and are accessed through a web interface. Apache Airavata middleware is used to manage the interactions between the web interface and the deep-learning-based (DL) backend services running on the Bridges graphics processing unit (GPU) cluster.


Tuesday July 28, 2020 12:00pm - 1:20pm PDT
Brella

12:00pm PDT

Implementing a Prototype System for 3D Reconstruction of Compressible Flow
PowerFlow3D is a prototype system for acquiring, reconstructing, and visualizing the three-dimensional structure of complex flows around objects in wind tunnel test procedures. PowerFlow3D combines modern high-performance computing (HPC) with existing acquisition, reconstruction, and visualization methods to provide a foundational capability that helps reveal critical information about the underlying structure of unknown flows. We describe the implementation of our system, focusing on tomographic reconstruction in particular, and highlight the practical challenges encountered throughout our initial research and development (R&D) process. The resulting prototype achieves both reasonable performance and fidelity and provides opportunities for enhanced performance, fidelity, and scale. The results of this initial R&D effort thus enable continued progress toward a scalable HPC-accelerated system for guiding real-time decisions during wind tunnel tests.
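The abstract centers on tomographic reconstruction. PowerFlow3D's own pipeline is not public, but the classic parallel-beam filtered back-projection it builds on can be sketched with scikit-image; this is a minimal 2D illustration of the technique, not the authors' implementation.

```python
# Minimal 2D tomographic reconstruction sketch (filtered back-projection).
# Illustrates the general technique only, not PowerFlow3D's actual pipeline.
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon

image = shepp_logan_phantom()                      # stand-in for a flow slice
angles = np.linspace(0.0, 180.0, 180, endpoint=False)

sinogram = radon(image, theta=angles)              # simulate line-integral projections
reconstruction = iradon(sinogram, theta=angles, filter_name="ramp")

error = np.sqrt(np.mean((reconstruction - image) ** 2))
print(f"RMS reconstruction error: {error:.4f}")
```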


Tuesday July 28, 2020 12:00pm - 1:20pm PDT
Brella

12:00pm PDT

PEGR: a management platform for ChIP-based next generation sequencing pipelines
There has been rapid development in genome sequencing, including high-throughput next generation sequencing (NGS) technologies, automation in biological experiments, new bioinformatics tools, and the utilization of high-performance computing and cloud computing. ChIP-based NGS technologies, e.g., ChIP-seq and ChIP-exo, are widely used to detect the binding sites of DNA-interacting proteins in the genome and help us gain a deeper mechanistic understanding of genomic regulation. As sequencing data is generated at an unprecedented pace from ChIP-based NGS pipelines, there is an urgent need for a metadata management system. To meet this need, we developed the Platform for Eukaryotic Genomic Regulation (PEGR), a web service platform that logs metadata for samples and sequencing experiments, manages the data processing workflows, and provides reporting and visualization. PEGR links together people, samples, protocols, DNA sequencers, and bioinformatics computation. With the help of PEGR, scientists can gain a more integrated understanding of the sequencing data and better understand the scientific mechanisms of genomic regulation. In this paper, we present the architecture and the major functionalities of PEGR. We also share our experience in developing this application and discuss future directions.


Tuesday July 28, 2020 12:00pm - 1:20pm PDT
Brella

12:00pm PDT

Tailoring Data Visualization to Diversely Informed End Users
Visualization is invaluable for communicating complex data, and it becomes even more important when researchers seek insights or need to share those insights with end users of diverse levels of expertise, such as policy-makers or the general public. In this paper, we detail how we approach this issue with the HAVEN and ProjecTable systems we developed. These systems share data on projected renewable energy supply under several possible scenarios intended to help Hawaii achieve 100% renewable energy by 2045; however, each system offers different features directed at end users interested in different levels of detail. We describe here the two systems and their use, both separately and in conjunction.


Tuesday July 28, 2020 12:00pm - 1:20pm PDT
Brella

1:35pm PDT

Building an Interactive Workbench Environment for Single Cell Genomics Applications
We discuss the procedure for building an interactive workbench environment for single cell genomics applications with the Open OnDemand (OOD) science gateway. In our approach, an end user submits a complex single cell RNA sequencing (scRNA) pipeline, checks the status of the job, and visualizes the output results. All of these tasks are accomplished through a web browser, relieving users of the complexities involved in developing and handling a large-scale workflow. Our approach has helped researchers process several scRNA input data sets on the campus HPC cluster. Although the current work is focused on scRNA analysis, the same approach can be extended to any workflow.


Tuesday July 28, 2020 1:35pm - 2:35pm PDT
Brella

1:35pm PDT

GeoEDF: An Extensible Geospatial Data Framework for FAIR Science
Collaborative scientific research is now increasingly conducted online in web-based research platforms termed "science gateways". Most science gateways provide common capabilities including data management and sharing, scientific code development, high performance computing (HPC) integration, and scientific workflow execution with varying degrees of automation. Despite the availability of scientific workflow frameworks such as Pegasus and workflow definition languages such as the Common Workflow Language (CWL), in practice typical workflows on science gateways still involve a mix of non-reusable code, desktop tools, and intermediate data wrangling. With the growing emphasis on FAIR (Findable, Accessible, Interoperable, Reusable) science, such mixed workflows present a significant challenge to ensuring compliance with these principles. These challenges are further compounded in the earth sciences, where researchers spend inordinate amounts of time manually acquiring, wrangling, and processing earth observation data from repositories managed by organizations such as NASA and USGS. Our extensible geospatial data framework, GeoEDF, is designed to address these challenges, making remote datasets directly usable in computational code and facilitating earth science workflows that execute entirely in a science gateway. In this paper we describe the design of GeoEDF, its current implementation status, and future work.
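GeoEDF's connector API is described only at a high level above, so the sketch below is a hypothetical illustration of the core idea, making a remote dataset directly usable in computational code: fetch by URL, cache locally, hand back an analysis-ready path. The class and method names are assumptions, not GeoEDF's API.

```python
# Hypothetical sketch of a GeoEDF-style data connector: the class name,
# method names, and workflow are illustrative assumptions, not GeoEDF's API.
import urllib.request
from pathlib import Path

class RemoteDataset:
    """Fetches a remote file once and exposes a local, analysis-ready path."""

    def __init__(self, url: str, cache_dir: str = ".geodata_cache"):
        self.url = url
        self.cache = Path(cache_dir)
        self.cache.mkdir(exist_ok=True)

    def local_path(self) -> Path:
        target = self.cache / Path(self.url).name
        if not target.exists():                      # download only on first use
            urllib.request.urlretrieve(self.url, target)
        return target

# Usage: a workflow step consumes the local path instead of wrangling downloads.
# dem = RemoteDataset("https://example.org/data/elevation.tif").local_path()
```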


Tuesday July 28, 2020 1:35pm - 2:35pm PDT
Brella

1:35pm PDT

Log Discovery for Troubleshooting Open Distributed Systems with TLQ
Troubleshooting a distributed system can be incredibly difficult. It is rarely feasible to expect a user to know the fine-grained interactions between their system and the environment configuration of each machine used in the system. Because of this, work can grind to a halt when a seemingly trivial detail changes. To address this, there is a plethora of state-of-the-art log analysis tools, debuggers, and visualization suites. However, a user may be executing in an open distributed system where the placement of their components is not known before runtime. This makes the process of tracking debug logs almost as difficult as troubleshooting the failures those logs have recorded, because the location of the logs is usually not transparent to the user (and, by association, to the troubleshooting tools they are using). We present TLQ, a framework designed from first principles for log discovery to enable troubleshooting of open distributed systems. TLQ consists of a querying client and a set of servers which track relevant debug logs spread across an open distributed system. Through a series of examples, we demonstrate how TLQ enables users to discover the locations of their system's debug logs and in turn apply well-defined troubleshooting tools to those logs in a distributed fashion. Both of these tasks were previously impractical to ask of an open distributed system without significant a priori knowledge. We also concretely verify TLQ's effectiveness by way of a production system: a biodiversity scientific workflow. We note the potential storage and performance overheads of TLQ compared to a centralized, closed-system approach.
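TLQ's wire protocol is not detailed in the abstract; the toy sketch below only illustrates the division of labor it describes, servers that track where logs land and a client that queries them by job, using an in-process registry as a hypothetical stand-in for the actual client/server design.

```python
# Toy sketch of TLQ's division of labor: per-host trackers register where
# debug logs live; a querying client discovers them by job. This in-process
# registry is a hypothetical stand-in for TLQ's real client/server protocol.
from collections import defaultdict

class LogRegistry:
    def __init__(self):
        self._logs = defaultdict(list)   # job id -> [(host, path), ...]

    def register(self, job_id: str, host: str, path: str):
        """Called by a per-host tracker when a component writes a log."""
        self._logs[job_id].append((host, path))

    def discover(self, job_id: str):
        """Called by the querying client to locate all logs for a job."""
        return list(self._logs[job_id])

registry = LogRegistry()
registry.register("wf-42", "node07", "/tmp/wf-42/stage1.err")
registry.register("wf-42", "node19", "/tmp/wf-42/stage2.err")
print(registry.discover("wf-42"))   # feed these locations to analysis tools
```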


Tuesday July 28, 2020 1:35pm - 2:35pm PDT
Brella
 
Wednesday, July 29
 

10:15am PDT

Atomic and Molecular Scattering Applications in an Apache Airavata Science Gateway
In this paper we document recent progress made in the development and deployment of a science gateway for atomic and molecular physics (AMP) [10]. The molecular scattering applications supported in the gateway and the early phase of the project have been described in an earlier publication [33]. Our objective in this paper is to present recent advances in both the capabilities and the adoption of the platform for additional software suites and new possibilities for further development. The applications being deployed provide users with a number of state-of-the-art computational techniques to treat electron scattering from atomic and molecular targets and the interaction of radiation with such systems. One may view all of these approaches as generalized close-coupling methods, where the inclusion of electron correlation is accomplished via the addition of generalized pseudostates. A number of the methods can also be employed to compute high-quality bound-state wavefunctions by closing the channels and imposing exponentially decaying boundary conditions. The application software suites are deployed on a number of NSF and DoE supercomputing systems. These deployments are brought to the user community through the science gateway with user interfaces, post-processing, and visualization tools. Below we outline our efforts in deploying the Django web framework for the AMPGateway using the Apache Airavata gateway middleware, discuss the new advanced capabilities available, and provide an outlook for future directions for the gateway and the AMP community.
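For readers outside the AMP community, the close-coupling ansatz referred to above expands the (N+1)-electron scattering wavefunction over target states and pseudostates; schematically, in the standard notation (a textbook form, not a formula from this paper):

```latex
\Psi^{N+1}_{E} \;=\; \mathcal{A} \sum_{i} \Phi_{i}(x_{1},\dots,x_{N})\, F_{i}(x_{N+1})
\;+\; \sum_{j} c_{j}\, \chi_{j}(x_{1},\dots,x_{N+1})
```

Here the Φ_i are target states, including the generalized pseudostates that carry the electron correlation; the F_i are the unknown channel functions; 𝒜 antisymmetrizes over electrons; and the χ_j are square-integrable correlation terms. Imposing exponentially decaying boundary conditions on the F_i closes the channels and yields bound-state wavefunctions, as the abstract notes.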


Wednesday July 29, 2020 10:15am - 10:55am PDT
Brella

10:15am PDT

Toward a Data Lifecycle Model for NSF Large Facilities
National Science Foundation large facilities conduct large-scale physical and natural science research. They include telescopes that survey the sky, gravitational wave detectors that look deep into our universe's past, and sensor-driven field sites that collect a range of biological and environmental data. The Cyberinfrastructure Center of Excellence (CICoE) pilot aims to develop a model for a center that facilitates community building, fosters knowledge sharing, and applies best practices in consulting with large facilities about their cyberinfrastructure. To accomplish this goal, the pilot began an in-depth study of how large facilities manage their data. Large facilities are diverse and highly complex, from the types of data they capture, to the types of equipment they use, to the types of data processing and analysis they conduct, to their policies on data sharing and use. Because of this complexity, the pilot needed to find a single lens through which it could frame its growing understanding of large facilities and identify areas where it could best serve large facilities. As a result of the pilot's research into large facilities, common themes emerged that enabled the creation of a data lifecycle model that successfully captures the data management practices of large facilities. This model has enabled the pilot to organize its thinking about large facilities, and frame its support and consultation efforts around the cyberinfrastructure used during research. This paper describes the model and discusses how it was applied to disaster recovery planning for a representative large facility, IceCube.


Wednesday July 29, 2020 10:15am - 10:55am PDT
Brella

12:00pm PDT

Cluster Usage Policy Enforcement Using Slurm Plugins and an HTTP API
Managing and limiting cluster resource usage is a critical task for computing clusters with a large number of users. By enforcing usage limits, cluster managers are able to ensure fair availability for all users, bill users accordingly, and prevent the abuse of cluster resources. As this is such a common problem, there are naturally many existing solutions. However, to allow for greater control over usage accounting and submission behavior in Slurm, we present a system composed of: a web API which exposes accounting data; Slurm plugins that communicate with a REST-like HTTP implementation of that API; and client tools that use it to report usage. Key advantages of our system include a customizable resource accounting formula based on job parameters, preemptive blocking of user jobs at submission time, project-level and user-level resource limits, and support for the development of other web and command-line clients that query the extensible web API. We deployed this system on Berkeley Research Computing's institutional cluster, Savio, allowing us to automatically collect and store accounting data, and thereby easily enforce our cluster usage policy.
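The paper's Slurm plugins are written against Slurm's plugin interfaces and are not shown here; the sketch below only illustrates the other half of the design, a REST-like accounting endpoint plus the kind of submit-time check a plugin would make against it. The routes, the accounting formula, and the limits are assumptions, not the paper's actual API.

```python
# Minimal sketch of the HTTP-API half of the design: an accounting endpoint
# and a submit-time policy check. Routes, formula, and limits are assumptions;
# in the real system, enforcement happens inside Slurm plugins.
from flask import Flask, jsonify

app = Flask(__name__)

USAGE = {"alice": 980.0}          # service-unit usage per user (stand-in store)
LIMIT = 1000.0                    # per-user allowance

def job_cost(nodes: int, hours: float, gpu: bool) -> float:
    """Customizable accounting formula based on job parameters."""
    return nodes * hours * (2.0 if gpu else 1.0)

@app.route("/usage/<user>")
def usage(user: str):
    used = USAGE.get(user, 0.0)
    return jsonify(user=user, used=used, remaining=LIMIT - used)

def allow_submission(user: str, nodes: int, hours: float, gpu: bool) -> bool:
    """What a submit-time plugin would ask before accepting a job."""
    return USAGE.get(user, 0.0) + job_cost(nodes, hours, gpu) <= LIMIT

# 2 nodes x 10 h x GPU weight = 40 SUs; 980 + 40 exceeds the 1000 SU allowance.
print(allow_submission("alice", nodes=2, hours=10, gpu=True))   # -> False
```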


Wednesday July 29, 2020 12:00pm - 1:20pm PDT
Brella

12:00pm PDT

ICEBERG: Imagery Cyber-infrastructure and Extensible Building blocks to Enhance Research in the Geosciences. (A Research Programmer's Perspective)
The ICEBERG (Imagery Cyber-infrastructure and Extensible Building blocks to Enhance Research in the Geosciences) project (NSF 1740595) aims to (1) develop open source image classification tools tailored to high-resolution satellite imagery of the Arctic and Antarctic to be used on HPDC resources, (2) create easy-to-use interfaces to facilitate the development and testing of algorithms for application to specific geoscience requirements, (3) apply these tools through use cases that span the biological, hydrological, and geoscience needs of the polar community, and (4) transfer these tools to the larger non-polar community. Here we report on the project status and lessons learned.


Wednesday July 29, 2020 12:00pm - 1:20pm PDT
Brella

12:00pm PDT

Research Computing Infrastructure and Researcher Engagement: Notes from Neuroscience
The advent of the Mortimer B. Zuckerman Mind Brain Behavior Institute (Zuckerman Institute) at Columbia University over a decade ago presented the opportunity to design a discipline-focused Research Computing (RC) group allowing for close collaboration with a relatively fixed number of neuroscience laboratories to enhance discovery. Experiences and observations related to tailoring Zuckerman computing infrastructure, creating "task-based" services and systems, and engaging with researchers in our Institute are shared to inform others about establishing discipline-focused research computing teams. Case studies related to providing a GPU cluster service tailored to Institute needs and the evolution of infrastructure choices to hybrid designs allowing bursting to vendor-provided cloud services are reviewed. Future directions involving research software engineering and sharing whole data analysis pipelines are noted.


Wednesday July 29, 2020 12:00pm - 1:20pm PDT
Brella

12:00pm PDT

Using Containers to Create More Interactive Online Training and Education Materials
Containers are excellent hands-on learning environments for computing topics because they are customizable, portable, and reproducible. The Cornell University Center for Advanced Computing has developed the Cornell Virtual Workshop on high performance computing topics for many years, and we have always sought to make the materials as rich and interactive as possible. Toward the goal of building a more hands-on, experiential learning experience directly into web-based online training environments, we developed the Cornell Container Runner Service (CCRS), which allows online content developers to build container-based interactive edit-and-run commands directly into their web pages. Using containers along with CCRS has the potential to increase learner engagement and outcomes.
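CCRS internals are not shown in the abstract, so the following is only a minimal sketch of the underlying idea: executing a learner's edited code inside a throwaway container, here via the Docker SDK for Python. The image choice, resource limits, and sandboxing flags are illustrative assumptions.

```python
# Minimal sketch of a CCRS-style "edit and run" backend: run learner code in
# a throwaway container via the Docker SDK for Python. Image and limits are
# illustrative assumptions, not CCRS internals.
import docker

def run_snippet(source: str, image: str = "python:3.11-slim") -> str:
    client = docker.from_env()
    return client.containers.run(
        image,
        ["python", "-c", source],
        remove=True,            # discard the container after the run
        network_disabled=True,  # sandbox: no network access for learner code
        mem_limit="128m",       # cap memory so a snippet cannot exhaust the host
    ).decode()

print(run_snippet("print(sum(range(10)))"))   # -> 45
```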


Wednesday July 29, 2020 12:00pm - 1:20pm PDT
Brella

1:35pm PDT

A Science Gateway for Simulating the Economics of Carbon Sequestration Technologies: SimCCS2.0
The SimCCS2.0 Gateway provides a science gateway for optimizing CO2 capture, transport, and storage infrastructure. We describe the design, creation, and production deployment of this platform, which is based on the Apache Airavata gateway middleware framework. The gateway provides an integrated infrastructure for data, modeling, simulation, and visualization of carbon sequestration technologies and their economics. It does so through simple user interfaces for mapping and selecting input data, building models, and setting up and executing simulations on high performance computing systems. Also featured are community case studies that serve as reference sets for verifying the reproducibility of published models and reusing their data in modified simulations. The portal addresses the needs of diverse international stakeholders and provides a platform for integrating novel and complex models for carbon sequestration technologies moving into the future.


Wednesday July 29, 2020 1:35pm - 3:35pm PDT
Brella

1:35pm PDT

An extensible Django-based web portal for Apache Airavata
The Apache Airavata science gateway middleware project has developed a new web frontend for the middleware's API based on the Django web framework and the Vue.js JavaScript framework. This new frontend is designed as a framework, called the Airavata Django Portal Framework (ADPF), that science gateway developers can use to customize and extend the user interface, adding domain-specific UI metaphors and gateway-specific user workflows. There are three main modes of extensibility: 1) custom scientific application execution configuration, 2) custom application results analysis, and 3) wholly custom user workflows. These modes of extensibility come out of the project's experience working with science gateways over the years. This new framework has been put into production for the 30+ science gateways hosted by the Science Gateways as a Platform (SciGaP) project at Indiana University, and several gateways have already made extensions using ADPF.


Wednesday July 29, 2020 1:35pm - 3:35pm PDT
Brella

1:35pm PDT

Custos: Security Middleware for Science Gateways
Science gateways represent potential targets for cybersecurity threats to users, scientific research, and scientific resources. In this paper, we introduce Custos, a software framework that provides common security operations for science gateways, including user identity and access management, gateway tenant profile management, resource secrets management, and groups and sharing management. The goals of the Custos project are to provide these services to a wide range of science gateway frameworks, providing the community with an open source, transparent, and reviewed code base for common security operations; and to operate trustworthy security services for the science gateway community using this software base. To accomplish these goals, we implement Custos using a scalable microservice architecture that can provide highly available, fault tolerant operations. Custos exposes these services through a language-independent Application Programming Interface that encapsulates science gateway usage scenarios.


Wednesday July 29, 2020 1:35pm - 3:35pm PDT
Brella

1:35pm PDT

FutureWater Indiana: A science gateway for spatio-temporal modeling of water in the Wabash basin with climate change in focus
In this manuscript, we describe the FutureWater Science Gateway, which simulates regional watersheds spatially and temporally to derive hydrological changes due to changes in critical effectors such as climate, land use and management, and soil conditions. We also discuss the gateway's design, creation, and production deployment, and how the resulting data is organized and explored. The FutureWater gateway is built on the Apache Airavata gateway middleware framework and hosted under the SciGaP project at Indiana University. The gateway provides an integrated infrastructure for simulations based on the parallelized Soil and Water Assessment Tool (SWAT) and SWAT-MODFLOW software executing on Extreme Science and Engineering Discovery Environment (XSEDE) and Indiana University's (IU's) HPC resources. It organizes data in optimized relational databases and enables intuitive exploration of simulation result data. The visualization involves geographical map integration and dynamic data provisioning using an R-Shiny application deployed in the gateway. The gateway provides intuitively simple user interfaces for supplying simulation input data and combines available model data, making it possible to set up and execute simulations. The portal addresses the needs of diverse stakeholder communities for education, research, exploration, and planning in academic, governmental, and NGO organizations.


Wednesday July 29, 2020 1:35pm - 3:35pm PDT
Brella

1:35pm PDT

Reproducible and Portable Workflows for Scientific Computing and HPC in the Cloud
The increasing availability of cloud computing services for science has changed the way scientific code can be developed, deployed, and run. Many modern scientific workflows are capable of running on cloud computing resources. Consequently, there is increasing interest in the scientific computing community in methods, tools, and implementations that enable moving an application to the cloud, simplify the process, and decrease the time to meaningful scientific results. In this paper, we have applied the concepts of containerization for portability and multi-cloud automated deployment with industry-standard tools to three scientific workflows. We show how our implementations reduce the complexity of porting both the applications themselves and their deployment across private and public clouds. Each application has been packaged in a Docker container with its dependencies and the environment setup necessary for production runs. Terraform and Ansible have been used to automate the provisioning of compute resources and the deployment of each scientific application in a multi-VM cluster. Each application has been deployed on the AWS and Aristotle federated cloud platforms. Variations in data management constraints, multi-VM MPI communication, and embarrassingly parallel instance deployments were all explored and reported on. We thus present a sample of scientific workflows that can be simplified using these tools and our proposed implementation to deploy and run in a variety of cloud environments.
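The paper's actual Terraform and Ansible configurations are not reproduced here; the sketch below only shows the shape of the automation, provisioning with Terraform and then configuring with Ansible from a single driver. The file names (main.tf implied, site.yml, inventory.ini) are assumptions.

```python
# Sketch of the provision-then-deploy automation: Terraform creates the
# multi-VM cluster, Ansible installs the containerized application on it.
# File names (site.yml, inventory.ini) are illustrative assumptions.
import subprocess

def sh(*cmd: str):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)     # fail fast if any step errors

sh("terraform", "init")                               # fetch provider plugins
sh("terraform", "apply", "-auto-approve")             # provision the VMs
sh("ansible-playbook", "-i", "inventory.ini", "site.yml")  # deploy containers
```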


Wednesday July 29, 2020 1:35pm - 3:35pm PDT
Brella

1:35pm PDT

Tapis API Development with Python: Best Practices In Scientific REST API Implementation - Experience implementing a distributed Stream API 🏆
🏆 Best Paper in “Application Software, Support, and Outcomes” Track

In the last decade, hosted Software-as-a-Service (SaaS) application programming interfaces (APIs) have proliferated across both academia and industry, and, simultaneously, microservice architectures have replaced monolithic application platforms for the flexibility and maintainability they offer. These SaaS APIs rely on small, independent, and reusable microservices that can be assembled relatively easily into more complex applications. As a result, developers can focus on their own unique functionality and surround it with fully functional, distributed processes developed by other specialists, which they access through APIs. The Tapis framework, an NSF-funded project, provides SaaS APIs that allow researchers to achieve faster scientific results by eliminating the need to set up a complex infrastructure stack. In this paper, we describe the best practices followed to create Tapis APIs using Python, with the Streams API as an example implementation: authorization and authentication with the Tapis Security Kernel, Tenants, and Tokens APIs; use of the OpenAPI v3 specification for API definitions; and Docker containerization. Finally, we discuss our deployment strategy with Kubernetes, an emerging orchestration technology, and the early-adopter use cases of the Streams API service.
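This is not Tapis's actual implementation, but the pattern the abstract names, a Python microservice whose endpoints demand a bearer token and whose OpenAPI v3 description is generated from the code, can be sketched with FastAPI. The token check and the measurement payload are illustrative assumptions.

```python
# Minimal sketch of the pattern described: a Python microservice with
# token-guarded endpoints and a generated OpenAPI v3 description. Not Tapis's
# actual code; the token validation and payload shape are assumptions.
from fastapi import Depends, FastAPI, Header, HTTPException

app = FastAPI(title="Streams-style API sketch", version="0.1.0")

def require_token(x_tapis_token: str = Header(...)) -> str:
    # A real service would validate this JWT against the Tokens API.
    if not x_tapis_token:
        raise HTTPException(status_code=401, detail="missing token")
    return x_tapis_token

@app.get("/measurements/{instrument_id}")
def list_measurements(instrument_id: str, token: str = Depends(require_token)):
    # Stand-in payload; a real Streams service would query its data store.
    return {"instrument": instrument_id, "measurements": []}

# Run with: uvicorn sketch:app ; the OpenAPI v3 document appears at /openapi.json
```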

Speakers

Gwen Jacobs

University of Hawaii
Gwen Jacobs serves as the Director of Cyberinfrastructure for the University of Hawaii System. She also serves as the Director and PI of Hawaii EPSCoR and as the Co-Chair of the NSF Advisory Council for Cyberinfrastructure. She is a computational neuroscientist by training.


Wednesday July 29, 2020 1:35pm - 3:35pm PDT
Brella
 
Thursday, July 30
 

8:00am PDT

MetaFlow|mics: Scalable and Reproducible Nextflow Pipelines for the Analysis of Microbiome Marker Data
Computational scalability has become an important requirement for processing the massive amounts of data generated in contemporary sequencing-based experiments. The availability of large computational resources through academic, regional, or national cyberinfrastructure efforts, as well as through inexpensive cloud offerings, has shifted the bottleneck, which now lies in the extensive expertise necessary to create reproducible and scalable bioinformatics pipelines and deploy them to such diverse infrastructures. We present here MetaFlow|mics, a comprehensive pipeline for the analysis of microbiome marker data, built to reproducibility best practices and state-of-the-art cyberinfrastructure standards. MetaFlow|mics provides seamless scalability and extensibility, allowing users to build and test their pipelines on a laptop with small datasets and to subsequently run them on large datasets on an HPC system or in the cloud with a change to a single line of code. Our framework is built on top of the Nextflow workflow management system and provides an interoperable architecture that leverages self-contained Docker and Singularity instances with all the dependencies and requirements needed to quickly deploy and use the pipeline.
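The pipeline itself is written in Nextflow and is not shown here; the "single line of code" the abstract refers to is typically the profile selection on the launch command, which the hedged driver below illustrates. The profile names are assumptions, not MetaFlow|mics's own configuration.

```python
# The "change to a single line of code" is typically Nextflow's -profile flag:
# same pipeline, different executor and container stack. Profile names below
# are illustrative assumptions, not MetaFlow|mics's own profiles.
import subprocess

def launch(profile: str):
    subprocess.run(["nextflow", "run", "main.nf", "-profile", profile], check=True)

launch("local")                  # laptop test on a small dataset
# launch("slurm,singularity")    # same pipeline on an HPC cluster
# launch("awsbatch,docker")      # same pipeline in the cloud
```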


Thursday July 30, 2020 8:00am - 9:40am PDT
Brella

8:00am PDT

NLP Workflows for Computational Social Science: Understanding Triggers of State-Led Mass Killings
We leverage statistical and natural language processing (NLP) tools for a systematic analysis of the triggers of state-led mass killings. The work advances the application of statistics and NLP in the social sciences and also contributes to scholarly efforts by empirically identifying the prominent triggering events of civilian mass killings. More specifically, we seek to understand the timing and dynamics of political violence escalation by examining systematically how certain types of political events may generate a government's policy of mass killing of civilians. The project provides pathways for the general application of promising NLP and statistical methods to the analysis of social event triggers as gleaned from big data repositories. Key objectives include: 1) developing open source NLP dictionaries and inference engines for event identification from texts, which are especially valuable for the analysis of political conflict, and 2) constructing and validating a computational workflow to machine-code millions of news articles (via NLP) for event identification, from a volume of data orders of magnitude larger than could be manually coded by a team of human readers. Having made considerable progress over multiple semesters, we share the methods and tools that have enabled us to overcome significant computational data analytics challenges.
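As a toy illustration of dictionary-based event identification (the project's actual dictionaries and inference engines are its own and are not reproduced here), a minimal keyword-dictionary matcher might look like this; the event categories and trigger terms are invented placeholders.

```python
# Toy sketch of dictionary-based event identification from text. The event
# categories and trigger terms are invented placeholders, not the project's
# actual NLP dictionaries or inference engine.
import re

EVENT_DICTIONARY = {
    "protest":   ["protest", "demonstration", "march"],
    "crackdown": ["crackdown", "mass arrest", "curfew"],
}

def code_article(text: str) -> set[str]:
    """Machine-code one article: return every event category it triggers."""
    found = set()
    lowered = text.lower()
    for event, terms in EVENT_DICTIONARY.items():
        if any(re.search(rf"\b{re.escape(t)}\b", lowered) for t in terms):
            found.add(event)
    return found

print(code_article("Security forces imposed a curfew after a mass march."))
# -> {'protest', 'crackdown'}
```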


Thursday July 30, 2020 8:00am - 9:40am PDT
Brella

8:00am PDT

Scientific Data Annotation and Dissemination: Using the ‘Ike Wai Gateway to Manage Research Data
Granting agencies invest millions of dollars in the generation and analysis of data, making these products extremely valuable. However, without sufficient annotation of the methods used to collect and analyze the data, the ability to reproduce and reuse those products suffers. This lack of assurance of the quality and credibility of the data at the different stages of the research process essentially wastes much of the investment of time and funding, and it fails to drive research forward to the full potential possible if everything were effectively annotated and disseminated to the wider research community. To address this issue for the Hawai'i Established Program to Stimulate Competitive Research (EPSCoR) project, a water science gateway was developed at the University of Hawai'i (UH), called the 'Ike Wai Gateway. In Hawaiian, 'Ike means knowledge and Wai means water. The gateway supports research in hydrology and water management by providing tools to address questions of water sustainability in Hawai'i. The gateway provides a framework for data acquisition, analysis, model integration, and the display of data products. It is intended to complement and integrate with the capabilities of the Consortium of Universities for the Advancement of Hydrologic Science's (CUAHSI) HydroShare by providing sound data and metadata management capabilities for multi-domain field observations, analytical lab actions, and modeling outputs. Functionality provided by the gateway is supported by a subset of CUAHSI's Observations Data Model (ODM), delivered as centralized web-based user interfaces and APIs that support multi-domain data management, computation, analysis, and visualization tools for reproducible science, modeling, data discovery, and decision support for the Hawai'i EPSCoR 'Ike Wai research team and the wider Hawai'i hydrology community. By leveraging the Tapis platform, UH has constructed a gateway that ties data and advanced computing resources together to support diverse research domains, including microbiology, geochemistry, geophysics, economics, and the humanities, coupled with computational and modeling workflows delivered in a user-friendly web interface with workflows for effectively annotating the project data and products. Disseminating results through the 'Ike Wai data gateway and HydroShare makes the research products accessible and reusable.

Speakers

Gwen Jacobs

University of Hawaii
Gwen Jacobs serves as the Director of Cyberinfrastructure for the University of Hawaii System. She also serves as the Director and PI of Hawaii EPSCoR and as the Co-Chair of the NSF Advisory Council for Cyberinfrastructure. She is a computational neuroscientist by training.


Thursday July 30, 2020 8:00am - 9:40am PDT
Brella

8:00am PDT

VisSnippets: A Web-Based System for Impromptu Collaborative Data Exploration on Large Displays 🏆
🏆 Student paper – Honorable Mention in “Application Software, Support, and Outcomes” Track

The VisSnippets system is designed to facilitate effective collaborative data exploration. VisSnippets leverages SAGE2 middleware that enables users to manage the display of digital media content on large displays, thereby providing collaborators with a high-resolution common workspace. Based in JavaScript, VisSnippets provides users with the flexibility to implement and/or select visualization packages and to quickly access data in the cloud. By simplifying the development process, VisSnippets removes the need to scaffold and integrate interactive visualization applications by hand. Users write reusable blocks of code called "snippets" for data retrieval, transformation, and visualization. By composing dataflows from the group's collective snippet pool, users can quickly execute and explore complementary or contrasting analyses. By giving users the ability to explore alternative scenarios, VisSnippets facilitates parallel work for collaborative data exploration leveraging large-scale displays. We describe the system, its design and implementation, and showcase its flexibility through two example applications.
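VisSnippets' actual snippets run in JavaScript inside SAGE2; as a language-neutral illustration of the compose-a-dataflow idea, here is a hedged Python analogue in which "snippets" are registered functions chained into a pipeline. The snippet kinds and sample functions are invented for illustration.

```python
# Python analogue of the VisSnippets idea (the real system is JavaScript on
# SAGE2): reusable "snippets" for data retrieval, transformation, and
# visualization, composed into a dataflow from a shared pool.
SNIPPET_POOL = {}

def snippet(kind):
    def register(fn):
        SNIPPET_POOL[fn.__name__] = (kind, fn)
        return fn
    return register

@snippet("retrieve")
def load_counts():
    return [3, 1, 4, 1, 5, 9]

@snippet("transform")
def running_total(data):
    out, total = [], 0
    for x in data:
        total += x
        out.append(total)
    return out

@snippet("visualize")
def ascii_bars(data):
    return "\n".join("#" * v for v in data)

def run_dataflow(*names):
    """Chain snippets from the pool: each output feeds the next snippet."""
    value = None
    for name in names:
        _, fn = SNIPPET_POOL[name]
        value = fn() if value is None else fn(value)
    return value

print(run_dataflow("load_counts", "running_total", "ascii_bars"))
```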


Thursday July 30, 2020 8:00am - 9:40am PDT
Brella

10:00am PDT

Design and Deployment of Photo2Building: A Cloud-based Procedural Modeling Tool as a Service
We present Photo2Building, a tool for creating a plausible 3D model of a building from only a single photograph. Our tool is based on a prior desktop version which, as described in this paper, is converted into a client-server model with job queuing, web-page support, and support for concurrent usage. The reported cloud-based, web-accessible tool can reconstruct a building in 40 seconds on average, at a cost of only 0.60 USD under current pricing. This provides an extremely scalable and potentially widespread tool for creating building models for use in urban design and planning applications. With the growing impact of rapid urbanization on weather, climate, and resource availability, access to such a service is expected to help a wide variety of users, such as city planners and urban meteorologists worldwide, in the quest for improved prediction of urban weather and the design of climate-resilient cities of the future.


Thursday July 30, 2020 10:00am - 11:00am PDT
Brella

10:00am PDT

Environmental Visualization: Moving Beyond the Rainbows
Pseudo-coloring is a well-established, fundamental tool for visualizing scientific data. As the size and density of data grow, increasingly more discriminatory power is required to extract optimum feature resolution. The environmental community, in particular, relies heavily on this technology to dissect and interpret a huge variety of visual data. These scientists often turn to traditional rainbow colormaps, despite their well-documented deficiencies in rendering dense detail. A popular default, the desaturated rainbow, has a non-monotonically varying luminance range that misrepresents data: despite increasing overall feature resolution, this variance creates hue simultaneity and vibration, introducing false artifacts, neutralizing swaths of data, and impeding analysis. Drawing on artistic color theory, we hypothesized that the desaturated rainbow could be improved by increasing luminance ranges, decreasing saturation, and employing hue-cycling to boost discriminatory power. The adjusted maps exhibit algorithmically corroborated higher feature resolution, a primary objective of all the scientists interviewed, without distorting the data with discordant false coloring. Our studies indicate that our maps are preferred by these domain scientists, thereby providing a potential alternative for effective, human-centric colormapping.
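The authors' published maps are not reproduced here, but the two ingredients they name, a monotonically increasing luminance ramp and hue cycling for extra discriminative power, can be sketched with matplotlib. The parameter choices below (cycle count, saturation level) are illustrative assumptions.

```python
# Sketch of the two ingredients the paper names: monotonically increasing
# lightness (HSV value as a rough luminance proxy) plus hue cycling.
# Parameter choices are illustrative assumptions, not the authors' maps.
import numpy as np
from matplotlib.colors import ListedColormap, hsv_to_rgb

n = 256
t = np.linspace(0.0, 1.0, n)
hue = (3.0 * t) % 1.0          # cycle through the hue wheel three times
sat = np.full(n, 0.35)         # keep saturation low, per the paper's hypothesis
val = t                        # lightness rises monotonically with data value

cmap = ListedColormap(hsv_to_rgb(np.column_stack([hue, sat, val])),
                      name="hue_cycled_monotonic")

# Usage: plt.imshow(field, cmap=cmap) ; ordering is carried by lightness,
# while the repeating hues add local contrast between nearby values.
```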


Thursday July 30, 2020 10:00am - 11:00am PDT
Brella

10:00am PDT

The Hawai‘i Rainfall Analysis and Mapping Application (HI-RAMA): Decision Support and Data Visualization for Statewide Rainfall Data
This paper discusses the design and implementation of the Hawai'i Rainfall Analysis and Mapping Application (HI-RAMA) decision support tool, an application providing researchers and community stakeholders with interactive access to and visualization of hosted historical and near-real-time monthly rainfall maps and aggregated rainfall station observation data for the State of Hawai'i. The University of Hawai'i Information Technology Services Cyberinfrastructure team, in partnership with members of the Hawai'i Established Program to Stimulate Competitive Research (EPSCoR) 'Ike Wai project team, developed HI-RAMA as part of the 'Ike Wai Gateway to support water sustainability research for the state of Hawai'i. The tool is designed to provide user-friendly access to information that can reveal the impacts of climate change on precipitation so that users can make data-driven decisions.

Speakers

Gwen Jacobs

University of Hawaii
Gwen Jacobs serves as the Director of Cyberinfrastructure for the University of Hawaii System. She also serves as the Director and PI of Hawaii EPSCoR and as the Co-Chair of the NSF Advisory Council for Cyberinfrastructure. She is a computational neuroscientist by training.


Thursday July 30, 2020 10:00am - 11:00am PDT
Brella

12:00pm PDT

Accelerated Real-time Network Monitoring and Profiling at Scale using OSU INAM
Designing a scalable real-time monitoring and profiling tool with low overhead for network analysis and introspection, capable of capturing all relevant network events, is a challenging task. New challenges emerge as HPC systems become larger and users expect better capabilities, such as real-time profiling at fine granularity. We take up this challenge by redesigning OSU INAM, making it capable of gathering, storing, retrieving, visualizing, and analyzing network metrics for large and complex HPC clusters. The enhanced OSU INAM tool provides scalable, low-overhead, fine-granularity InfiniBand port counter inquiry and fabric discovery for HPC users, system administrators, and HPC developers. Our experiments show that, for a cluster of 1,428 nodes and 114 switches, the proposed design can gather fabric metrics at very fine (sub-second) granularity and discover the complete network topology in approximately 5 minutes. The proposed design has been released publicly as part of the OSU INAM tool and is available for free download and use from the project website.


Thursday July 30, 2020 12:00pm - 1:20pm PDT
Brella

12:00pm PDT

Implementing a Loosely-Coupled Integrated Assessment Model in the Pegasus Workflow Management System
Integrated assessment models (IAMs) are commonly used to explore the interactions between different modeled components of socio-environmental systems (SES). Most IAMs are built in a tightly-coupled framework so that the complex interactions between the models can be implemented efficiently and in a straightforward manner within the framework. However, tightly-coupled frameworks make it more difficult to change individual models within the IAM because of the high level of integration between the models. Prioritizing flexibility over computational efficiency, the IAM presented here is built in a loosely-coupled framework and implemented in the Pegasus Workflow Management System. The modular nature of loosely-coupled systems allows each component model within the IAM to be easily exchanged for another component model from the same domain, provided that each implements the same input/output interface. This flexibility allows researchers to experiment with different models for each SES component and facilitates simple upgrades between versions of the independently developed component models.
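The IAM's Pegasus workflow definitions are not shown in the abstract; the sketch below illustrates only the swappable-component contract it describes, using a Python protocol as a stand-in. The interface and the two toy models are assumptions (the second uses the standard SCS curve-number runoff formula as a plausible example).

```python
# Sketch of the swappable-component contract: any model for a given SES domain
# can replace another as long as it honors the same input/output interface.
# The interface and model names are illustrative assumptions.
from typing import Protocol

class HydrologyModel(Protocol):
    def run(self, rainfall_mm: float) -> float:
        """Consume forcing input, return runoff (same contract for all)."""
        ...

class SimpleRunoff:
    def run(self, rainfall_mm: float) -> float:
        return 0.4 * rainfall_mm            # crude runoff coefficient

class CurveNumberRunoff:
    def run(self, rainfall_mm: float) -> float:
        s = 50.0                            # retention parameter (placeholder)
        return max(rainfall_mm - 0.2 * s, 0.0) ** 2 / (rainfall_mm + 0.8 * s)

def simulate(model: HydrologyModel, rainfall_mm: float) -> float:
    # The workflow calls the interface; which model answers is interchangeable.
    return model.run(rainfall_mm)

print(simulate(SimpleRunoff(), 25.0), simulate(CurveNumberRunoff(), 25.0))
```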


Thursday July 30, 2020 12:00pm - 1:20pm PDT
Brella
 