Welcome to PEARC20!
PEARC20’s theme is “Catch the Wave.” This year’s theme embodies the spirit of the community’s drive to stay on pace and in front of all the new waves in technology, analytics, and a globally connected and diverse workforce. We look forward to this year’s PEARC20 virtual meeting, where we can share scientific discovery and craft the future infrastructure.

The conference runs on Pacific Time (PT), and all times listed below are in PT.

The connection information for all PEARC20 workshops, tutorials, plenaries, track presentations, BOFs, posters, the Visualization Showcase, and other affiliated events is in the PEARC20 virtual conference platform, Brella. If you have issues joining Brella, please email pearcinfo@googlegroups.com.
Wednesday, July 29 • 1:35pm - 3:35pm
Deploying large fixed files datasets with SquashFS and Singularity

Shared high-performance computing (HPC) platforms, such as those provided by XSEDE and Compute Canada, enable researchers to carry out large-scale computational experiments at a fraction of the cost of the cloud. Most systems require the use of distributed filesystems (e.g., Lustre) to provide a highly multi-user, large-capacity storage environment. These filesystems suffer performance penalties as the number of files increases, due to network contention and metadata performance. We demonstrate how a combination of two technologies, Singularity and SquashFS, can help developers, integrators, architects, and scientists deploy large datasets (O(10M) files) on these shared systems with minimal performance limitations. The proposed integration enables more efficient access and indexing than normal file-based dataset installations, while providing transparent file access to users and processes. Furthermore, the approach does not require administrative privileges on the target system. While the examples studied here have been taken from the field of neuroimaging, the technologies adopted are not specific to that field. Currently, this solution is limited to read-only datasets. We propose the adoption of this technology for the consumption and dissemination of community datasets across shared computing resources.
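
As a rough illustration of the workflow the abstract describes, the sketch below assembles the two command lines involved: packing a dataset directory into one SquashFS image with `mksquashfs`, then exposing that image read-only inside a container through Singularity's `--overlay` option. This is a minimal sketch under stated assumptions, not the presenters' tooling: the helper function names and every path are hypothetical, and the functions only build argument lists without running anything.

```python
import shlex
from typing import List


def squashfs_build_command(dataset_dir: str, image_path: str) -> List[str]:
    """Build the argv that packs dataset_dir into a single SquashFS image.

    -noappend makes mksquashfs overwrite an existing image rather than
    append to it. The paths are supplied by the caller (hypothetical here).
    """
    return ["mksquashfs", dataset_dir, image_path, "-noappend"]


def singularity_exec_command(
    container: str, overlays: List[str], command: List[str]
) -> List[str]:
    """Build the argv that runs command in container with read-only overlays.

    Each SquashFS image is passed with --overlay and an explicit ":ro"
    suffix, matching the read-only limitation noted in the abstract.
    """
    argv = ["singularity", "exec"]
    for overlay in overlays:
        argv += ["--overlay", f"{overlay}:ro"]
    argv.append(container)
    argv.extend(command)
    return argv


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    build = squashfs_build_command("/scratch/dataset", "/scratch/dataset.squashfs")
    run = singularity_exec_command(
        "/scratch/tools.sif",
        ["/scratch/dataset.squashfs"],
        ["ls", "/dataset"],
    )
    print(shlex.join(build))
    print(shlex.join(run))
```

Because the dataset becomes a single file on the distributed filesystem, per-file lookups are resolved inside the image rather than by the shared metadata servers. Where the files appear inside the container depends on the directory layout stored in the image, so the `/dataset` path above is an assumption.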

Wednesday July 29, 2020 1:35pm - 3:35pm PDT