Bioconductor Docker Images For Multi-Node Parallel Computing On The Cloud

Author(s): Nitesh Turaga

Affiliation(s): Dana Farber Cancer Institute

Twitter: @niteshturaga

Bioconductor produces Docker images that are widely used because they containerize the system dependencies of all Bioconductor packages along with the community version of RStudio. With Kubernetes, a container orchestration system, these Docker images can now be deployed on a cluster and used for multi-node parallel computing. In this workshop, we introduce commands to launch such a cluster on a cloud provider (Google Cloud, Azure, or AWS) and use a new BiocParallel back-end called 'RedisParam' to distribute jobs from the manager to the workers. This approach recreates a traditional parallel computing framework on the cloud while using the same containerized applications that are available for experimentation on local machines. The advantages of a cluster launched by Kubernetes are fault tolerance and the potential for auto-scaling. Prerequisites: some familiarity with BiocParallel and the Bioconductor Docker images.
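To picture how a manager hands jobs to workers, here is a minimal Python sketch of the general manager/worker queue pattern. It is an illustration only, not the RedisParam or BiocParallel API. It assumes, as the back-end's name suggests, that jobs pass through a shared Redis store; here an in-process thread-safe queue stands in for Redis, threads stand in for worker containers, and the function name run_manager_worker is hypothetical. The manager enqueues indexed jobs, each worker pulls jobs until it receives a stop signal, and the manager reassembles the results in input order.

```python
import queue
import threading
from typing import Callable, Iterable, List


def run_manager_worker(func: Callable[[int], int], tasks: Iterable[int], n_workers: int = 3) -> List[int]:
    """Hypothetical sketch of manager/worker job distribution.

    An in-process queue.Queue stands in for a Redis server, and
    threads stand in for worker containers on a cluster.
    """
    jobs: "queue.Queue" = queue.Queue()
    results: "queue.Queue" = queue.Queue()
    task_list = list(tasks)

    # Manager: enqueue every job together with its index so that
    # results can be put back in input order later.
    for index, task in enumerate(task_list):
        jobs.put((index, task))
    # One sentinel per worker tells it that no more jobs are coming.
    for _ in range(n_workers):
        jobs.put(None)

    def worker() -> None:
        # Worker: pull jobs until the sentinel arrives, pushing each result back.
        while True:
            item = jobs.get()
            if item is None:
                break
            index, task = item
            results.put((index, func(task)))

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Manager: collect all results and restore the original task order.
    collected = [results.get() for _ in range(len(task_list))]
    collected.sort(key=lambda pair: pair[0])
    return [value for _, value in collected]


if __name__ == "__main__":
    print(run_manager_worker(lambda x: x * x, range(5)))  # [0, 1, 4, 9, 16]
```

Replacing the threads with separate containers and the local queue with a networked Redis instance is what would let this pattern span several nodes; that substitution is outside the scope of this sketch.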

Workshop details

Source code

Orchestra

1. Go to Orchestra.
2. Log in.
3. Search for the workshop of interest.
4. Click "Launch" (may take a minute or two).
5. Follow the instructions in the workshop.