Slune

Written on January 11th, 2024 by Henry Bourne

Slune

Slune stand for ‘SLURM + tune’ and is a python package made to perform hyperparameter tuning (and other things!) on SLURM. Slune has been made to be quick and easy to use and build on! To get your computation scheduled and finished faster!

I created slune to solve the problems I ran into when trying to manage scheduling, conducting and saving results for large numbers of experiments (training neural networks) on SLURM. Mostly hyperparameter tuning. There exist many packages that can help you perform hyperparameter tuning and specifically handle splitting your different training runs over multiple GPU’s. However, I found most of these packages were very large, clunky and didn’t fit my needs. As they are packages with large clunky code-bases it can be difficult to work with them, especially if anything goes wrong or if you would like to contribute/extend the packages functionality.

The other problem is that although these packages did interface with SLURM it was not particularly useful in lots of contexts. For example, if you are carrying out hyperparameter tuning you want to be able to run multiple experiments all in parallel since hyperparameter tuning is mostly embarrassingly parallel (unless you are performing some kind of iterative search). This could be achieved with other packages, but it required requesting multiple GPU’s at the same time, herein lies the problem. If you are requesting multiple GPU’s on SLURM you are going to be waiting a long time for your job to start as you need to wait for the GPU’s to all be available at the same time. This is silly! there is no reason you need the jobs to run at the same time (again, embarrassingly parallel).

This prompted the creation of slune which I set out to make as simple as possible to use and extend. Slune allows you to easily set-up a search space which can then be used to submit many jobs at once. For example if you only need 1 hour and one GPU per job and you have 10 jobs then your jobs with slune should be put on very quickly as soon as GPU’s are free for one hour. Whereas if you use a different tuning package you will have to wait until you have 10 GPU’s available for 1 hour, or 5 GPU’s available for 2 hours, etc.

Slune also handles all the saving and reading of your results!

And if you don’t like it you can easily extend it to fit your needs as it is a small package with a simple modular and composable code base!

Although tuning is in the name slune can really be used for any embarrassingly parallel task you may want to run on SLURM. Even if you don’t use SLURM you can still use slune to handle naming and saving your results! Even running jobs in parallel on your local machine! (if you write some custom bash scripts). So maybe it shouldn’t be called slune 🤔, but the name is fun!

Check out the repo here! and the docs here! It’s installable via pip! (pip install slune-lib)

It’s still very much a work in progress so bear with. I add when I can! If you would like to contribute I encourage you to do so!