The best machine learning template?

Photo by Breno Machado / Unsplash

When you have a wonderful idea and are about to launch a new machine learning project, take some time to consider which tools to use. If you're reading this, you're presumably already familiar with machine learning frameworks like TensorFlow or PyTorch. Picking one of them and starting from scratch is not the best option, especially if the project will be large and require many experiments.

PyTorch Lightning


With complete respect for the TensorFlow library and the potential it provides, this post will concentrate on the PyTorch universe. Pure PyTorch provides many features for implementing machine learning pipelines, but you may notice a lot of repeated code (training loops, device placement, checkpointing, logging) that you probably don't want to worry about. That is where the PyTorch Lightning library comes in.

PyTorch Lightning is a Python library that offers a high-level interface to PyTorch. It takes over some aspects of the machine learning process that we would otherwise manage ourselves, which can save significant development time. The main goal of Lightning is to reduce the boilerplate that practically every ML training run involves. Instead of writing the training loop from scratch, we subclass LightningModule and implement methods like forward and training_step. More information on PyTorch Lightning may be found on their website.

Hydra config


Great! We can now build and modify our models easily. Well done, Lightning. However... what if we have a variety of ideas for improvements? What if we need to change and retrain the model to find out whether each of them actually helps? There are other tools we can employ to speed up development further.

Creating configuration files for machine learning models is nothing new. Using Hydra for this purpose is not yet very widespread, probably because it is relatively young: it came out in 2019 with its 0.11.3 version. It has 6.5k stars on GitHub so far, most likely because it is a brilliant tool. Hydra's primary feature is the ability to compose a hierarchical configuration dynamically and override it through config files and the command line. It earned the name Hydra because of its capacity to manage several related tasks, much like a Hydra with multiple heads.

Lightning Hydra Template

GitHub - ashleve/lightning-hydra-template: PyTorch Lightning + Hydra. A very user-friendly template for rapid and reproducible ML experimentation with best practices. ⚡🔥⚡

GitHub user ashleve popularized the combination of these two technologies by creating the Lightning Hydra Template. It is much more than a collection of folders and files containing PyTorch Lightning modules and Hydra configuration files. Let us begin with whether this template should be used for a given machine learning project. The author emphasizes that, despite its versatility, it is not suitable everywhere.

  • First, things can break: Lightning and Hydra are still evolving and pull in several other libraries, so occasionally something goes wrong.
  • Second, the template is better suited for model prototyping than for building data pipelines.
  • Third, it limits you as much as Lightning does, so specific adjustments can be harder to make than in pure PyTorch.

If that doesn't put you off, let's see what this template has to offer. According to the project's README, a convenient all-in-one technology stack lets you iterate quickly over new models, datasets, and tasks on different hardware accelerators such as CPUs, multiple GPUs, or TPUs. It collects best practices for reproducibility and an efficient workflow, and the repository is extensively commented, so you can also use it as a learning resource.

Summary

I first learned about ashleve's work approximately a year ago when I joined the team at my new position. At first, I had some reservations. Why would you use anything that complicated? Why can't I use print debugging when something doesn't work? Testing the code I wanted to add wasn't easy either. However, after a few days of using it and learning about its features, I decided it's the perfect ML template. Now, whether I have a new excellent project idea in mind or just need to complete my Master's thesis, I use the Lightning Hydra Template. Do you want to give it a shot?