Scheduling with Fully Compressible Tasks: Application to Deep Learning Inference with Neural Network Compression - IDEX UCA JEDI Université Côte d'Azur
Conference Papers Year: 2024

Abstract

With the advent and growing use of Machine Learning as a Service (MLaaS), cloud and network systems now offer the possibility of deploying ML tasks on heterogeneous clusters. Network and cloud operators must therefore schedule these tasks, determining both when and on which devices to execute them. In parallel, several solutions, such as neural network compression, have been proposed to build small models that can run on limited hardware. These solutions allow the model size to be chosen at inference time for any targeted processing time, without retraining the network. In this work, we consider the Deadline Scheduling with Compressible Tasks (DSCT) problem: a novel scheduling problem with task deadlines in which the tasks can be compressed. Each task can be executed at a chosen compression level, which presents a trade-off between its compression level (and thus its processing time) and the utility it obtains. The objective is to maximize the tasks' utilities. We propose an approximation algorithm with proven guarantees to solve the problem. We validate its efficiency with extensive simulations, obtaining near-optimal results. As an application scenario, we study the problem when the tasks are Deep Learning classification jobs and the objective is to maximize their global accuracy, but we believe that this new framework and its solutions apply to a wide range of use cases.
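To make the trade-off described in the abstract concrete, here is a minimal sketch of a toy DSCT-like instance. It is not the approximation algorithm proposed in the paper. It assumes a single machine (the paper targets heterogeneous clusters), a small discrete set of compression levels per task, and integer utilities standing in for accuracy in percent. The task data and the names `tasks`, `feasible`, and `best_assignment` are hypothetical and chosen only for illustration. The sketch enumerates every combination of compression levels and keeps the one with the highest total utility among those that meet all deadlines.

```python
from itertools import product

# Hypothetical toy instance (not taken from the paper). Each task has a
# deadline and one (processing_time, utility) option per compression level;
# stronger compression means a shorter processing time and a lower utility.
tasks = [
    {"deadline": 4, "options": [(1, 60), (2, 80), (4, 95)]},
    {"deadline": 5, "options": [(1, 55), (3, 90)]},
    {"deadline": 6, "options": [(2, 70), (3, 88)]},
]


def feasible(tasks, choice):
    """Return True if every deadline is met on a single machine.

    Tasks run back to back in earliest-deadline-first order, which is enough
    to test feasibility when all tasks are available at time 0.
    """
    order = sorted(range(len(tasks)), key=lambda i: tasks[i]["deadline"])
    clock = 0
    for i in order:
        clock += tasks[i]["options"][choice[i]][0]
        if clock > tasks[i]["deadline"]:
            return False
    return True


def best_assignment(tasks):
    """Brute-force search over compression levels (exponential, illustration only)."""
    best_utility, best_choice = None, None
    for choice in product(*(range(len(t["options"])) for t in tasks)):
        if not feasible(tasks, choice):
            continue
        utility = sum(tasks[i]["options"][c][1] for i, c in enumerate(choice))
        if best_utility is None or utility > best_utility:
            best_utility, best_choice = utility, choice
    return best_utility, best_choice


if __name__ == "__main__":
    utility, choice = best_assignment(tasks)
    print(utility, choice)  # 223 (1, 0, 1)
```

In this toy instance, running every task at its least compressed level misses the deadlines, so the best feasible choice compresses the second task fully in order to keep the other two at higher utility. Exhaustive search is only workable for a handful of tasks, which is why an approximation algorithm with guarantees is of interest for the general problem.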
Main file
CCGRID_2024.pdf (566.1 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-04497548, version 1 (10-03-2024)

Identifiers

HAL Id: hal-04497548
DOI: 10.1109/CCGrid59990.2024.00045

Cite

Tiago da Silva Barros, Frédéric Giroire, Ramon Aparicio-Pardo, Stephane Perennes, Emanuele Natale. Scheduling with Fully Compressible Tasks: Application to Deep Learning Inference with Neural Network Compression. CCGRID 2024 - 24th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, IEEE/ACM, May 2024, Philadelphia, United States. ⟨10.1109/CCGrid59990.2024.00045⟩. ⟨hal-04497548⟩