Scheduling with Fully Compressible Tasks: Application to Deep Learning Inference with Neural Network Compression
Abstract
With the advent and growing usage of Machine Learning as a Service (MLaaS), cloud and network systems now offer the possibility to deploy ML tasks on heterogeneous clusters. Network and cloud operators then have to schedule these tasks, determining both when and on which devices to execute them. In parallel, several solutions, such as neural network compression, have been proposed to build small models that can run on limited hardware. These solutions make it possible to choose the model size at inference time for any targeted processing time, without having to retrain the network.
In this work, we consider the Deadline Scheduling with Compressible Tasks (DSCT) problem: a novel scheduling problem with task deadlines in which tasks can be compressed. Each task can be executed at a certain compression level, which presents a trade-off between its compression level (and thus its processing time) and the utility it obtains. The objective is to maximize the total utility of the tasks. We propose an approximation algorithm with proven guarantees to solve the problem, and we validate its efficiency through extensive simulations, obtaining near-optimal results. As an application scenario, we study the problem when the tasks are Deep Learning classification jobs and the objective is to maximize their global accuracy, but we believe that this new framework and its solutions apply to a wide range of application cases.
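To make the compression/utility trade-off concrete, the sketch below models a tiny DSCT-like instance under simplifying assumptions that are not taken from the paper: a single machine (rather than a heterogeneous cluster), a small discrete set of hypothetical compression levels per task (rather than full, continuous compressibility), and made-up (processing time, utility) values. It is an exhaustive search baseline, not the authors' approximation algorithm; the names `Task`, `feasible_edf`, and `best_compression` are illustrative only.

```python
from itertools import product
from typing import NamedTuple


class Task(NamedTuple):
    name: str
    deadline: float
    # Hypothetical compression levels as (processing_time, utility) pairs:
    # a more compressed model runs faster but typically yields lower accuracy.
    levels: tuple[tuple[float, float], ...]


def feasible_edf(tasks, times):
    """Run tasks back-to-back in earliest-deadline-first order on one machine
    and check that every task finishes by its deadline. With fixed processing
    times and all tasks available at time 0, EDF order is optimal for this check."""
    order = sorted(range(len(tasks)), key=lambda i: tasks[i].deadline)
    t = 0.0
    for i in order:
        t += times[i]
        if t > tasks[i].deadline:
            return False
    return True


def best_compression(tasks):
    """Try every combination of one compression level per task and keep the
    feasible one with the highest total utility. Exponential in the number of
    tasks, so it is only suitable for tiny illustrative instances."""
    best_util, best_choice = float("-inf"), None
    for choice in product(*(range(len(t.levels)) for t in tasks)):
        times = [tasks[i].levels[c][0] for i, c in enumerate(choice)]
        util = sum(tasks[i].levels[c][1] for i, c in enumerate(choice))
        if util > best_util and feasible_edf(tasks, times):
            best_util, best_choice = util, choice
    return best_util, best_choice


# Toy instance: both tasks cannot run uncompressed before their deadlines.
tasks = [
    Task("A", deadline=3.0, levels=((1.0, 0.70), (2.0, 0.80), (3.0, 0.90))),
    Task("B", deadline=4.0, levels=((1.0, 0.60), (2.0, 0.75), (3.0, 0.88))),
]
util, choice = best_compression(tasks)
print(choice, round(util, 2))  # (0, 2) 1.58
```

In this toy instance, the best choice compresses task A heavily so that task B can run at its most accurate level; running both tasks at their largest models would miss B's deadline. Exhaustive search quickly becomes intractable as the number of tasks and levels grows, which is where approximation algorithms with guarantees become relevant.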