Equivariant Deep Learning Based on Scale-Spaces and Moving Frames

Mateus Sangalli

Abstract

In the context of neural networks, equivariance or invariance to transformations can improve generalization to new data, provided that the data is symmetric under the relevant transformations. In computer vision in particular, most tasks have some kind of geometric symmetry; for example, translating an object in a segmentation task usually does not change the object's class.

The main objective of this thesis is to explore and develop neural networks that are equivariant to transformations. Two main frameworks are used: scale-equivariant networks based on scale-cross-correlation, and group-equivariant networks based on moving frames.

The former generalizes the recently proposed scale-semigroup-equivariant networks, which use the Gaussian scale-space to map images from their original domain to a domain of scales and translations. The generalization proposed in this thesis allows a much more general class of scale-spaces to be used as liftings, and it is shown that morphological liftings are beneficial when the only information available is the shape of objects. The equivariance of the upsampling and downsampling operators is studied, and these operators are combined with the scale-cross-correlation to create the SEU-Net, an architecture similar to the U-Net, which was shown to generalize better than the U-Net to unseen scales.

The method of moving frames is a classical approach to finding differential invariants on manifolds; in the present work it is applied to define neural network blocks that are equivariant to the action of a Lie group. In particular, it is used to define a neural network equivariant to rotations and translations of images. The proposed network was tested on the classification of rotated digits and, despite its numerical issues, achieved results competitive with other rotation-equivariant models of similar size. Moreover, to deal with the numerical problems, a solution that computes invariants from a single moving frame was proposed and applied to create a network equivariant to rotations and translations of 3D volumes. The 3D rotation-equivariant network was applied to low-resolution medical volume classification tasks and achieved state-of-the-art results on most of the tested datasets.
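As a small illustration of what equivariance to rotations and translations means for a differential quantity, the sketch below numerically checks that the squared gradient magnitude of an image commutes with 90-degree rotations and with translations, while a single partial derivative does not commute with rotations. This is not the network proposed in the thesis: the function names (`grad_magnitude_sq`, `partial_x`), the periodic finite-difference scheme, and the random test image are all assumptions chosen only to make the check exact on a discrete grid.

```python
import numpy as np


def partial_x(f):
    """Central difference along axis 1 with periodic boundaries (illustrative choice)."""
    return (np.roll(f, -1, axis=1) - np.roll(f, 1, axis=1)) / 2.0


def partial_y(f):
    """Central difference along axis 0 with periodic boundaries (illustrative choice)."""
    return (np.roll(f, -1, axis=0) - np.roll(f, 1, axis=0)) / 2.0


def grad_magnitude_sq(f):
    """Squared gradient magnitude, a first-order quantity unchanged by rotating the axes."""
    return partial_x(f) ** 2 + partial_y(f) ** 2


# Hypothetical test image: a square array of random values.
rng = np.random.default_rng(0)
image = rng.standard_normal((16, 16))

# Equivariance to a 90-degree rotation: operator(rotate(f)) == rotate(operator(f)).
rot_equivariant = np.allclose(
    grad_magnitude_sq(np.rot90(image)), np.rot90(grad_magnitude_sq(image))
)

# Equivariance to a (periodic) translation.
shift = (3, 5)
trans_equivariant = np.allclose(
    grad_magnitude_sq(np.roll(image, shift, axis=(0, 1))),
    np.roll(grad_magnitude_sq(image), shift, axis=(0, 1)),
)

# A single partial derivative depends on the choice of axes,
# so it does not commute with the rotation.
dx_rot_equivariant = np.allclose(
    partial_x(np.rot90(image)), np.rot90(partial_x(image))
)

print(rot_equivariant, trans_equivariant, dx_rot_equivariant)
```

The check is restricted to 90-degree rotations because they map the pixel grid onto itself; arbitrary rotation angles would require interpolation, and the equalities would then hold only approximately.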


