Robust Underwater Perception: Using Multimodal and 3D Visual Cues to Boost Machine Learning Frameworks in Marine Applications

  • Underwater robots need reliable perception for navigation, mapping, diver interaction, and manipulation, yet vision is degraded by wavelength-dependent attenuation, scattering, and variable water optics. These effects reduce contrast, distort color, and destabilize visual cues, so perception must be tailored to underwater image formation and to the reliability constraints of field operation. This thesis develops multimodal, 3D-aware perception for adverse marine and deep-sea conditions, based on experiments and integration within the EU projects MORPH, CADDY, and DexROV. By combining complementary sensors (2D imagery, stereo 3D structure, inertial and acoustic cues) with learning pipelines, the proposed approaches compensate for individual sensor weaknesses. First, the thesis enriches 2D perception with 3D context and underwater-specific enhancement. Contributions include terrain-complexity estimation from texture metrics and stereo geometry to adapt AUV speed during surveys, plus color restoration and image enhancement to improve detection and pose estimation. For human-robot interaction, it introduces diver detection and pose estimation that merge stereo point-cloud descriptors with recurrent neural networks to handle low-contrast imagery. Second, it presents end-to-end systems, including the CADDY underwater stereo-vision dataset for gesture-based communication and a gesture-recognition pipeline that blends classical learning, deep detectors, and a grammar-guided human-in-the-loop design for safer diver-AUV communication. Finally, for deep-sea intervention, it proposes a simulation-in-the-loop validation approach to reduce sim-to-real gaps and an adaptive localization framework that fuses dense 3D reconstruction, planar geometry, image-quality cues, and visual odometry to maintain accurate navigation in low visibility. The methods are validated on real data and integrated into autonomous demonstrators for safety-critical missions during field trials.

Meta data
Publishing Institution: IRC-Library, Information Resource Center der Constructor University
Granting Institution: Constructor Univ.
Author: Arturo Gomez Chavez
Referee: Andreas Birk, Francesco Maurelli, Nikola Miskovic
Advisor: Andreas Birk
Persistent Identifier (URN): urn:nbn:de:gbv:579-opus-1013450
Document Type: PhD Thesis
Language: English
Date of Successful Oral Defense: 2025/12/19
Date of First Publication: 2026/01/29
PhD Degree: Computer Science
Academic Department: School of Computer Science and Engineering
Other Countries Involved: Croatia
Call No: 2025/21
