Scalable ETL Pipelines for Autonomous Vehicle Sensor Data Management

Authors

  • Naresh Chandra Mehrotra, Visva-Bharati University, Santiniketan, India

DOI:

https://doi.org/10.21590/

Keywords:

Autonomous Vehicles, Sensor Data Management, ETL Pipeline, Scalable Data Processing, Cloud-Native Architecture, Real-Time Data Ingestion, Multi-Modal Data Fusion, Distributed Computing, Data Lake, Anomaly Detection

Abstract

Autonomous vehicles generate massive volumes of heterogeneous sensor data from LiDAR, radar, cameras, GPS, and inertial measurement units, and need efficient data management pipelines to turn that data into actionable insights. This paper presents a scalable Extract, Transform, Load (ETL) pipeline designed for autonomous vehicle sensor data management that supports real-time ingestion, processing, and storage of multi-modal data streams. Built on cloud-native architectures and distributed computing frameworks, the pipeline integrates diverse sensor inputs, cleanses the data, extracts features, and stores the results in data lakes and warehouses optimized for large-scale analysis. It addresses the challenges of data heterogeneity, synchronization, quality assurance, and the low-latency requirements of autonomous driving applications. Experimental evaluations on real-world autonomous driving datasets show that the pipeline scales horizontally while maintaining high throughput and low latency. Its key components are parallelized data ingestion, schema-aware transformation modules, and fault-tolerant streaming, which together provide robustness and adaptability in dynamic driving environments. The modular design allows advanced analytics and machine learning workflows to be added downstream, supporting continuous model training and validation. The approach optimizes resource utilization and supports real-time monitoring and anomaly detection for vehicle sensor health. The proposed system helps manage the growing complexity and volume of autonomous vehicle sensor data and provides a foundation for improved decision-making and system safety. Future directions include integrating edge computing for pre-processing and further automating the pipeline. This work contributes to the scalable data infrastructure needed to accelerate autonomous vehicle research and deployment.
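
The abstract names schema-aware transformation and synchronization of multi-modal streams but gives no implementation detail. The following is a minimal sketch of what those two steps could look like, not the author's method. The `Reading` type, the `SCHEMAS` table, the field names, and the nearest-timestamp matching rule are all assumptions made for illustration, as is the idea that all sensors share one clock.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    sensor: str       # illustrative sensor name, e.g. "lidar" or "imu"
    timestamp: float  # seconds on a shared clock (assumed)
    payload: dict


# Hypothetical per-sensor schemas: required payload fields and their types.
SCHEMAS = {
    "lidar": {"points": int},
    "imu": {"accel_x": float},
}


def is_valid(reading: Reading) -> bool:
    """Schema-aware check: known sensor, every required field present with the right type."""
    schema = SCHEMAS.get(reading.sensor)
    if schema is None:
        return False
    return all(isinstance(reading.payload.get(k), t) for k, t in schema.items())


def align(anchor: list[Reading], other: list[Reading], tolerance: float):
    """Pair each anchor reading with the nearest-in-time reading from `other`.

    `other` must be sorted by timestamp. Pairs whose timestamps differ by
    more than `tolerance` seconds are dropped.
    """
    times = [r.timestamp for r in other]
    pairs = []
    for a in anchor:
        i = bisect_left(times, a.timestamp)
        # Only the neighbours on either side of the insertion point can be nearest.
        candidates = other[max(i - 1, 0): i + 1]
        if not candidates:
            continue
        best = min(candidates, key=lambda r: abs(r.timestamp - a.timestamp))
        if abs(best.timestamp - a.timestamp) <= tolerance:
            pairs.append((a, best))
    return pairs
```

In a pipeline of the kind described, a check like `is_valid` would run in the transform stage before alignment, so that malformed readings are rejected or routed to a quality-assurance path rather than fused with good data. The tolerance is a tuning parameter that depends on sensor rates; no value is given in the source.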

Published

2025-09-13
