Enhancing Medical Imaging Diagnosis with Databricks: ThoughtsWin Systems' Success Story

In medical imaging diagnosis, AI/ML technology is becoming central to improving diagnostic accuracy and efficiency. ThoughtsWin Systems, in collaboration with Databricks and mlHealth 360, is pioneering a transformative approach to medical imaging. By harnessing the capabilities of the Databricks platform, we aim to revolutionize the processing, analysis, and interpretation of medical images. This blog post explores how ThoughtsWin Systems, as an implementation partner, helped mlHealth 360 leverage the Databricks Data Intelligence Platform for medical imaging diagnosis, and describes the specific benefits observed from this collaboration.
Medical imaging diagnosis is a critical component of modern healthcare, enabling the visualization of internal body structures to detect, diagnose, and monitor diseases. Traditional methods of processing and analyzing medical images often struggle with data volume, the need for quick turnaround times, and the requirement for high accuracy. To address these challenges, artificial intelligence (AI) and machine learning (ML) are increasingly being integrated into medical imaging.

Our project scope involved a dataset of nearly 60,000 CT scans, which together contain 20 million individual image slices and a total data volume of 54 terabytes. Each image slice requires multistep processing, including anonymization. These medical images, acquired from mlHealth 360’s partner medical institutes, needed to be processed to train an ML model that assists in diagnosis. Handling such a large amount of sensitive data requires a secure, high-performance infrastructure and streamlined data management to deliver timely and accurate results.
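To put the quoted dataset figures in perspective, the following back-of-the-envelope calculation derives rough per-scan and per-slice averages. It is only a sketch: the source gives rounded totals ("nearly 60,000"), and the code assumes that a terabyte means 10^12 bytes.

```python
# Rough averages derived from the dataset figures quoted in this post.
# Assumption: "terabytes" is decimal (1 TB = 10**12 bytes); the inputs are rounded.
NUM_SCANS = 60_000           # "nearly 60,000 CT scans"
NUM_SLICES = 20_000_000      # "20 million individual image slices"
TOTAL_BYTES = 54 * 10**12    # "54 terabytes"

slices_per_scan = NUM_SLICES / NUM_SCANS
mb_per_slice = TOTAL_BYTES / NUM_SLICES / 10**6
gb_per_scan = TOTAL_BYTES / NUM_SCANS / 10**9

print(f"~{slices_per_scan:.0f} slices per scan")
print(f"~{mb_per_slice:.1f} MB per slice")
print(f"~{gb_per_scan:.1f} GB per scan")
```

Under these assumptions, the averages come to roughly 333 slices per scan, about 2.7 MB per slice, and about 0.9 GB per scan.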

Improve Diagnostic Accuracy
Utilize advanced AI and ML models trained on extensive datasets to enhance the precision of medical image analysis, despite variations in image quality and scanner types. 

Enhance Processing Efficiency
Implement a scalable and high-performance architecture capable of ingesting, processing, and storing a continuous stream of incoming data without delay. This is essential to manage the sheer volume and velocity of data, with patient scans arriving every few seconds. 

Enable Near Real-Time Analysis
Facilitate near real-time data processing to provide radiologists with timely results for review. This is particularly critical during peak hours, when scans accumulate in queues and must be processed rapidly to avoid delays in patient care.

Streamline Medical Imaging Workflow
Optimize each stage of the complex medical imaging workflow, from preprocessing raw DICOM (Digital Imaging and Communications in Medicine, the standard for the communication and management of medical imaging information) data to running computationally intensive inference models and saving results back into DICOM storage. Efficiently manage the computational resources required for these tasks to control costs.

Ensure Compliance and Security
Maintain the security and privacy of patient data to comply with stringent healthcare regulations. Implement robust security measures throughout the workflow to mitigate the risk of data breaches or unauthorized access.
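
Anonymization, mentioned in the project scope as one of the per-slice processing steps, can be illustrated with a minimal sketch. The post does not describe the actual de-identification method, so everything below is an assumption for illustration: the list of fields to drop, the `secret_key`, and the helper names are hypothetical, and the metadata is represented as a plain dictionary keyed by standard DICOM attribute keywords.

```python
import hashlib
import hmac

# Hypothetical list of identifying attributes to drop. A real project would
# follow the DICOM confidentiality profiles and its own compliance review.
IDENTIFYING_FIELDS = {
    "PatientName",
    "PatientBirthDate",
    "PatientAddress",
    "ReferringPhysicianName",
    "InstitutionName",
}


def pseudonymize_id(patient_id: str, secret_key: bytes) -> str:
    """Map a patient ID to a stable pseudonym using keyed HMAC-SHA256."""
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def anonymize_metadata(metadata: dict, secret_key: bytes) -> dict:
    """Return a copy with identifying fields removed and PatientID replaced."""
    cleaned = {k: v for k, v in metadata.items() if k not in IDENTIFYING_FIELDS}
    if "PatientID" in cleaned:
        cleaned["PatientID"] = pseudonymize_id(str(cleaned["PatientID"]), secret_key)
    return cleaned


# Example with made-up values; the key would come from a secret store in practice.
example = {
    "PatientID": "12345",
    "PatientName": "Doe^Jane",
    "Modality": "CT",
    "SliceThickness": 1.25,
}
anonymized = anonymize_metadata(example, secret_key=b"demo-key-not-for-production")
print(anonymized)
```

Using a keyed hash rather than a random identifier keeps the pseudonym stable across slices of the same patient, so scans can still be grouped without exposing the original ID.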

Adapt to Ongoing Challenges
Continuously update and retrain ML models to adapt to evolving medical knowledge and emerging pathologies. Address data heterogeneity to ensure model accuracy across various image qualities and scanner types.  

 

Databricks forms the heart of our solution, providing a robust data intelligence platform. This solution leverages advanced data engineering and machine learning capabilities to enhance diagnostic accuracy, processing efficiency, and effective data management. 

The architecture diagram illustrates our implementation: 

1. Data Ingestion and Storage: 

• CT scans are ingested into Azure Blob Storage, and their associated metadata is stored in SQL Server, so that all incoming data is captured and securely stored for further processing.
• Both services are used because they provide scalable, secure storage for large volumes of medical imaging data.

2. Data Preprocessing:

• The ingested data is preprocessed to prepare it for model training and inference. This involves cleaning and normalizing the data and converting it into a format suitable for machine learning algorithms.
Figure: Some of the DICOM image preprocessing steps.

• Databricks Jobs and Workflows are used to automate the preprocessing steps, ensuring consistency and efficiency in handling the data. 
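
The post does not spell out the individual preprocessing steps. As a hedged illustration of what normalizing CT data can involve, the sketch below applies two steps commonly used for CT images: a linear rescale of stored pixel values to Hounsfield units, followed by clipping to an intensity window and scaling to [0, 1]. The function names, the default slope and intercept, and the window settings are illustrative assumptions, and nested lists stand in for image arrays to keep the example dependency-free.

```python
def to_hounsfield(raw, slope=1.0, intercept=-1024.0):
    """Apply the linear rescale (slope, intercept) to stored pixel values.

    In DICOM these factors come from the RescaleSlope and RescaleIntercept
    attributes; the defaults here are only illustrative.
    """
    return [[px * slope + intercept for px in row] for row in raw]


def window_and_normalize(hu, center=40.0, width=400.0):
    """Clip values to an intensity window, then scale them to [0, 1]."""
    low = center - width / 2
    high = center + width / 2
    return [
        [(min(max(value, low), high) - low) / (high - low) for value in row]
        for row in hu
    ]


# Tiny 2x2 "slice" with made-up stored values.
raw_slice = [[0, 1024], [1064, 3000]]
hu_slice = to_hounsfield(raw_slice)
normalized = window_and_normalize(hu_slice)
print(hu_slice)
print(normalized)
```

Values below the window map to 0 and values above it map to 1, which gives a model inputs on a consistent scale regardless of the scanner's raw value range.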

3. Model Training and Inference: 
• Preprocessed data is used to train proprietary deep learning models, particularly Convolutional Neural Networks (CNNs) for image segmentation and classification. These models are crucial for accurate diagnosis. 

◦ MLflow Integration: Databricks integrates seamlessly with MLflow to track experiments, manage models, and facilitate reproducible workflows.
◦ GPU Utilization: GPU resources are leveraged for computationally intensive tasks, accelerating model training and inference. 

4. Metrics and Monitoring: 
• The performance of the models is monitored continuously to ensure they meet the desired accuracy and efficiency standards. 
• MLflow provides metrics tracking and model versioning, enabling continuous evaluation and improvement of the models.
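
The post does not say which metrics are tracked. For segmentation models, one commonly used measure is the Dice coefficient, which compares a predicted mask with a reference mask. The sketch below is an illustrative assumption rather than the project's actual metric; the function name and the flat 0/1 mask representation are hypothetical.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two flat binary masks (0/1 values)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Two empty masks agree perfectly, so treat that case as a score of 1.0.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Made-up masks: one overlapping voxel, three positive voxels in total.
predicted_mask = [1, 1, 0, 0]
reference_mask = [1, 0, 0, 0]
score = dice_coefficient(predicted_mask, reference_mask)
print(f"Dice: {score:.3f}")
```

A value computed this way for each evaluation run could then be recorded with a tracking call such as MLflow's `mlflow.log_metric`, so that model versions can be compared over time.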

5. Results Storage and Access: 
• The predicted results are stored back into SQL Server and made accessible through a user portal. This ensures that radiologists can access and review the diagnostic results promptly. 

• SQL Server stores the results, and the user portal gives healthcare professionals a simple interface for accessing and reviewing them.

By leveraging the Databricks suite of services, including Databricks Jobs and Workflows, integrated MLflow, and integration with Azure storage solutions, we streamline the preprocessing of vast quantities of DICOM files and their associated annotations. The workflow moves and manages data through Azure Blob Storage and balances CPU and GPU resources across the complex data processing tasks.

The Databricks environment significantly accelerates our deep learning model training, particularly for 3D image segmentation and binary classification, resulting in precise diagnostics and expedited results for our valued clients. This integrated approach not only enhances the accuracy of our diagnostics but also ensures that results are delivered in a timely manner, crucial for effective patient care.  


Integrating the Databricks platform into the medical imaging workflow has yielded significant outcomes. Here are the key results observed from implementing this solution:

Model training data volumes 
The Databricks platform was leveraged to process a substantial volume of data across various body parts for model training. Below are the detailed counts of 3D scans and 2D images processed:  

Operations metrics
Our use of the Databricks platform also improved our operational metrics, as shown in the table below:

Number of Jobs executed each day: 
The following chart shows the daily execution of jobs, highlighting the platform’s capacity to handle a high volume of tasks efficiently: 

Number of Jobs executed by month: 
This chart illustrates the monthly job execution metrics, reflecting consistent high performance and reliability:  

ThoughtsWin Systems’ partnership with Databricks has empowered mlHealth 360 to manage and process vast volumes of medical imaging data seamlessly, ensuring that proprietary AI and ML models are continuously updated and accurate. The integration of robust data management, real-time processing capabilities, and optimized workflows has not only improved diagnostic outcomes but also ensured compliance with stringent healthcare regulations. Our commitment to innovation and excellence in medical imaging diagnosis is bolstered by the powerful tools and capabilities of Databricks. As we continue to harness the potential of advanced data analytics and AI, we are poised to deliver even greater value to medical professionals and, ultimately, to the patients who depend on accurate and timely medical diagnostics.

Interested in learning more about how ThoughtsWin Systems and Databricks can transform your medical imaging processes and workflows? Contact mahesh.shankar@thoughtswinsystems.com today to discover how our innovative solutions can help you achieve higher diagnostic accuracy and operational efficiency.