Advancing Healthcare: Leveraging Microsoft Fabric and Fivetran for Streamlined Data Integration and Enhanced Analytics in a Complex Ecosystem

ThoughtsWin Systems is excited to provide an overview of a recent project in which we realized significant benefits in delivering advanced healthcare analytics through the combined use of the Fivetran data integration platform and Microsoft’s Fabric analytics solution.
Background

In the intricate and vital realm of healthcare, a colossal amount of data is generated daily from diverse sources including electronic health records, medical devices, laboratory tests, insurance claims, and patient surveys. This data holds the key to improving patient outcomes, cutting costs, and boosting patient satisfaction. However, the real challenge lies in managing and analyzing this vast array of data, often scattered across various formats and systems.

To address this challenge, an increasing number of healthcare organizations are embracing advanced data platforms. These platforms are designed to streamline and speed up data integration and analysis. By doing so, they aim to fully harness the power of healthcare data, leading to enhanced outcomes for patients, providers, and stakeholders alike, and propelling the healthcare sector into a new era of efficiency and effectiveness.

Data Environment Overview: A Foundation for Advanced Analytics  

ThoughtsWin Systems’ most recent healthcare analytics project involved analyzing data generated and managed during the radiology scanning process.
The project involved three key sources of information.
  1. Medical imaging metadata on SQL Server:
  • Hosted in Azure SQL Server, eighteen imaging data sets hold a significant amount of data. Five critical tables were identified in this environment rich with medical imaging metadata, vital for diagnostic assessments. The tables contain patient and physician information, detailed study classifications, procedural data, modality types, OEM manufacturer specifics, and precise timestamps of imaging procedures.
  2. Radiology information system in MySQL:
  • The heart of operations resides in MySQL, where eighty-one tables form an extensive database for the radiology information system. These tables represent a comprehensive scope of data – from patient management information and scheduling details to imaging workflow, reporting, radiologist and technician profiles, financial transactions, and audit trails.
  3. Google Sheet – a protected datasheet containing radiologists’ AI feedback:
  • For each study performed, a highly secure Google Sheet captures radiologist feedback on how the AI model has performed.
Objectives/Challenges

The following objectives were identified for the project:

  1. Extract and integrate data rapidly from disparate systems, namely, Radiology Information Systems (RIS), Picture Archiving and Communication Systems (PACS), and medical practitioners’ feedback for AI agents stored in Google Sheets
  2. Clean & transform data ready for analysis
  3. Unify KPIs and metrics into a single view for healthcare operations to provide actionable insights, leading to better patient outcomes
  4. Enhance AI model performance and enable responsible and ethical AI implementation by monitoring various model performance metrics
Solution

Fivetran stands at the core of our solution, providing a robust data integration platform. It enables us to extract and load data from source systems into the Microsoft Fabric Lakehouse for further downstream processes.

Fivetran offers more than 300 connectors, data scrubbing features, integrated scheduling, and pre-built data models. Fivetran is designed to offer organizations the ability to effortlessly extract, load, and transform data between a wide range of sources and destinations.

Microsoft Fabric is an all-in-one analytics solution for enterprises that covers everything from data movement to data science, real-time analytics, and business intelligence. It offers a comprehensive suite of services, including data lakes, data engineering, and data integration, all in one place designed to simplify your analytics needs.

The Fabric platform is built on a foundation of Software as a Service (SaaS), which takes simplicity and integration to a whole new level.

Data Ingestion using Fivetran
  • Automated Data Integration: Fivetran automates the process of data integration from various sources into a centralized data warehouse. This automation reduces the need for manual intervention and lowers the risk of errors.
  • Real-Time Data Replication: The platform supports real-time data replication, ensuring that the data warehouse is always up-to-date. This feature is crucial for businesses that rely on timely data for decision-making.
  • Centralized Data Management: With Fivetran, all data is centralized in a single data warehouse, simplifying data management and analysis. This centralization also aids in maintaining data consistency and integrity.
  • Security and Compliance: Fivetran places a strong emphasis on data security and compliance with various data protection regulations. This focus ensures that sensitive data is handled securely and in compliance with legal requirements.
  • Ease of Use: The platform is user-friendly, with a straightforward setup process. It does not require extensive technical knowledge, making it accessible to a wider range of users within an organization.
Image shows the three connectors configured and ready to be synced.
Image shows the Logs tab within a connector, which provides an internal view of the operations happening in each connector.
For every schema in the SQL database, Fivetran creates a schema in the OneLake destination that maps directly to its native schema, ensuring the data in the destination is in a familiar format.
Data processing in Microsoft Fabric OneLake
  • Data availability: All source data synced by Fivetran was immediately available for further consumption in the Microsoft Fabric Lakehouse.
  • Data quality: The ingested data is analyzed for anomalies and missing values, a check that is crucial for the downstream processes. Fabric provides excellent integrated capabilities to do this through OneLake Data Hub.
  • Data analysis: With Fabric’s SQL Analytics, we were able to quickly run exploratory data analysis to understand traits, discover patterns, and identify relationships between variables to build a robust data model.
  • Data transformation: The analytical needs of the solution were met by using the integrated Lakehouse and Notebook capabilities to transform and build additional aggregated datasets. These transformed datasets made the downstream analytics and reporting processes simple and efficient.
Image showcases Fabric’s data preview mode and explorer view providing a unified view of the data from different sources.
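The data quality step described above can be illustrated with a minimal sketch. It is written in plain Python rather than against any Fabric API, and the column names and sample rows are hypothetical, chosen only to resemble imaging metadata. It counts, per required column, how many rows have a missing or blank value.

```python
from collections import Counter


def missing_value_report(rows, required_columns):
    """Count rows where each required column is absent, None, or blank."""
    missing = Counter()
    for row in rows:
        for column in required_columns:
            value = row.get(column)
            if value is None or (isinstance(value, str) and not value.strip()):
                missing[column] += 1
    # Report every required column, including those with zero gaps.
    return {column: missing[column] for column in required_columns}


# Hypothetical sample rows; not taken from the project data.
sample_studies = [
    {"study_id": "S1", "modality": "CT", "study_time": "2024-01-05T09:30:00"},
    {"study_id": "S2", "modality": "", "study_time": "2024-01-05T10:15:00"},
    {"study_id": "S3", "modality": "MR", "study_time": None},
]

report = missing_value_report(sample_studies, ["study_id", "modality", "study_time"])
print(report)
```

In a Lakehouse notebook the same idea would more likely be expressed over a Spark or pandas DataFrame; the plain-Python form is used here only to keep the example self-contained.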
Benefits realized through the solution: Key features
Fivetran
  • Ingestion efficiency: Automated data ingestion removes low-value-added pipeline building and maintenance tasks from the data engineers’ and developers’ workload. This allows them to work on more critical tasks and innovation. 
  • Increased data availability: Efficient data ingestion through ELT pipelines provides near-instant access, meaning analysts get the latest data to work with. They can apply transformations to fresh data to get more relevant and actionable insights.
  • Rapid data centralization: Facilitates swift and effortless amalgamation of disparate data sources into a unified cloud-based platform, where analysts can apply transformations for processing and analysis. Centralized data provides context so teams can see how each data set serves organizational goals.
  • Data redaction: Column blocking or hashing can exclude sensitive data such as personally identifiable information (PII) from the destination.
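The idea behind column blocking and hashing, mentioned in the last bullet above, can be sketched as follows. This is not Fivetran's implementation; in Fivetran these options are set through connector configuration. The function names, the keyed SHA-256 scheme, and the sample columns are all assumptions made for illustration.

```python
import hashlib
import hmac


def block_columns(rows, blocked):
    """Return copies of the rows with the blocked columns removed entirely."""
    return [{k: v for k, v in row.items() if k not in blocked} for row in rows]


def hash_column(rows, column, secret_key):
    """Return copies of the rows with `column` replaced by a keyed SHA-256 digest.

    The same input always maps to the same digest, so the column can still be
    used for joins and grouping without exposing the original value.
    """
    hashed_rows = []
    for row in rows:
        new_row = dict(row)
        value = new_row.get(column)
        if value is not None:
            digest = hmac.new(secret_key, str(value).encode("utf-8"), hashlib.sha256)
            new_row[column] = digest.hexdigest()
        hashed_rows.append(new_row)
    return hashed_rows


# Hypothetical patient rows; names and values are illustrative only.
patients = [
    {"patient_id": "P001", "patient_name": "Jane Doe", "modality": "CT"},
    {"patient_id": "P001", "patient_name": "Jane Doe", "modality": "MR"},
]

redacted = hash_column(block_columns(patients, {"patient_name"}), "patient_id", b"demo-key")
print(redacted)
```

Blocking removes a column outright, while hashing keeps a stable pseudonym, which is useful when studies for the same patient still need to be linked downstream.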
Fabric
  • Unified platform: Microsoft Fabric provides a single, unified platform for all your data and analytics needs, which can help reduce complexity and improve efficiency.
  • Comprehensive analytics: Microsoft Fabric provides an extensive range of deeply integrated analytics services, including data warehousing, data exploration, and data visualization.
  • Ease of use: Microsoft Fabric provides a user-friendly interface that simplifies the development process, making it easier for developers to build and deploy applications.
  • Scalability: Microsoft Fabric is designed to scale with your business needs, allowing you to process large amounts of data quickly and efficiently.
  • Lineage view: Lineage relationships can be seen between all the items in a workspace, as well as data sources external to the workspace one step upstream.
Considerations when using Fivetran to ingest data into MS Fabric Lakehouse:
  • Configure the Fabric OneLake destination:
    • The Microsoft Fabric Admin needs to ensure the Service Principal role created has Contributor Access to the Fabric workspace.
  • Configure the Fivetran pre-built connectors:
    • Ensure that the versions of the source systems are supported.
    • Whitelist the Fivetran IPs in source system firewalls.
  • Configure incremental updates so that Fivetran copies only the rows that have changed since the last data sync, rather than the whole table every time. Depending on whether connections are made to the primary instance or to an availability group replica, there may be limitations on which mechanisms are supported. The two methods are:
    • Read changes using CT (change tracking) or CDC (change data capture)
    • Detect changes via Teleport Sync
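The principle behind incremental updates can be shown with a small, purely conceptual sketch. It does not reproduce how Fivetran, CT, CDC, or Teleport Sync work internally; it assumes a hypothetical source table in which every row carries a monotonically increasing `version` number, and it copies only the rows changed since the last recorded version.

```python
def incremental_sync(source_rows, destination, last_version):
    """Upsert rows changed after `last_version` and return the new high-water mark.

    `destination` is a dict keyed by primary key. Deleted rows are not handled
    in this sketch.
    """
    new_version = last_version
    for row in source_rows:
        if row["version"] > last_version:
            # Store the row without its bookkeeping column.
            destination[row["id"]] = {k: v for k, v in row.items() if k != "version"}
            new_version = max(new_version, row["version"])
    return new_version


# Hypothetical source table and previously synced destination.
source = [
    {"id": 1, "status": "scheduled", "version": 3},
    {"id": 2, "status": "completed", "version": 5},
    {"id": 3, "status": "reported", "version": 7},
]
destination = {1: {"id": 1, "status": "scheduled"}}

high_water_mark = incremental_sync(source, destination, last_version=4)
print(high_water_mark, sorted(destination))
```

Only rows 2 and 3 are copied in this run, because row 1 has not changed since the previous sync. The returned version is stored and passed in as `last_version` next time.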
Data visualization leveraging Power BI:

By using Fivetran to ingest data into Fabric, a robust and optimally structured dataset was established. We were then able to use Power BI to build a comprehensive set of reports that would allow the team to gain insights into the effectiveness of the AI models deployed in the solution.

Metrics and KPIs implemented in Power BI reports
No. | KPIs/Metrics | Definitions
1 | Scan volume metrics | Total number of studies/scans performed; number of studies by modality type and body part; daily/weekly/monthly/yearly trends in study volumes; study volumes by clinical center
2 | Patient demographics | Distribution of studies by age group; distribution of studies by gender; distribution of studies by clinical center
3 | Scaida detect performance metrics | Accuracy, sensitivity, specificity, and precision of AI models for each body part; false positive and false negative rates; model performance over time to track improvements or degradations; model performance by OEM scanners (more useful in future)
4 | Ethical and responsible AI metrics | Discrepancies in model performance across different demographics (age, gender, ethnicity)
The metrics summarized in the table above are described in more detail below.
Scan volume metrics:
  • Enables technicians to monitor and optimize workflow by understanding the total number of studies/scans
  • Assists in resource allocation by highlighting trends in study volumes across different modalities and body parts
  • Offers insights for better planning and management by tracking daily, weekly, monthly, and yearly scan volumes
  • For executives and decision makers, provides data to assess the demand for and usage of clinical centers
  • Helps in strategic planning and expansion by identifying trends and growth areas 
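As a rough illustration of the scan volume metrics above, the sketch below counts studies by modality and by month. The project reports were built in Power BI; this plain-Python version, its field names, and its sample rows are assumptions used only to show the aggregation.

```python
from collections import Counter
from datetime import datetime


def volume_by(studies, key_func):
    """Count studies grouped by whatever key `key_func` extracts."""
    return Counter(key_func(study) for study in studies)


# Hypothetical study records; not taken from the project data.
scan_log = [
    {"study_id": "S1", "modality": "CT", "study_time": "2024-01-05T09:30:00"},
    {"study_id": "S2", "modality": "MR", "study_time": "2024-01-20T14:00:00"},
    {"study_id": "S3", "modality": "CT", "study_time": "2024-02-02T08:45:00"},
    {"study_id": "S4", "modality": "XR", "study_time": "2024-02-11T16:10:00"},
]

by_modality = volume_by(scan_log, lambda s: s["modality"])
by_month = volume_by(
    scan_log,
    lambda s: datetime.fromisoformat(s["study_time"]).strftime("%Y-%m"),
)
print(dict(by_modality))
print(dict(by_month))
```

Swapping the key function (for example, to a clinical center or body part field) gives the other breakdowns listed in the table without changing the counting logic.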
Patient demographics:
  • Provides an understanding of patient distribution by age, gender, and clinical center, which helps not only in tailoring healthcare services and research initiatives but also in developing a better AI model
  • For healthcare policy makers, helps in resource allocation and policy development to address the specific needs of different demographic groups
Scaida detect performance metrics:
  • For AI teams and developers: Tracking accuracy, sensitivity, and other model metrics is essential for ongoing AI model improvement
  • Identifying false positives and negatives helps in refining the AI algorithms
  • For clinical decision makers: Model performance data is crucial for assessing the reliability and integration of AI in clinical workflows
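The model performance metrics named above follow from the standard confusion-matrix definitions. The sketch below assumes binary labels (1 = finding present, 0 = absent) and treats the radiologist's feedback as the reference. That choice, along with the sample data, is an assumption for illustration rather than a description of the project's actual calculation.

```python
def confusion_counts(predictions, ground_truth):
    """Return (tp, fp, tn, fn) for paired binary predictions and reference labels."""
    tp = fp = tn = fn = 0
    for predicted, actual in zip(predictions, ground_truth):
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def safe_ratio(numerator, denominator):
    """Return the ratio, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def performance_metrics(tp, fp, tn, fn):
    """Compute the metrics listed in the KPI table from confusion counts."""
    return {
        "accuracy": safe_ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": safe_ratio(tp, tp + fn),
        "specificity": safe_ratio(tn, tn + fp),
        "precision": safe_ratio(tp, tp + fp),
        "false_positive_rate": safe_ratio(fp, fp + tn),
        "false_negative_rate": safe_ratio(fn, fn + tp),
    }


# Hypothetical AI outputs and radiologist reference labels.
ai_predictions = [1, 1, 0, 0, 1, 0, 1, 0]
radiologist_labels = [1, 0, 0, 0, 1, 1, 1, 0]

counts = confusion_counts(ai_predictions, radiologist_labels)
metrics = performance_metrics(*counts)
print(counts, metrics)
```

Computing the same dictionary per body part, per time window, or per OEM scanner gives the other breakdowns described in the table.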
Ethical and responsible AI metrics:
  • Performance equity analysis: Dissecting AI performance data reveals crucial insights into disparities across different patient groups. By pinpointing these discrepancies, we can drive AI towards more equitable outcomes.
  • Bias detection and mitigation: Reports designed to shine a light on potential biases due to unbalanced data sets allowed us to implement proactive measures to mitigate them by retraining the model with such considerations in mind. This is not just about adhering to regulatory norms but about setting a new standard for ethical AI in healthcare.
  • Feedback-driven AI refinement: The feedback loop facilitated by Power BI empowers us to continually refine our AI models. By integrating radiologist and practitioner input, we evolve our AI to be more inclusive, making unbiased healthcare accessible to all.
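One simple way to quantify the performance equity analysis described above is to compute a metric per demographic group and report the largest gap between groups. The sketch below does this for sensitivity. The grouping field, the sample records, and the choice of a max-minus-min gap are illustrative assumptions, not the project's actual method.

```python
from collections import defaultdict


def sensitivity_by_group(records, group_key):
    """Return sensitivity (tp / (tp + fn)) for each value of `group_key`.

    Only records whose reference label is positive contribute, so every group
    that appears in the result has a non-zero denominator.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0})
    for record in records:
        if record["truth"] == 1:
            bucket = counts[record[group_key]]
            if record["prediction"] == 1:
                bucket["tp"] += 1
            else:
                bucket["fn"] += 1
    return {group: c["tp"] / (c["tp"] + c["fn"]) for group, c in counts.items()}


def largest_gap(metric_by_group):
    """Return the difference between the best and worst performing groups."""
    values = list(metric_by_group.values())
    return max(values) - min(values)


# Hypothetical per-study records with a demographic attribute.
feedback = [
    {"age_group": "18-40", "truth": 1, "prediction": 1},
    {"age_group": "18-40", "truth": 1, "prediction": 1},
    {"age_group": "18-40", "truth": 1, "prediction": 0},
    {"age_group": "18-40", "truth": 1, "prediction": 1},
    {"age_group": "65+", "truth": 1, "prediction": 1},
    {"age_group": "65+", "truth": 1, "prediction": 0},
    {"age_group": "65+", "truth": 0, "prediction": 0},
]

per_group = sensitivity_by_group(feedback, "age_group")
gap = largest_gap(per_group)
print(per_group, gap)
```

A large gap does not prove bias on its own, especially with small groups, but it flags where a closer look at the training data balance may be warranted.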
Conclusion

ThoughtsWin Systems’ collaboration with Fivetran and Microsoft has unveiled a cutting-edge approach to harnessing healthcare data. We are paving the way for advancements in medical imaging and patient care by effectively channeling information from diverse systems into a centralized and intelligent analytics platform.

Through this integrated solution, we are not only facilitating a more profound understanding of diagnostic imaging data but also empowering healthcare providers with actionable insights. Our purpose-driven analytics cater to the intricate needs of modern healthcare, ensuring that every scan, every patient demographic, and every AI model metric translates into enhanced care and operational excellence.

Our commitment goes beyond integration; we strive to be the architects of a future where data drives decisions.

Join us on this transformative journey. Connect with our team, schedule a demo, or engage with us in a conversation about how we can elevate your data strategy to the next level. Together, we can unlock the potential of healthcare data and foster an ecosystem where technology and care converge for the greater good.

Contact Mahesh.shankar@Thoughtswinsystems.com today, and let’s turn data into action, insights into outcomes, and challenges into successes.

Embrace the future of healthcare analytics with ThoughtsWin Systems – where data meets decision-making.