The first mile of Microsoft Fabric Data Engineering and why it matters


Introduction: Fabric’s promise and its first-mile gap

Microsoft Fabric has made a big splash as the “all-in-one” platform for analytics, BI, and AI. The idea is powerful: it brings together storage, compute, and intelligence into a unified experience, letting analysts, data scientists, and business users all work from the same foundation. The vision is clear and compelling: no more silos, just seamless collaboration.

But here’s the catch: Fabric is only as strong as the data it’s built on. And this is where many organizations run into problems. The “first mile” of data replication, the critical step of moving data from source systems into Fabric, often ends up being a stumbling block.

Data engineers frequently find themselves spending more time building and maintaining pipelines than enabling valuable insights. The result is a familiar pattern: batch copy jobs, PySpark notebooks, and layered ETL pipelines that take data through bronze, silver, and gold stages before it’s ready to be used. While these methods can work, they turn every new data source into a mini engineering project, and time-to-value slows to a crawl.

At DBSync, we see this differently. Replication should come first. Instead of taking the long detour through multiple engineering steps, data should be analytics-ready from the moment it arrives in Fabric. That’s why we’ve built direct replication paths straight into Fabric’s Warehouse and SQL Database, both natively backed by OneLake, allowing data to land structured, usable, and query-ready right from day one.

By tackling that “first mile” with a clean, efficient solution, organizations can stop spending valuable time moving data around and start using it to drive insights and business value.

The industry default: Data dumped into OneLake

For most organizations adopting Microsoft Fabric today, the starting point seems obvious: just land everything in OneLake as raw Parquet files. Whether it’s through Fabric’s built-in data pipelines, third-party ingestion tools, or homegrown scripts, this pattern has become the standard “first mile” approach.

On paper, it sounds like a solid plan: cheap storage, a centralized data hub, and easy access for all Fabric services. But once teams start implementing it, the cracks begin to show.

Raw data isn’t ready for analytics. Teams quickly realize that those Parquet dumps can’t be queried or visualized directly. To make the data usable, they need to build layers of Lakehouses, Spark jobs, or dataflows just to reshape it into something relational. That’s a lot of extra work before anyone can even open Power BI.

Schema drift adds another layer of complexity. Systems like Salesforce, HubSpot, or ERP apps evolve constantly: new fields appear, columns change names, and APIs shift. One unexpected change can break an entire pipeline, sending engineers into firefighting mode for hours or even days.

Then there’s latency. By the time data makes its way from source systems to OneLake, and then through all the bronze-silver-gold transformations, it’s often hours or days out of date. For sales or operational reporting, that delay makes the data far less useful.

And perhaps the biggest pain point: fragility. Ask any engineer in a Fabric forum or on Stack Overflow, and you’ll hear the same story: more time spent babysitting brittle copy jobs than actually improving the data stack.

The result? BI and AI teams end up waiting for data that’s never quite ready, while engineers spend their days patching pipelines instead of enabling insights. What should be a unified, intelligent platform starts to feel like just another layer of ETL overhead.

DBSync’s approach: Replication-first into Fabric Warehouse & SQL DB

Unlike traditional ETL processes that rely on raw data dumps and complex transformations later, DBSync follows a replication-first philosophy. Instead of pushing data into unstructured storage and reshaping it downstream, DBSync writes operational data directly into Fabric’s analytical and transactional engines: Fabric Warehouse and SQL Database.

Because both Fabric Warehouse and SQL Database are natively backed by OneLake, data physically lands in OneLake in Delta format, but it arrives already structured and relational, ready for SQL queries, Power BI, or AI workloads. This is the key difference: DBSync doesn’t skip OneLake; it simply reaches it through Fabric’s query engines, so the data is usable from the moment it lands.


How DBSync moves data into Microsoft Fabric OneLake: Architecture blueprint

To really understand how DBSync simplifies the “first mile” of data movement, here’s how the end-to-end architecture flows, from source systems all the way into Microsoft Fabric and OneLake.

1. Source Systems (CRM)

It all begins at the source: your CRM platforms like Microsoft Dynamics 365 CE, Salesforce, or HubSpot. These systems hold business-critical data such as Accounts, Contacts, Opportunities, and Cases, which DBSync accesses through OData APIs or its native connectors. From here, the goal is simple: get that data into Fabric in a form that’s analytics-ready.
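
To make that first hop concrete, here’s a minimal sketch of pulling Account records through the Dynamics 365 Web API (OData) in Python. The org URL and token are placeholders, and a production job would authenticate via Azure AD and page through @odata.nextLink; this illustrates the pattern, not DBSync’s connector internals.

import requests

ORG_URL = "https://yourorg.crm.dynamics.com"   # hypothetical org URL
TOKEN = "<azure-ad-access-token>"              # placeholder; obtain via Azure AD (e.g., MSAL)

# Request a small, explicit column set from the standard Accounts entity set.
resp = requests.get(
    f"{ORG_URL}/api/data/v9.2/accounts",
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Accept": "application/json",
        "OData-Version": "4.0",
    },
    params={"$select": "accountid,name,modifiedon", "$top": "100"},
    timeout=30,
)
resp.raise_for_status()

for record in resp.json()["value"]:            # OData wraps rows in "value"
    print(record["accountid"], record["name"], record["modifiedon"])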

2. DBSync Replication Layer

This is the heart of the process, where DBSync takes over.

Data Extraction and Load Jobs:

DBSync connects directly to CRM sources like Salesforce and Dynamics 365 via native APIs or bulk endpoints to pull entity-level data efficiently. It then writes those records into Microsoft Fabric Warehouse or SQL Database using standard JDBC or REST connections. The process runs in parallel across multiple worker threads, orchestrated by DBSync’s proprietary Worker Manager, which ensures smooth, high-throughput data loads with complete transaction safety.
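
DBSync’s Worker Manager is proprietary, but the general shape, parallel per-entity workers that each commit a batch as a single transaction, can be sketched briefly. This example assumes a Fabric SQL endpoint reachable over ODBC via pyodbc and hypothetical stg_* staging tables.

from concurrent.futures import ThreadPoolExecutor
import pyodbc

# Placeholder connection string for a Fabric SQL endpoint.
CONN_STR = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<fabric-sql-endpoint>;Database=<db>;"
    "Authentication=ActiveDirectoryInteractive;"
)

def load_entity(entity: str, rows: list):
    """Insert one entity's rows as a single all-or-nothing transaction."""
    with pyodbc.connect(CONN_STR) as conn:
        cur = conn.cursor()
        cur.fast_executemany = True            # batch the round trips
        # stg_<entity> is a hypothetical staging table per entity.
        cur.executemany(
            f"INSERT INTO stg_{entity} (id, payload) VALUES (?, ?)", rows
        )
        conn.commit()                          # commit the whole batch at once

entities = {"Account": [("001A", "{...}")], "Contact": [("003C", "{...}")]}

# One worker per entity, capped at four concurrent loads.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(load_entity, e, rows) for e, rows in entities.items()]
    for f in futures:
        f.result()                             # surface any load errors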

Transformation Stage:

As the data moves, DBSync handles normalization, column mapping, timestamp enrichment (like lastModifiedDate), and SQL error handling automatically.
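
As a rough illustration of that step, the snippet below renames source columns to target names, converts empty strings to NULLs, and stamps a load timestamp; the column map is hypothetical, not DBSync’s actual configuration.

from datetime import datetime, timezone

# Hypothetical source-to-target column mapping.
COLUMN_MAP = {
    "Id": "account_id",
    "Name": "account_name",
    "LastModifiedDate": "last_modified_date",
}

def transform(row: dict) -> dict:
    out = {}
    for source, target in COLUMN_MAP.items():
        value = row.get(source)
        out[target] = None if value == "" else value    # empty string -> NULL
    out["_loaded_at_utc"] = datetime.now(timezone.utc)  # enrichment timestamp
    return out

print(transform({"Id": "001A", "Name": "", "LastModifiedDate": "2024-05-01T10:00:00Z"}))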

Replication Modes:

  • Initial Load: Performs a full extraction and bulk load into Fabric Warehouse tables.
  • Incremental (CDC): Captures only changes using change tracking or modified date filters, keeping syncs lightweight and fast; a sketch of this watermark pattern follows below.
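
The sketch below queries the Salesforce REST API for rows modified since the last checkpoint and advances the watermark once the batch lands. The instance URL, token, and in-memory checkpoint are placeholders; a real job would persist the watermark and page through nextRecordsUrl.

import requests

INSTANCE = "https://yourorg.my.salesforce.com"   # hypothetical instance
TOKEN = "<oauth-access-token>"                   # placeholder
last_checkpoint = "2024-01-01T00:00:00Z"         # previously saved watermark

soql = (
    "SELECT Id, Name, LastModifiedDate FROM Account "
    f"WHERE LastModifiedDate > {last_checkpoint} "   # SOQL datetimes are unquoted
    "ORDER BY LastModifiedDate ASC"
)

resp = requests.get(
    f"{INSTANCE}/services/data/v59.0/query",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"q": soql},
    timeout=30,
)
resp.raise_for_status()
changed = resp.json()["records"]

if changed:
    # Advance the watermark only after the batch lands successfully in Fabric.
    last_checkpoint = changed[-1]["LastModifiedDate"]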

3. Fabric Storage Targets

Once data enters Microsoft Fabric through DBSync, it can flow into two primary destinations, each optimized for different workloads.

  • Fabric Data Warehouse
    Fabric’s analytical powerhouse: a columnar, SQL-based engine built for BI and large-scale reporting.
      ◦ Data lands in Delta format inside OneLake, ensuring high performance and open compatibility.
      ◦ Large CRM tables are automatically partitioned for faster queries.
      ◦ Accessible via T-SQL, Power BI DirectLake, and Spark SQL, enabling BI, data science, and AI teams to work from the same source.
  • Fabric SQL Database
    For operational or near real-time use cases, DBSync writes directly into Fabric’s rowstore SQL Database.
      ◦ Ideal for mirroring transactional data or supporting live analytics.
      ◦ Shares the same OneLake foundation, which provides unified governance, security, and consistency.

4. Unified OneLake storage

At the core of it all is OneLake, Fabric’s unified data layer. Every Fabric service (Warehouse, SQL DB, Lakehouse) writes into OneLake’s Delta-based open storage, meaning the same data is instantly available to all workloads.

Key capabilities include:

  • Centralized governance via Microsoft Purview, full lineage, access control, and sensitivity labeling.
  • Open Delta format (Parquet + transaction log) for interoperability across tools.
  • Shortcuts to ADLS Gen2, S3, and Dataverse, extending Fabric’s reach beyond internal data.
  • Direct URI access (abfss://{workspace}@onelake.dfs.fabric.microsoft.com/{path}) for developers; see the notebook sketch below.
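
As a concrete example of that URI access, here’s a minimal read of a replicated Delta table from a Fabric Spark notebook. The workspace, warehouse, and table names are hypothetical.

from pyspark.sql import SparkSession

# Fabric notebooks already provide a SparkSession; getOrCreate() reuses it.
spark = SparkSession.builder.getOrCreate()

# Hypothetical OneLake path to a warehouse table replicated by DBSync.
path = (
    "abfss://SalesWorkspace@onelake.dfs.fabric.microsoft.com/"
    "CrmWarehouse.Datawarehouse/Tables/dbo/Account"
)

df = spark.read.format("delta").load(path)      # Delta = Parquet + txn log
df.filter("last_modified_date >= current_date() - 7").show()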

In short, OneLake makes Fabric act like a data operating system: one governed, open layer powering every analytic experience.

5. Downstream Consumption

Once DBSync replication is complete, the data becomes immediately usable across Fabric workloads:

Consumer | Mode | Purpose
Power BI | DirectLake | Query live from OneLake with no scheduled refreshes (performance depends on dataset complexity and Fabric compute).
Fabric Data Factory | Pipelines | Transform, schedule, or route CRM data.
Spark Notebooks | Unified Lake Access | Train ML models on historical CRM data directly.
KQL DB / Copilot AI | Real-Time / Semantic | Enable near-live dashboards and natural language insights.

With everything stored once and accessible everywhere, teams move from replication to insight without managing multiple pipelines.

Key advantages

One of the biggest advantages of DBSync’s approach is how it simplifies the entire data integration process. You’re not spending weeks configuring connections or manually mapping schemas. With native connectors for popular systems like Salesforce, Dynamics 365, and HubSpot, you can start syncing key business data in just minutes, no complex setup required.

But where DBSync really stands out is in how it manages change and consistency.

We know that data evolves: systems change, fields get updated, and things move fast. That’s where DBSync’s schema drift protection comes into play. It continuously monitors schema changes at the source and offers flexible handling options:

  • Auto-apply updates downstream
  • Stage changes for manual approval
  • Alert an engineer before any change is applied

This ensures stability and control without pipeline surprises. A simplified sketch of that decision flow follows below.
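
Here’s a minimal illustration in Python, assuming only added columns and a hypothetical dbo.Account target; real drift handling also covers type changes and renames.

from enum import Enum

class DriftPolicy(Enum):
    AUTO_APPLY = "auto"
    STAGE_FOR_APPROVAL = "stage"
    ALERT_ONLY = "alert"

def handle_drift(source_cols: set, target_cols: set, policy: DriftPolicy):
    added = sorted(source_cols - target_cols)    # columns new at the source
    if not added:
        return "no drift"
    if policy is DriftPolicy.AUTO_APPLY:
        # dbo.Account is a hypothetical target table.
        return [f"ALTER TABLE dbo.Account ADD {c} VARCHAR(255) NULL" for c in added]
    if policy is DriftPolicy.STAGE_FOR_APPROVAL:
        return f"staged for approval: {added}"
    return f"alert sent: new columns {added}"

print(handle_drift({"Id", "Name", "Region__c"}, {"Id", "Name"},
                   DriftPolicy.ALERT_ONLY))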

DBSync also maintains transactional consistency across dependent entities. It processes changes in the correct sequence using ordered upserts and checkpoints, so related records (like Accounts and Opportunities) stay referentially intact even during high-volume syncs.
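
To picture the ordering guarantee: parents merge before children, so foreign keys always resolve. Below is a minimal sketch using T-SQL MERGE over ODBC; table and column names are illustrative, and the single commit at the end plays the role of a checkpoint. (Fabric SQL Database accepts standard T-SQL.)

import pyodbc

# Illustrative MERGE statements keyed by entity.
MERGE_SQL = {
    "Account": (
        "MERGE dbo.Account AS t "
        "USING (VALUES (?, ?)) AS s (Id, Name) ON t.Id = s.Id "
        "WHEN MATCHED THEN UPDATE SET t.Name = s.Name "
        "WHEN NOT MATCHED THEN INSERT (Id, Name) VALUES (s.Id, s.Name);"
    ),
    "Opportunity": (
        "MERGE dbo.Opportunity AS t "
        "USING (VALUES (?, ?, ?)) AS s (Id, AccountId, Amount) ON t.Id = s.Id "
        "WHEN MATCHED THEN UPDATE SET t.AccountId = s.AccountId, t.Amount = s.Amount "
        "WHEN NOT MATCHED THEN INSERT (Id, AccountId, Amount) "
        "VALUES (s.Id, s.AccountId, s.Amount);"
    ),
}

ORDER = ["Account", "Opportunity"]   # parents first, then children

def apply_batch(conn_str: str, batch: dict):
    with pyodbc.connect(conn_str) as conn:
        cur = conn.cursor()
        for entity in ORDER:                     # enforce dependency order
            for row in batch.get(entity, []):
                cur.execute(MERGE_SQL[entity], row)
        conn.commit()                            # checkpoint: all-or-nothing

batch = {"Account": [("001A", "Globex")],
         "Opportunity": [("006B", "001A", 50000.0)]}
# apply_batch("<odbc-connection-string>", batch)  # placeholder connection string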

And if that wasn’t enough, DBSync includes built-in resilience to keep pipelines running smoothly during heavy load or API throttling. Features like automatic retries, checkpointing, and intelligent rate management ensure consistent performance, no matter how large or dynamic your data is.

By the time the data lands in your Fabric environment, it’s analytics-ready: fully structured, relational, and reliable.

In a nutshell, DBSync replaces the old “dump and fix later” model with “replicate clean, use immediately.” It delivers consistent, query-ready data right where you need it, without the firefighting.


Why OneLake as the foundation

Category | Advantage
Unified Storage | All replicated data lands once in OneLake, accessible to every workload.
Open Format | Delta format ensures no vendor lock-in and seamless multi-engine compatibility.
Security & Governance | Managed lineage, RBAC, encryption, and Purview integration.
Performance | Auto-optimized partitions and caching for fast query access.
Cost Efficiency | A single storage layer minimizes duplication; incremental replication reduces compute overhead.
Near-Real-Time BI | CDC + DirectLake deliver dashboards that update within minutes of source changes.

What makes this architecture so powerful

1. Unified Storage, No Silos

DBSync eliminates redundant copies by writing through Fabric’s Warehouse and SQL DB directly into OneLake. This “write once, read anywhere” design ensures that Power BI, Spark, and AI tools all query the same governed, up-to-date dataset.

2. Open Format & Multi-Engine Access

Data written by DBSync arrives in Delta format, instantly queryable by SQL, transformable by Spark, and consumable by Power BI, without extra ETL steps.

3. Scalable, Governed, and Secure

Purview, RBAC, and Fabric’s native governance manage compliance and access automatically. DBSync preserves schema and referential consistency end-to-end.

4. Cost and Maintenance Efficiency

OneLake’s shared storage removes redundant compute/storage layers, reducing operational costs. Incremental replication minimizes load volumes, lowering compute usage and improving efficiency.

5. Near-Real-Time Insights

By combining DBSync’s CDC with Fabric’s DirectLake, dashboards update quickly as CRM data changes. Actual responsiveness depends on data volume and Fabric capacity, but in most cases, changes reflect within minutes, enabling operational agility.

Use cases where this approach shines

1. Modernize BI

Replace fragile CSV exports and Excel stitching with an automated flow:
CRM/ERP → Fabric Warehouse → Power BI
Analysts get live dashboards instead of nightly refreshes or spreadsheets.

2. Train AI Models

Data scientists can directly feed clean, historical data into Fabric ML without spinning up Spark jobs, focusing on modeling instead of wrangling.

3. Real-Time Dashboards

CDC streams landing in Fabric Warehouse keep dashboards fresh, ideal for sales, ops, or finance use cases.

4. Offload Production Systems

Create near-real-time replicas in Fabric SQL DB to run analytics without impacting transactional databases.

Example: How a Change in Salesforce Flows into Fabric

To illustrate how DBSync’s replication-first process works end-to-end, here’s what happens when a record changes in Salesforce:


1) Salesforce record updated (LastModifiedDate = X)

2) DBSync detects the change via Salesforce change tracking (Change Data Capture or LastModifiedDate filters)

3) DBSync normalizes fields, enriches with timestamps and metadata

4) DBSync performs an ordered batch upsert to the Fabric SQL Database endpoint

5) Fabric writes the data in Delta format to OneLake

6) Power BI DirectLake and other Fabric workloads query the updated record with no scheduled refresh

Result:
Every update in Salesforce flows cleanly through DBSync into Fabric: no manual refreshes, no broken pipelines, and no staging hops. The change is reflected within minutes across Fabric Warehouse, OneLake, and Power BI dashboards, keeping analytics and operations fully aligned.

Comparing the DBSync approach with generic ingestion

Feature | DBSync | Generic OneLake Dump
Target | Fabric Warehouse / SQL DB (query-ready) | OneLake raw Parquet (needs ETL)
Integrity | Ordered upserts preserve relationships | Risk of partial updates
Schema Drift | Auto-detect + configurable propagation | Manual fixes required
Sources | Native CRM connectors | DIY scripts or API jobs
Ops Overhead | Low – automated monitoring | High – manual maintenance

How DBSync complements Fabric Mirroring

Microsoft Fabric offers multiple ways to bring data into OneLake, and DBSync is designed to complement, not compete with, Fabric Mirroring. Here’s a simple guide to choosing between the two:

Scenario | Use Case | Recommended Solution
Native database mirroring | SQL Server, Azure SQL DB, Snowflake (supported natively by Fabric) | Fabric Mirroring – best for databases Fabric supports out of the box
SaaS applications | Salesforce, HubSpot, Dynamics 365 CE, NetSuite, Business Central | DBSync – prebuilt SaaS connectors replicate data directly into Fabric Warehouse / SQL DB
Hybrid or on-prem systems | Legacy ERPs, on-prem SQL servers, or mixed environments | DBSync – handles hybrid replication with CDC and minimal infrastructure
Custom operational data models | Workflows requiring schema control, metadata mapping, or incremental sync logic | DBSync – offers configurable transformations and schema-aware management

In short:

  1. Use Fabric Mirroring when Fabric already supports your database.
  2. Use DBSync when you’re dealing with SaaS sources, hybrid systems, or any environment where Fabric doesn’t provide a native path, and you still want data to land analytics-ready in Warehouse or SQL DB.

Strategic implications: Who DBSync helps (and how)

  • For Data Engineers:
    Fewer brittle pipelines. Schema-aware CDC and connectors replace fragile notebooks and copy jobs.
    Spend less time patching, more time building.
  • For BI Developers / Analysts:
    A single, trusted source of truth in Fabric. Dashboards refresh faster, with consistent data.
    No more conflicting metrics across departments.
  • For Data Architects / IT Leaders:
    A cleaner stack, lower infrastructure costs, and predictable governance.
    Simplify the data ecosystem while improving compliance and performance.

The bigger picture

DBSync bridges the “first mile” gap, turning Microsoft Fabric from a promising vision into a daily operational reality. By connecting business systems directly to Fabric’s analytics engines, DBSync ensures that data lands clean, consistent, and analytics-ready from day one, empowering teams across engineering, analytics, and leadership.

Microsoft Fabric delivers on its vision only when the first mile of data movement is seamless. DBSync makes that happen, replicating clean, consistent, analytics-ready data directly into Fabric’s Warehouse and SQL Database, powered by OneLake. No brittle pipelines. No refresh delays. Just reliable, governed data flowing from CRM to Power BI and beyond.

Start your replication-first journey with DBSync today and unlock the full potential of Microsoft Fabric.

Kushal K

Kushal Khandelwal is a Product Marketing Manager at DBSync, an integration platform, and has been featured in webinars and events focused on data warehousing, data pipelines, and Snowflake.