As part of our quarterly review process of all the FAQs, we have decided to touch upon the most frequently talked about subject, ETL vs EAI. In this article we look to address some of the pain points involved in the decision making process between the two, what they mean to your Business and when you should draw boundaries between these two integration techniques.
As part of any data integration strategy, it is a golden rule of thumb to follow ‘prevent and prepare rather than repent and repair’. In most cases concerning Business Intelligence initiatives, if you know what you don’t need, you are already half way through in the quest to what you want to achieve. Business Enterprises invest millions of dollars on BI initiatives and Data Warehousing and hence rely on a consistent, accurate and reliable data.
Thus following proper integration techniques becomes all the more vital. With concerned to enterprise-integration strategy, EAI is essential for maintaining a perfect harmony between business needs and IT solutions where as Data integration, a function of ETL is a constant need as inconsistent data at the heart of any BI initiative fails to provide an accurate picture of the Business. Hence the vital question EAI or ETL?
What is EAI (Enterprise Application Integration)?
EAI may be defined as a process of aligning a business’s strategic vision with its information technology. In other words EAI is mainly process integration that exists to allow client applications to operate on data from across the business through API calls without regard to its location, encapsulating technology or format. It involves integration of incompatible business applications within and beyond enterprise to allow them to talk to each other seamlessly and to share data in real time.
EAI solutions enable the automation of end-to-end business processes by coordinating sequences of tasks and resources (both systems and people) that perform them. EAI solutions support sophisticated exception management and the dynamic modification of processes even when processes are underway. EAI involves developing a ‘unified view’ of an enterprise’s business and its applications, seeing how existing applications fit into the scheme of things, and then devising ways to efficiently reuse what already exists while adding new applications and data.
EAI integration is mostly message-based, transaction-oriented, point-to-hub, brokering and transformation for application-to-application integration. Some of the core benefits offered by Enterprise Application Integration are.
- A focus on integrating both business-level processes and data
- A focus on reuse and distribution of business processes and data
- A focus on simplifying application integration by reducing the amount of detailed, application specific knowledge required by the users
What is ETL (Extraction, Transformation & Loading)?
As the acronym stands, ETL is a process of extraction / reading a database, transforming or validating the data, and lastly loading / writing to a database. Extract, Tranform & Load are three database functions that are combined into one tool (like DBSync) to pull data out of source databases and place it into target databases. ETL is used to migrate data from databases to others, to form data marts and data warehouses and also to convert databases from one format or type to another.
Extract – The process of reading data from source systems. Data can be extracted in schedule-driven pull mode or event–driven push mode. Pull mode operation supports data consolidation and is typically done in batch. Push mode operation is one online by propagating data changes to target data stores.
Transform – the process of converting the extracted data from its existing form into the format it needs to be in so that it can be placed into other systems or databases. Transformation occurs by using rules or lookup tables or by combining the data with other data.
Load – The process of creation and execution of workflows to write data into the target systems. Data loading may cause a complete refresh of a target data store or may be done by updating the target destination. Interfaces here include de facto standards like ODBC, JBDC, JMS, or application interfaces.
ETL is mainly designed to process very large amounts of data, ETL provides a suitable platform for
- Improved productivity by reuse of objects and transformations
- Strict Methodology
- Better Metadata Support, including impact analysis.
Drawing Boundaries for ETL & EAI
As mentioned in the above sections, both techniques rely on the concept of a unified view and the definition of a mapping that allows data from many disparate sources to be “projected” onto that view. But what differ are the purpose, speed and direction and amount of data that are transformed and placed within the unified view from the external sources.
Process-level integration mainly deals with building enterprise-wide business workflows and processes and incorporating existing applications into those processes. EAI middleware acts as the workflow engine integrating applications in near real time, passing small amounts of data through message queues and a series of stages. EAI tools provide much more complete workflow capabilities than ETL tools, which provide simple workflow. EAI tools, and especially their workflow components, provide very sophisticated GUI development environments that enable design and management of very complex business processes. Here transformations are focused on ensuring a common understanding of the context and meaning (semantics) of the data involved within the message, the more likely a proven EAI is more appropriate.
With regards to data integration, ETL tools and also next generation ETL clearly holds an advantage whether in batch or real time. Synchronizing data between two applications involves a lot more data manipulation than simply moving data from point Source to Target. It involves data intense tasks that depend upon either RDBMS efficiencies/scalability or in-memory data caching to achieve the necessary throughput. Typically, enterprise data warehousing projects require you to move large amounts of data within relatively small windows of time. Also since ETL tools were born out of the relational database world, and thus are adept at performing SQL oriented transformations. They deal mainly with pulling data out of multiple relational tables, understanding the meaning and relationships between the tables, combining, merging, or joining that data, and augmenting it with data from their sources. Since this is more than just moving data from Source to Target, ETL is more appropriate.
|When EAI||When ETL|
|High number of transactions||Large amounts of data|
|Message transforms||Complex transformations|
|Transformation act on a single row of data||Transformation is data-set oriented|
|Little data augmentation||Large data augmentation|
|1 to n ; m to n||Point-to-point|
|Suitable for real-time data needs||Suitable for large amounts of data|
|High volume, low footprint data exchange||Generally used to move data between 2 or more databases / data repositories|
Watch DBSync Demo of of Salesforce.com & Microsoft Dynamics GP: http://www.mydbsync.com/connect?c=009_sf_gp_demo/index.html