Thursday, 2 June 2022

Difference between Dedicated SQL Pool, Serverless Pool and Apache Spark Pool

	Dedicated SQL Pool	Serverless SQL Pool	Apache Spark pool
Architecture	It uses a node-based architecture where applications can connect and issue T-SQL to a control node. That control node is the single point of entry in Synapse SQL. Architecture It is build on a MPP engine, which stands for Massive Parallel Processing engine. Scaling can be done independently because the compute power and storage are decoupled. The control node on top is the brain of the MPP engine. It coordinates all compute activity.	It’s a node-based design based on the DDP system, or Distributed Data Processing system. In this system query’s are split into smaller query’s, and executed on the compute nodes. The control node on top utilized the distributed query engine to optimize the query’s for parallel processing. Each small query is called a “task” and represents a distributed execution unit.	Apache Spark is a parallel processing framework that supports in-memory processing to boost the performance of big-data analytic applications. Apache Spark provides primitives for in-memory cluster computing. A Spark job can load and cache data into memory and query it repeatedly. In-memory computing is much faster than disk-based applications.
Single source of truth	Yes	No
Scale up and down/Compute	Manual -Yes	Autoscale - No as it is managed by microsoft	Autoscale
Use cases	Power BI Direct query When you need a consistent level of performance and cost	To explore the data in ad-hoc modus You can analyze CSV, Parquet, Delta and other file formats stored in Azure Storage by using simple T-SQL query’s. Build a Logical Data Warehouse
		“Pay-per-query” or in other words you only pay for the amount of data that is processed.
Payment Model	Cost per hour (DWU) Charges are made for the resources reserved	Cost per TB of data processed Charges are made for the data processed on each query.
Data Retrieval	Fast: Distribution Strategy’s can be implemented (Round-robin, Hash, …), Query Caching.	Mediocre: Results cannot be cached, data needs to be fetched from storage.
Ingest data back to Datelake	Yes	No – you can only query data lake
Data storage	Relational database	Data lake

No comments:

Post a Comment

Subscribe to: Post Comments (Atom)