Difference
between Dedicated SQL Pool, Serverless Pool and Apache Spark Pool
|
Dedicated SQL Pool |
Serverless SQL Pool |
Apache Spark pool |
Architecture |
It uses a node-based architecture
where applications can connect and issue T-SQL to a control node. That
control node is the single point of entry in Synapse SQL. Architecture The control node on top is the brain
of the MPP engine. It coordinates all compute activity. |
It’s a node-based design based on
the DDP system, or Distributed Data Processing system. In this system query’s
are split into smaller query’s, and executed on the compute nodes. The
control node on top utilized the distributed query engine to optimize the query’s
for parallel processing. Each small query is called a “task” and represents a
distributed execution unit. |
Apache Spark is a parallel
processing framework that supports in-memory processing to boost the
performance of big-data analytic applications. |
Single source of truth |
Yes |
No |
|
Scale up and down/Compute |
Manual -Yes |
Autoscale - No as it is managed by
microsoft |
Autoscale |
Use cases |
Power BI Direct query |
To explore the data in ad-hoc modus You can analyze CSV, Parquet, Delta
and other file formats stored in Azure Storage by using simple T-SQL query’s. Build a Logical Data Warehouse |
|
|
|
“Pay-per-query” or in other words
you only pay for the amount of data that is processed. |
|
Payment Model |
Cost per hour (DWU) |
Cost per TB of data processed |
|
Data Retrieval |
Fast: Distribution Strategy’s can be
implemented (Round-robin, Hash, …), Query Caching. |
Mediocre: Results cannot be cached,
data needs to be fetched from storage. |
|
Ingest data back to Datelake |
Yes |
No – you can only query data lake |
|
Data storage |
Relational database |
Data lake |
|
No comments:
Post a Comment