Thursday, 2 June 2022

You can't schedule refresh for this dataset because one or more sources currently don't support refresh

 

Issue –

You can't schedule refresh for this dataset because one or more sources currently don't support refresh

 

 

Resolution –

Downloaded the report, opened it in Power BI Desktop, and clicked Refresh – the refresh completed successfully.

Published the refreshed report back to the service, replacing the old report.
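
Once the offending source is fixed and the report is republished, it can also help to confirm which data sources the published dataset actually uses, or to trigger an on-demand refresh without opening Desktop again. Below is a minimal sketch using the Power BI REST API; the workspace ID, dataset ID, and the way the Azure AD access token is obtained are placeholders you would replace with your own.

    import requests

    # Placeholder values - replace with your own workspace/dataset IDs and a valid
    # Azure AD access token for the Power BI service (e.g. acquired via MSAL).
    WORKSPACE_ID = "<workspace-guid>"
    DATASET_ID = "<dataset-guid>"
    ACCESS_TOKEN = "<azure-ad-access-token>"

    BASE = f"https://api.powerbi.com/v1.0/myorg/groups/{WORKSPACE_ID}/datasets/{DATASET_ID}"
    HEADERS = {"Authorization": f"Bearer {ACCESS_TOKEN}"}

    # List the data sources behind the dataset - useful for spotting the source
    # that does not support scheduled refresh.
    datasources = requests.get(f"{BASE}/datasources", headers=HEADERS)
    datasources.raise_for_status()
    for ds in datasources.json()["value"]:
        print(ds["datasourceType"], ds.get("connectionDetails"))

    # Kick off an on-demand refresh once the sources are fixed.
    refresh = requests.post(f"{BASE}/refreshes", headers=HEADERS)
    refresh.raise_for_status()
    print("Refresh requested, status code:", refresh.status_code)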

Difference between Azure Analysis Services and Azure Synapse

 


Upgraded version
- Azure Analysis Services: the upgraded, cloud-hosted version of SQL Server Analysis Services.
- Azure Synapse: Azure Synapse Analytics is the upgraded version of Azure SQL Data Warehouse plus ML and Power BI, and it integrates with ADLS and Databricks.

Concurrent users
- Azure Analysis Services: more than 1,000
- Azure Synapse: 128 (the maximum number of concurrent queries on a dedicated SQL pool)

DAX query support
- Azure Analysis Services: Yes
- Azure Synapse: No

Semantic model
- Azure Analysis Services: Yes
- Azure Synapse: Yes – through a Power BI model in Synapse Analytics

Dashboard development
- Azure Analysis Services: Power BI
- Azure Synapse: Power BI dataset

Data mining
- Azure Analysis Services: No
- Azure Synapse: Yes

Power BI connection type
- Azure Analysis Services: Live connection
- Azure Synapse: DirectQuery/Import mode

 

Difference between Azure Data Lake and Delta Lake

 


Definition
- Azure Data Lake: Data Lake Storage Gen2 lets you easily manage massive amounts of data. A fundamental part of Data Lake Storage Gen2 is the addition of a hierarchical namespace to Blob storage, which organizes objects/files into a hierarchy of directories for efficient data access.
- Delta Lake: an open-source storage layer from the Spark ecosystem that runs on top of an Azure Data Lake. Its core functionality brings reliability to big data lakes by ensuring data integrity with ACID transactions.

Data format
- Azure Data Lake: all the raw data coming from different sources can be stored without pre-defining a schema; the lake can hold any type of data without processing it first.
- Delta Lake: data is stored as Parquet files together with a JSON transaction log (_delta_log) that records every change to the table.

Data integrity / ACID transactions
- Azure Data Lake: multiple data pipelines usually read and write concurrently, and writes are append-only, so it is hard to guarantee integrity; if a pipeline fails mid-write, the data is left partially written or corrupted, which badly affects data quality.
- Delta Lake: ACID compliant – a write operation either finishes completely or not at all, so partially written or corrupted data is never committed (see the first PySpark sketch after this comparison).

Unified batch and stream sources and sinks
- Azure Data Lake: there is no way to have concurrent jobs reading from and writing to the same data.
- Delta Lake: the same functions can be applied to both batch and streaming data, and any change in business logic yields consistent data in both sinks (see the streaming sketch after this comparison).

Schema enforcement & schema evolution
- Azure Data Lake: incoming data can change over time, which can lead to data-type compatibility issues and corrupted data entering the lake.
- Delta Lake: incoming data with a different schema can be blocked from entering the table to avoid corruption; if enforcement isn't needed, users can let the schema evolve intentionally as the data changes over time (see the first PySpark sketch after this comparison).

Time travel
- Azure Data Lake: data is constantly modified, so reproducing an experiment with the same data as a week ago is impossible unless the data is copied multiple times.
- Delta Lake: users can go back to an older version of the data to reproduce experiments, fix bad updates/deletes or other transformations, or audit the data (see the time-travel sketch after this comparison).

Prevent data corruption
- Delta Lake: a Delta table can include NOT NULL constraints on columns, which cannot be enforced on a regular Parquet table; this prevents records from being loaded with NULL values in required columns. Using a MERGE statement, a pipeline can be configured to INSERT new records and ignore records that are already present in the Delta table (see the MERGE/DELETE sketch after this comparison).

Query execution
- Azure Data Lake: each query requires an expensive LIST operation on the blob storage.
- Delta Lake: the Delta transaction log serves as the manifest. It not only keeps track of the Parquet file names but also centralizes their statistics – the min and max values of each column, taken from the Parquet file footers – which lets Delta Lake skip files that cannot match the query predicate.

Processing engine
- Azure Data Lake: Apache Spark.
- Delta Lake: the Delta engine. It has an intelligent caching layer that caches data on the SSD/NVMe drives of a cluster as it is ingested, making subsequent queries on the same data faster, and an enhanced query optimizer that speeds up common query patterns. It also includes Photon, a native vectorization engine written in C++. Because it is optimized with performance features like indexing, Delta Lake customers have seen ETL workloads execute up to 48% faster.

Achieve compliance (GDPR/CCPA)
New laws such as GDPR and CCPA require that companies be able to purge data pertaining to a customer should the individual request it.
- Delta Lake: includes DELETE and UPDATE actions for easy manipulation of the data in a table (see the MERGE/DELETE sketch after this comparison).
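To make the ACID and schema-enforcement points concrete, here is a minimal PySpark sketch. It assumes a Synapse or Databricks notebook where a Delta-enabled `spark` session already exists; the storage path and column names are made-up examples.

    from pyspark.sql import Row

    # Hypothetical Delta table location on ADLS Gen2.
    path = "abfss://lake@mystorageaccount.dfs.core.windows.net/delta/sales"

    # Initial batch - the write is atomic: it either commits fully or not at all.
    batch1 = spark.createDataFrame([Row(id=1, amount=100.0), Row(id=2, amount=250.0)])
    batch1.write.format("delta").mode("append").save(path)

    # Schema enforcement: a batch with an incompatible schema is rejected
    # instead of silently corrupting the table.
    bad_batch = spark.createDataFrame([Row(id=3, amount="not-a-number", extra="oops")])
    try:
        bad_batch.write.format("delta").mode("append").save(path)
    except Exception as e:
        print("Write rejected by schema enforcement:", type(e).__name__)

    # Schema evolution: opt in explicitly when a new column is intentional.
    batch2 = spark.createDataFrame([Row(id=3, amount=75.0, region="EU")])
    batch2.write.format("delta").mode("append").option("mergeSchema", "true").save(path)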

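The MERGE and DELETE behaviour described above can be sketched like this, again assuming an existing Delta table and a Delta-enabled `spark` session; the table path, key column and customer id are illustrative assumptions.

    from delta.tables import DeltaTable

    path = "abfss://lake@mystorageaccount.dfs.core.windows.net/delta/customers"
    target = DeltaTable.forPath(spark, path)

    new_records = spark.createDataFrame(
        [(1, "alice@example.com"), (42, "bob@example.com")],
        ["customer_id", "email"],
    )

    # Insert-only merge: add records that are not yet in the table and ignore
    # records that are already present (the de-duplication pattern mentioned above).
    (target.alias("t")
        .merge(new_records.alias("s"), "t.customer_id = s.customer_id")
        .whenNotMatchedInsertAll()
        .execute())

    # GDPR/CCPA-style purge: delete every record belonging to one customer.
    target.delete("customer_id = 42")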
 
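Time travel from the comparison above looks roughly like this in PySpark (same assumptions: an existing `spark` session and an illustrative table path).

    from delta.tables import DeltaTable

    path = "abfss://lake@mystorageaccount.dfs.core.windows.net/delta/sales"

    # Inspect the table history to find the version or timestamp you need.
    DeltaTable.forPath(spark, path).history().select("version", "timestamp", "operation").show()

    # Read the table exactly as it was at an earlier version...
    v3 = spark.read.format("delta").option("versionAsOf", 3).load(path)

    # ...or as it was at a point in time, e.g. to rerun last week's experiment.
    last_week = (spark.read.format("delta")
                 .option("timestampAsOf", "2022-05-26 00:00:00")
                 .load(path))
    print(v3.count(), last_week.count())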

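And for the unified batch/streaming point, a sketch of one streaming job and one batch job writing to the same Delta table; the paths, checkpoint location and the event_type column are assumptions for illustration.

    path = "abfss://lake@mystorageaccount.dfs.core.windows.net/delta/events"
    checkpoint = "abfss://lake@mystorageaccount.dfs.core.windows.net/checkpoints/events"

    # Streaming sink: continuously append incoming events to the Delta table.
    events_stream = (spark.readStream
                     .format("delta")
                     .load("abfss://lake@mystorageaccount.dfs.core.windows.net/delta/raw_events"))
    stream_query = (events_stream.writeStream
                    .format("delta")
                    .outputMode("append")
                    .option("checkpointLocation", checkpoint)
                    .start(path))

    # Batch sink: a backfill job can append to the very same table;
    # Delta's transaction log keeps both writers consistent.
    backfill = spark.read.format("parquet").load(
        "abfss://lake@mystorageaccount.dfs.core.windows.net/raw/events_2022_05")
    backfill.write.format("delta").mode("append").save(path)

    # Batch readers and streaming readers see the same, consistent table.
    spark.read.format("delta").load(path).groupBy("event_type").count().show()
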
Difference between Dedicated SQL Pool, Serverless SQL Pool and Apache Spark Pool

 

 

Architecture
- Dedicated SQL Pool: a node-based architecture in which applications connect and issue T-SQL to a control node, the single point of entry into Synapse SQL. It is built on an MPP (Massively Parallel Processing) engine, and because compute and storage are decoupled they can be scaled independently. The control node is the brain of the MPP engine and coordinates all compute activity.
- Serverless SQL Pool: a node-based design built on a DDP (Distributed Data Processing) system. Queries are split into smaller queries that are executed on the compute nodes; the control node uses the distributed query engine to optimize them for parallel processing. Each small query is called a "task" and represents a distributed execution unit.
- Apache Spark Pool: Apache Spark is a parallel processing framework that uses in-memory processing to boost the performance of big-data analytics applications. Spark provides primitives for in-memory cluster computing: a Spark job can load and cache data in memory and query it repeatedly, which is much faster than disk-based processing.

Single source of truth
- Dedicated SQL Pool: Yes
- Serverless SQL Pool: No

Scale up and down / compute
- Dedicated SQL Pool: manual scaling – yes
- Serverless SQL Pool: no scaling needed – compute is managed by Microsoft
- Apache Spark Pool: autoscale

Use cases
- Dedicated SQL Pool: Power BI DirectQuery; when you need a consistent level of performance and cost.
- Serverless SQL Pool: explore data ad hoc; analyze CSV, Parquet, Delta and other file formats stored in Azure Storage using simple T-SQL queries (see the sketch at the end of this post); build a logical data warehouse; "pay-per-query" – you only pay for the amount of data that is processed.

Payment model
- Dedicated SQL Pool: cost per hour (DWU) – you are charged for the resources reserved.
- Serverless SQL Pool: cost per TB of data processed – you are charged for the data processed by each query.

Data retrieval
- Dedicated SQL Pool: fast – distribution strategies (round-robin, hash, …) and query caching can be used.
- Serverless SQL Pool: mediocre – results cannot be cached and data has to be fetched from storage.

Ingest data back to the data lake
- Dedicated SQL Pool: Yes
- Serverless SQL Pool: No – you can only query the data lake

Data storage
- Dedicated SQL Pool: relational database
- Serverless SQL Pool: data lake
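
To make the serverless "query files with T-SQL" use case concrete, here is a minimal sketch that runs an OPENROWSET query against the serverless SQL endpoint from Python. The workspace name, storage account, container and folder are placeholder assumptions, and it assumes the ODBC Driver 18 for SQL Server and the pyodbc package are installed; swap in whatever authentication method your environment uses.

    import pyodbc

    # Placeholder connection details - replace with your own workspace and credentials.
    conn = pyodbc.connect(
        "Driver={ODBC Driver 18 for SQL Server};"
        "Server=<workspace-name>-ondemand.sql.azuresynapse.net;"
        "Database=master;"
        "Encrypt=yes;"
        "Authentication=ActiveDirectoryInteractive;"
    )

    # The serverless SQL pool reads the Parquet files directly from the data lake;
    # you are billed only for the data this query processes.
    sql = """
    SELECT TOP 10 *
    FROM OPENROWSET(
        BULK 'https://<storage-account>.dfs.core.windows.net/<container>/sales/*.parquet',
        FORMAT = 'PARQUET'
    ) AS sales;
    """

    cursor = conn.cursor()
    for row in cursor.execute(sql):
        print(row)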