Many companies today have aging data architectures. As you look to modernize your traditional ETL pipeline, there is a tool you should keep in mind: Azure Databricks. During your move into Azure, there will probably be a place for it. In the past, general DTS/SSIS pipelines and SQL Server engines were sufficient, but with growing data sizes and tighter SLAs, they just won’t cut it. Azure Databricks provides an easy platform for getting started with the power of Apache Spark.
The traditional migration path for your existing SSIS packages is to Azure Data Factory. Azure Data Factory can handle a lot, but when it comes to larger volumes, it may be time to look at Azure Databricks. Azure Databricks notebooks can be called directly from Azure Data Factory or scheduled from within Azure Databricks itself.
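To make the Data Factory option a little more concrete, here is a minimal sketch of what a pipeline activity that runs a notebook might look like, built as a plain Python dictionary and printed as JSON. The activity name, notebook path, linked service name, and the `run_date` parameter are all hypothetical placeholders; the overall shape follows the Databricks Notebook activity in Azure Data Factory as we understand it, so check it against the current documentation before relying on it.

```python
import json


def notebook_activity(name, notebook_path, linked_service, parameters=None):
    """Build a dict describing an ADF activity that runs a Databricks notebook.

    All argument values used below are illustrative placeholders, not names
    from a real workspace.
    """
    return {
        "name": name,
        "type": "DatabricksNotebook",
        # Reference to the Azure Databricks linked service defined in the factory.
        "linkedServiceName": {
            "referenceName": linked_service,
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            # Workspace path of the notebook to run.
            "notebookPath": notebook_path,
            # Key/value pairs handed to the notebook as parameters.
            "baseParameters": parameters or {},
        },
    }


activity = notebook_activity(
    name="RunNightlyLoad",                          # hypothetical activity name
    notebook_path="/Shared/etl/nightly_load",       # hypothetical notebook path
    linked_service="AzureDatabricksLinkedService",  # hypothetical linked service
    parameters={"run_date": "2024-01-01"},          # hypothetical parameter
)

print(json.dumps(activity, indent=2))
```

Inside the notebook, a parameter passed this way would typically be read with `dbutils.widgets.get("run_date")`. Scheduling the same notebook from within Azure Databricks skips Data Factory entirely and uses a Databricks job instead.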
Collaborative workspaces allow your engineers to begin coding immediately. Proofs of concept are quickly achievable, and single-node and dynamic clusters are available to keep costs down during development.
Are you planning your data pipeline modernization? Reach out to Wintellect / Atmosera and we can help identify, architect, and implement where Azure Databricks fits into your solution. Training is also available.
In this Azure Databricks series of blog posts, we’ll cover the highlights of Databricks and increase your confidence in this amazing tool. The planned topics are:
- The power of Spark
- Collaborative workspaces / IDE
- No need to manage infrastructure or upgrades: Databricks releases Databricks Runtime versions that keep you on the cutting edge
- Streaming vs Batch
- Built-in source control
- Large community with good documentation
- Lakehouse Architecture
Please feel free to comment with questions or topics you’d like to see covered and we will be sure to respond.