This week is my final week of being a work-ready consultant upskilling before finally going to work with my client company. I focused on practicing SQL, finishing my ETL Pipelining course, and completing an introduction to Data Warehousing and Business Intelligence.
ETL Pipelining and Data Modeling
I was able to finish the course I'd started about ETL pipelining. As a review, ETL stands for "Extract, Transform, and Load" data. Data pipelining involves creating a sort of flow or structure for data to run through, to modify data sources.
In this example flow, data is being extracted from the SalesTerritory data source. It runs through a lookup component to look for matches and if doesn't exist, it gets inserted as a new item into another SalesTerritory table. Otherwise, because a match exists, the pipeline checks to see if there are any changes in the details and if so, it will update the SalesTerritory dimension table.
By using this data pipeline, the data engineer can do a variety of operations all at once while minimizing the number of entries in the transaction log; in other words, this optimizes performance and processing time, and packages the information into sources ready to be used elsewhere.
SQL Refresh
After the initial interview with the client, I decided I needed to brush up on my SQL a little bit more. I hadn't really had any practice using CTEs so I created some with the Adventure Works database I had downloaded to Microsoft SQL Server.
I'd then tried to download Visual Studio so I could try some data pipelining, but I wasn't able to find the correct packages. Instead, I drew a model of a pipeline on pen and paper modeling data I generated from what I knew about the target data types I would be working with at the client. After practicing my pipeline formatting, I then decided to break down the pipeline even further and practiced designing a data warehouse.
I expected that the information I would want to query to put into the pipeline must be broken down into many tables, like how the sample data AdventureWorks is organized. There are various tables for people, their associated businesses, and location addresses. By modeling a data warehouse with my generated client data, I'd be able to think through and apply ideas about surrogate keys to the model and practice my SQL queries. In general, this was a good exercise and substitute for not being able to use the right technology at the time.
Data Warehousing and Business Intelligence
I wanted to strengthen my knowledge of data warehousing theory in particular, which is why I found this interesting course on Pluralsight. I can understand, from the other courses, the importance of data modeling and data warehousing, but I wanted a better understanding of how else data warehouses can be created, how they're structured, and what sort of features a data warehouse includes. I am still working through the course but so far most of what the course is teaching is review from data modeling.
Comments