This week I was supposed to start with the client, but I'm continuing work-ready activities for a few more days while paperwork gets processed. Thanks to the extra time, I was able to finish the course I'd started last week. Since I may be called to start at any point this week, though, I held off on beginning any big projects or other courses; instead, I've been practicing my SQL on SQL Server with AdventureWorks and following a Python tutorial on ETL pipelines.
Data Warehousing and Business Intelligence
Not only did I gain a better understanding of data warehousing, but I also saw another way ETL pipelining can be used in a data warehouse. This has been among the most helpful Pluralsight courses I've taken for upskilling, and it greatly improved my understanding of surrogate keys and business keys in the data warehouse.
The surrogate key ProductSK is a unique identifier generated inside the warehouse for each row of a reference table; other tables point to it the way they would a foreign key. The business keys are the natural keys carried in from source systems such as Inventory, Purchasing, and WorkMgt. When another table needs data from this reference table, it uses a lookup to search the Product reference table for its business key. So if Purchasing needed information about the product bought in a particular entry, its record would carry a column indicating that the product's details can be found in the Product table under business key "Wid45968".
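To make that concrete, here's a minimal T-SQL sketch. Apart from the Product table, its ProductSK surrogate key, and the "Wid45968" business key, the table and column names are my own made-up illustration, not the course's example:

```sql
-- A Product reference (dimension) table keyed by a surrogate key the
-- warehouse generates itself; the business key from the source system
-- is stored alongside it.
CREATE TABLE dbo.Product (
    ProductSK   INT IDENTITY(1,1) PRIMARY KEY, -- surrogate key
    BusinessKey NVARCHAR(20) NOT NULL,         -- natural key from the source
    ProductName NVARCHAR(100) NOT NULL
);

-- A Purchasing record carries only the business key it knows from its
-- own source system (e.g. 'Wid45968').
CREATE TABLE dbo.Purchasing (
    PurchaseID  INT IDENTITY(1,1) PRIMARY KEY,
    BusinessKey NVARCHAR(20) NOT NULL,
    Quantity    INT NOT NULL
);

-- The lookup: resolve each purchase's business key to the surrogate
-- key and descriptive columns in the reference table.
SELECT pu.PurchaseID, pr.ProductSK, pr.ProductName, pu.Quantity
FROM dbo.Purchasing AS pu
JOIN dbo.Product    AS pr
  ON pr.BusinessKey = pu.BusinessKey;
```

The point of the pattern is that Purchasing never stores ProductSK itself; the lookup resolves its business key to the warehouse's surrogate key at query or load time.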
With this information, I can complete my paper-and-pencil model of a data pipeline by fully fleshing out some example tables with business and surrogate keys. Doing that should sharpen my mental model of a data warehouse before I start work.
SQL Server Practice
After last week, when I practiced writing a CTE (common table expression), I also wanted to try putting a CTE inside a stored procedure. My first attempt did not work, and I learned why: a CTE is scoped to the single statement that immediately follows its definition, so it cannot be defined once and then referenced by later statements in the procedure. To reuse the result set that way, I would have had to materialize it as a temporary table, or save it as a view that can be queried whenever the procedure is called.
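Here's a sketch of what I mean, with hypothetical table and procedure names. The CTE itself is perfectly legal inside a procedure; it just evaporates after the one statement attached to it, which is why a temp table is the way to keep the result around:

```sql
CREATE PROCEDURE dbo.GetRecentOrders
AS
BEGIN
    -- A CTE works here, but only as part of the single statement
    -- that immediately follows it.
    WITH RecentOrders AS (
        SELECT OrderID, CustomerID, OrderDate
        FROM dbo.Orders
        WHERE OrderDate >= DATEADD(DAY, -30, GETDATE())
    )
    SELECT CustomerID, COUNT(*) AS OrderCount
    FROM RecentOrders
    GROUP BY CustomerID;         -- the CTE goes out of scope here

    -- SELECT * FROM RecentOrders;  -- this would fail: CTE no longer exists

    -- To reuse the result set across statements, materialize it instead:
    SELECT OrderID, CustomerID, OrderDate
    INTO #RecentOrders
    FROM dbo.Orders
    WHERE OrderDate >= DATEADD(DAY, -30, GETDATE());

    SELECT COUNT(*) AS TotalRecent
    FROM #RecentOrders;          -- usable for the rest of the procedure
END;
```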
I also worked on another complex join in SQL Server, this time using all of the tables associated with an object rather than just a few. I used the Human Resources objects to find out whether there was a pay disparity between the sexes. This was another exercise to get used to the business keys and surrogate keys used in a data warehouse.
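This isn't my exact query, but a rough reconstruction of the idea against the standard AdventureWorks HumanResources tables, which join on the BusinessEntityID key:

```sql
-- Average current hourly rate by gender, taking each employee's most
-- recent entry from the pay history table.
WITH LatestPay AS (
    SELECT BusinessEntityID,
           Rate,
           ROW_NUMBER() OVER (PARTITION BY BusinessEntityID
                              ORDER BY RateChangeDate DESC) AS rn
    FROM HumanResources.EmployeePayHistory
)
SELECT e.Gender,
       COUNT(*)     AS Employees,
       AVG(lp.Rate) AS AvgHourlyRate
FROM HumanResources.Employee AS e
JOIN LatestPay AS lp
  ON lp.BusinessEntityID = e.BusinessEntityID
 AND lp.rn = 1
GROUP BY e.Gender;
```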
Python ETL Pipeline
To find ways to practice ETL pipelining outside of Visual Studio and SQL Server, I found a Medium article explaining how to build a simple pipeline in Python. Conceptually, I understand how to build one: extract data from a source, transform and modify it, and insert it into a destination for later use.
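The tutorial implements those three steps in Python, but the pattern itself is language-agnostic. Sketched in the T-SQL I've been practicing, with made-up source and destination tables, it looks like this:

```sql
-- Extract: pull raw rows out of the source into a staging table.
SELECT CustomerID, OrderDate, Amount
INTO #Staging
FROM SourceDb.dbo.RawOrders;

-- Transform: clean and modify the data in staging.
UPDATE #Staging
SET Amount = 0
WHERE Amount IS NULL;

-- Load: insert the cleaned rows into the destination for later use.
INSERT INTO WarehouseDb.dbo.Orders (CustomerID, OrderDate, Amount)
SELECT CustomerID, OrderDate, Amount
FROM #Staging;
```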
What I would really like to practice is creating complex pipelines in SQL or another data modeling environment, but learning how to code a pipeline in Python is equally useful; I still get to practice pipelining, even if it's in a different language. After all, a data modeling tool is essentially a GUI (graphical user interface) that visually represents what a code script does. The idea is that if I can code a pipeline in Python, I should be a quicker study when I start using a modeling interface.
In Other News...
A couple of weeks ago, I was asked to participate in a sort of blog experiment for the company: I could write a short article on any topic relating to data analytics. I wrote a short introductory piece about the basic skills data analysts use. It is currently live on the Talent Path blog; feel free to give it a read!