Data Engineering Onboarding

Friday, January 01, 2021

2 minute read

As there are people from many backgrounds considering taking on the role of “data engineer” or “data scientist”, I have put together a list of resources that I think are useful for getting started in data engineering. This list is not exhaustive, but it should give you a good start.

This article will fit somewhere into the onboarding process.

Personas

T-SQL Expert

Databricks Samples

This is probably for a different article, but there could be an exercise on registering these datasets in Unity Catalog. Better yet, an article could be written on the end-to-end process for serving dashboards, feature stores, and models. At the least, let me write a function that walks the samples directory, collects all of the readme files, and appends them into a single markdown file. This would take about half a day the first time and would make an excellent article.

Get the list of files.
Handle the files in a loop using markdown syntax.
Bonus: use pypandoc to create LaTeX and PDF versions of the markdown file.
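
The steps above could be sketched roughly as follows. This is a minimal sketch, not a finished implementation: the function names (find_readmes, combine_readmes, export_with_pandoc) are made up for illustration, the samples directory is passed in as a parameter because its actual path depends on the platform, and it assumes that readme files can be recognised by a file name starting with "readme" in any letter case. The bonus step assumes pypandoc and a pandoc installation are available, and the PDF output additionally needs a LaTeX engine.

```python
from pathlib import Path


def find_readmes(root):
    """Return every file under root whose name starts with 'readme' (any case)."""
    # Assumption: readme files are identified by name only, e.g. README.md or readme.txt.
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.name.lower().startswith("readme")
    )


def combine_readmes(root, output_file):
    """Append every readme under root to one markdown file; return how many were used."""
    root = Path(root)
    output_file = Path(output_file)
    # Skip the output file itself in case it lives inside the directory being walked.
    readmes = [p for p in find_readmes(root) if p.resolve() != output_file.resolve()]
    sections = []
    for path in readmes:
        # Use the path relative to root as a markdown heading for each section.
        heading = path.relative_to(root).as_posix()
        body = path.read_text(encoding="utf-8", errors="replace").strip()
        sections.append(f"## {heading}\n\n{body}\n")
    output_file.write_text("\n".join(sections), encoding="utf-8")
    return len(readmes)


def export_with_pandoc(markdown_file):
    """Bonus step: write LaTeX and PDF versions next to the markdown file."""
    import pypandoc  # third-party; imported here so the functions above work without it

    md = Path(markdown_file)
    pypandoc.convert_file(str(md), "latex", outputfile=str(md.with_suffix(".tex")))
    pypandoc.convert_file(str(md), "pdf", outputfile=str(md.with_suffix(".pdf")))
```

A call such as combine_readmes("path/to/samples", "all_readmes.md") would write the combined file, where both paths are placeholders. Keeping the pandoc step in its own function means the core collection logic depends only on the standard library.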

An article could be written on how to download these to be used on other platforms including local platforms.

An article, possibly for today, is to create an appendix for chapter 3 (or all chapters) of Scott Haines’s book that implements the same work on Windows. This could be a good exercise for people who are not familiar with Linux.

Another article could cover how to set up DevonThink so that it communicates well with Things and Bear (Apple Notes?). This could be a good exercise for people who are not familiar with DevonThink, Things, or Bear.

The concept of back linking and forward linking between cloud domains would be an overarching objective. More links are better. Multidirectional linking is best.