Why you need a data engineer
Intro
Imagine if every painter had to make their own brushes or mix their own pigments and resin into paint. How many people would still paint? My guess: not many, and those who did would have trouble creating great works of art. Painting is a more accessible craft today because artists can easily get the tools and materials they need off the shelf.
In the data world, many analysts and data professionals are still expected to create their own work from raw materials. They might not realize that what they need is a data engineer.
Data artists and database miners
Before looking at what a data engineer does and why the role is necessary, it helps to know how data teams are typically organized in large organizations (especially in government). Generally, there are two main groups: Database Administrators (DBAs) and Analysts.
Analysts
An analyst’s role is similar to an artist’s. Just as an artist starts with supplies like canvas, paint, and brushes, analysts begin with information and are tasked with turning it into something valuable. Artists create paintings; analysts create insights. The term “analyst” can mean many different things. For instance, in the City of San Francisco alone, there are 37 different roles with “analyst” in the title, covering 2,339 employees. These include Benefits Analysts, Investment Analysts, Administrative Analysts, and even 18 Aviation Security Analysts. Despite their varied titles, all analysts produce reports, forecasts, and recommendations based on the information they receive.
Database Administrators
The other major group of data professionals in large organizations is database administrators (DBAs). If analysts are painters, DBAs are like those who mine minerals for pigments and manage the warehouses in which they are stored. DBAs are responsible for collecting, storing, and processing raw data, while also managing capacity, access and security.
Painting a picture with data
There’s often a significant gap between what DBAs do and what analysts need. DBAs excel at collecting and storing data efficiently, ensuring it powers applications seamlessly. They focus on managing vast amounts of data securely and reliably. However, their role isn’t to format or structure the data in a way that’s immediately useful for analysts.
The handoff between DBAs and analysts is like a miner handing an artist a chunk of raw ore and expecting them to crush, grind, wash, and mix it into paint. While DBAs provide the essential raw materials, analysts need a more refined product to work efficiently. Specifically, they need easy access to clean, well-structured, and documented information, usually in the form of a dataset.
When this doesn’t happen, analysts must invest additional time and effort to prepare the data before they can extract meaningful insights from it. This approach is inefficient. Even if analysts are competent in data wrangling and cleaning, they don’t necessarily have the tools or infrastructure to do it in a scalable, secure, or automated way. And if a team has multiple analysts, it doesn’t make sense for each of them to clean the same data individually. The cleanup should happen once, with the result distributed to all end users. Enter the data engineer.
What does a data engineer do?
A data engineer serves as a bridge between the DBA and the analyst. If analysts are painters and DBAs manage raw materials, the data engineer’s role is to transport and process those materials, turning them into the finished products analysts need.
Practically speaking, a data engineer should be an expert in extracting data from source systems, transforming it into analysis-ready tables, and loading it into the analyst’s tools of choice. This process is often referred to as ETL (Extract, Transform, Load).
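To make the three ETL steps concrete, here is a minimal sketch using only the Python standard library. The permit data, column names, and the `permits` table are hypothetical and invented for illustration; a real pipeline would read from an actual source system and load into whatever warehouse or tool the analysts use.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical raw export from a source system (messy on purpose):
# stray whitespace, a missing fee, and a duplicated record.
RAW_CSV = """permit_id,issued,fee
 A-1 ,01/15/2024,100.50
A-2,02/03/2024,
A-1,01/15/2024,100.50
"""


def extract(raw_text):
    """Extract: read raw rows from the source export as dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: trim whitespace, convert dates to ISO format,
    turn empty fees into None, and keep the first row per permit_id."""
    seen = set()
    clean = []
    for row in rows:
        permit_id = row["permit_id"].strip()
        if permit_id in seen:
            continue  # skip duplicate records
        seen.add(permit_id)
        issued = datetime.strptime(row["issued"].strip(), "%m/%d/%Y").date().isoformat()
        fee_text = row["fee"].strip()
        fee = float(fee_text) if fee_text else None
        clean.append((permit_id, issued, fee))
    return clean


def load(rows, conn):
    """Load: write the analysis-ready rows into a table analysts can query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS permits "
        "(permit_id TEXT PRIMARY KEY, issued TEXT, fee REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO permits VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(raw_text, conn):
    """Run the three steps in order."""
    load(transform(extract(raw_text)), conn)
```

Calling `run_etl(RAW_CSV, sqlite3.connect(":memory:"))` fills an in-memory SQLite table. The point of the sketch is the separation of concerns: each step can be tested and scheduled on its own, so the cleanup happens once instead of once per analyst.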
A good data engineer understands source systems (though not as deeply as a DBA) and also grasps the business processes involved with the data (though not as intimately as an analyst).
Signs you need a data engineer
So, how do you know if you or your department needs a data engineer? Everyone should assess the need on a case-by-case basis, but there are a few telltale signs:
Analysts are doing repetitive work: If analysts have to produce the same report every week, especially if it takes a long time, a data engineer should automate the process.
DBAs or administrators are swamped by data requests: If people on your team spend a meaningful amount of time pulling data and fulfilling data requests, a data engineer should build a self-service data repository or warehouse.
Most of the work is data cleaning instead of analysis: If most of your analysts’ time is spent cleaning or debugging data rather than on actual analysis, a data engineer should build data models for your department that clean and debug the data automatically.
Required data comes from a lot of sources: If the data you need comes from disparate sources and needs to be joined and integrated, a data engineer should automate the process.
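As a rough illustration of the last sign, the sketch below integrates two extracts that describe the same departments but disagree on column names and key formatting. Both sources, their field names, and the figures are made up for the example. It assumes a department code is enough to match records once it is trimmed and uppercased.

```python
# Hypothetical extracts from two separate systems. The department key is
# spelled and named differently in each one.
hr_rows = [
    {"dept_code": "dpw", "headcount": 120},
    {"dept_code": "LIB ", "headcount": 45},
]
budget_rows = [
    {"Department": "DPW", "budget_usd": 1_500_000},
    {"Department": "REC", "budget_usd": 900_000},
]


def normalize_key(value):
    """Make keys comparable across systems: trim whitespace, uppercase."""
    return value.strip().upper()


def integrate(hr, budget):
    """Full outer join of the two sources on the normalized department code.

    Departments missing from one source keep None for that source's column,
    so gaps stay visible instead of silently dropping rows.
    """
    combined = {}

    def blank(key):
        return {"dept": key, "headcount": None, "budget_usd": None}

    for row in hr:
        key = normalize_key(row["dept_code"])
        combined.setdefault(key, blank(key))["headcount"] = row["headcount"]
    for row in budget:
        key = normalize_key(row["Department"])
        combined.setdefault(key, blank(key))["budget_usd"] = row["budget_usd"]
    # Sort by department so the output order is stable between runs.
    return [combined[key] for key in sorted(combined)]
```

Once logic like this lives in one scheduled job, every analyst works from the same joined table rather than repeating the matching by hand.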
The best painters mix their own paints
Having a data engineer is just the beginning of the process for effective data use in reporting and analysis. Over time, data engineering best practices will spread to both DBAs and analysts. While setting up and managing extract and load processes requires technical expertise, many transformation tools have been designed to be more accessible to analysts.
Initially, the data engineer will handle most of the data modeling, but eventually, analysts might want to try it themselves. They will realize that not only can they access clean data, but they can also decide how the tables should be structured. Similarly, DBAs might start to consider how data will be used for analysis and how raw data could be more effectively distributed and organized for analytical purposes.
At that point, the data engineer can tip their hat and put their feet up (or, more likely, tackle another data issue somewhere else in the organization).
Learn more about DataSF
Visit sf.gov/data
View the SF Open Data Portal
Sign up for New Dataset Alerts



