Deep Dive #3: TDSP

This is the third post in a three-part series of deep dives into data science methodologies.

We already covered the data science process in detail, so let’s talk about data team roles. The Team Data Science Process (TDSP) is a methodology created by Microsoft. It’s based on CRISP-DM, with an additional focus on team responsibilities. Let’s take a closer look.

Data science team roles fall within three categories:

  1. Mathematics/Statistics
  2. Computer Science
  3. Business Domain Knowledge

Every data science project should strike a good balance among these domains. Projects usually start from a business need: mathematicians design a model, and computer engineers build it, deploy it and feed the results back to the business experts.

Let’s define a few roles and responsibilities:

Data Engineer:

According to Saurav Dhungana, data engineers are often the first people to join a project. A data engineer is “generally someone with good programming and hardware skills, and can build your data infrastructure.” Once this infrastructure is built, raw data flows to the data analysts.

Data Analyst:

Data analysts are mostly responsible for the technical analysis of data sets. They work hands-on with data to collect, process and transform it into usable records. They then use a combination of programming, statistics and machine learning to derive actionable insights.

Sara Metwalli adds: “they are often in charge of preparing the data for communication with the project's business side by preparing reports”. This shows that people in every role must be comfortable across disciplines, regardless of their specialization.

Business Analyst:

Business analysts study the results from data analysts and recommend the decisions that best serve business targets. They serve as a bridge between the engineers and stakeholders and are important communicators during a project.

Database Administrator:

These are engineers responsible for maintaining, securing, and providing access to databases. Responsibilities include backup and recovery, security and authentication, troubleshooting and maintenance. The number of database administrators grows as the company scales, and the role may even be outsourced to a database provider.

Data Scientist:

This is the broadest role and encompasses the entire spectrum of a data project. Data scientists have a wide understanding of every aspect of the job, and they are often tasked with leading a team. Saurav Dhungana adds that they will collaborate with domain experts to deliver results to stakeholders, which is the ultimate goal of a data project.

Machine Learning Engineer:

ML engineers build and run machine learning models for tasks such as classification, regression and clustering. The main difference from a data analyst's work is that ML models learn from data, so their performance can improve over time as they are trained on more of it. Sara Metwalli continues: “machine learning engineers need to have strong statistics and programming skills in addition to some knowledge of the fundamentals of software engineering”.

These are some of the most common roles working on data projects. As Jyosmitha Munnangi notes in her article, some companies might require you to have expertise in all domains. This is particularly true for startups with fewer resources and smaller teams. As companies expand, teams can grow to include more diverse skillsets for specific problems.

So if you’re wondering which type of role you should focus on, we recommend the following: specialize in something you’re naturally good at and acquire a general knowledge of other roles. This will make you invaluable as an employee at both startups and large organizations!


To learn more about TDSP and different roles in data teams, check out our amazing sources and their profiles!

Saurav Dhungana: On Building Effective Data Science Teams

Sara Metwalli: 10 Different Data Science Job Titles and What They Mean

Jyosmitha Munnangi: What are 12 different job roles & responsibilities in Data Science?
