Data Science Career Paths

Data science career paths have seen a big boost recently. If you’ve heard of DALL-E, GPT-3, LaMDA, AlphaFold, Copilot, Gopher, etc., you know that data & AI is gaining incredible momentum. But in the grand scheme of things, the field is still in its infancy. Just 10 years ago, you couldn’t even talk about data science career paths – ‘Data Scientist’ was an obscure job title and relevant degrees were few and far between.

We’ve come a long way in a short time. Data science careers are now some of the hottest in the market. From innovative startups that are reimagining the world to massive enterprises that keep it running, data is everywhere, and organisations know that turning data into insight means power.

This growing demand for people, compounded by the increasing breadth of applications across industries, has led to an interesting phenomenon: everyone has a different opinion about what a data science career means. We’ve launched this resource to shine a light on the exciting world of data science and to explore the nature & evolution of roles in the space. This is a living document that we will keep updating as the industry changes.

From studying large-scale projects and conducting interviews, we have identified three core roles. We have also developed a skills framework to categorise and group the main skills. We then use the skills framework to dissect the core roles and to explore two other roles that are emerging rapidly.

Core Data Science Roles

Data engineering, data analysis and data science are the three core data roles. They all imply different responsibilities and skills, and each of them plays a distinct role in shaping how data is used within an organization.

A data engineer works continuously on the backend to improve data pipelines and ensure that the data the organization relies on is accurate and available. Data engineers use a range of tools to ensure the data is processed correctly and that the right data is available to anyone who needs it.

A data analyst might, for example, extract a new data set through a custom API built by the data engineer and identify trends in it. Analysts summarize their results, often using visualization methods, and communicate them to teammates and other stakeholders to support decision making.

Finally, the data scientist is responsible for extracting insights from data in response to problems. They often work with analysts to build on their initial findings. Whether by creating algorithms, training machine learning models, or running advanced statistical analyses, the data scientist turns raw data into meaningful information to improve processes and decisions.

Data Science Skills Framework

The skills required for each role vary depending on the company and the industry. Our skills framework focuses on five categories of skills that are required across the data lifecycle. The framework is designed to be industry-agnostic. We use it to highlight the differences between roles by mapping each role’s skills and the share of working time, as a percentage, spent on each category.

Skills Categories

Programming

Most data science jobs require some level of programming expertise in either a programming language (e.g. Python, R, JavaScript) or a query language (e.g. SQL). In the role breakdowns below, this category appears as Software Engineering.

Data Extraction & Wrangling

A big part of data science work is finding data that can help you solve your problem. In the real world, data is rarely clean and formatted for use, so it’s vital to spot errors in your data before investing too much time in analysis.
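As a minimal sketch of that first check, assuming a small CSV export with hypothetical columns `customer_id`, `age` and `city`, the standard library alone can flag missing values and duplicate rows before any analysis starts:

```python
import csv
import io
from collections import Counter

# Hypothetical raw export; in practice this would come from a file or an API.
RAW = """customer_id,age,city
1,34,London
2,,Paris
3,29,
1,34,London
"""

def profile(text):
    """Return per-column missing-value counts and the number of duplicate rows."""
    rows = list(csv.DictReader(io.StringIO(text)))
    columns = rows[0].keys() if rows else []
    # A value counts as missing if the cell is empty or whitespace only.
    missing = {col: sum(1 for r in rows if not (r[col] or "").strip()) for col in columns}
    # Count rows that exactly repeat an earlier row.
    seen = Counter(tuple(r.values()) for r in rows)
    duplicates = sum(n - 1 for n in seen.values())
    return missing, duplicates

missing, duplicates = profile(RAW)
print(missing)     # {'customer_id': 0, 'age': 1, 'city': 1}
print(duplicates)  # 1
```

Catching one missing age, one missing city and one repeated customer at this stage is far cheaper than discovering them after a model has been trained on the data.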

Exploratory Data Analysis & Storytelling

Drawing insights from data and communicating them to stakeholders – often visually and in simple terms – is a core competency for any data scientist.

Statistics & Mathematics

Statistical and mathematical methods are a central part of data science. Why? Because it is difficult to build algorithms, perform analysis and uncover insights without a solid grasp of things like linear algebra, calculus and probability.
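Probability is a good example of why this matters. The sketch below applies Bayes’ theorem, P(A|B) = P(B|A) · P(A) / P(B), to a hypothetical screening test; the prevalence, sensitivity and false-positive rate are illustrative numbers, not real data:

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes' theorem."""
    # Total probability of a positive result: true positives plus false positives.
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

# Illustrative numbers: 1% prevalence, 90% sensitivity, 5% false-positive rate.
p = posterior(prior=0.01, sensitivity=0.90, false_positive_rate=0.05)
print(round(p, 3))  # 0.154
```

Even with a fairly accurate test, a positive result here implies only about a 15% chance of having the condition, because the condition itself is rare. Results like this are easy to get wrong without a working knowledge of probability.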

Machine Learning

Machine learning is a subset of artificial intelligence. With ML you leverage data to train a model to perform some set of tasks (e.g. classification, prediction). Due to its meteoric rise, we have included it as a standalone category.
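To make “train a model” concrete, here is a minimal sketch of one of the simplest prediction models, a one-variable linear regression fitted by ordinary least squares in plain Python. The toy data points are made up for illustration:

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Use the fitted parameters to predict y for a new x."""
    slope, intercept = model
    return slope * x + intercept

# Toy training data (made up): y is roughly 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
model = fit_line(xs, ys)
print([round(v, 2) for v in model])  # [1.97, 1.09]
print(round(predict(model, 6), 2))   # 12.91
```

“Training” here is simply choosing the two parameters that minimise squared error on the known examples; the fitted model is then used to predict an unseen value.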

#1 Data Analyst

Data analysts are responsible for collecting, processing and interpreting data and performing statistical analysis on it. They primarily use programming languages and frameworks to review data and draw inferences. Then, they present their results so that management teams can make better decisions.

Main Skills

Software Engineering -⌛10% of overall time

Scripting
Python
R

Data Extraction and Wrangling -⌛50% of overall time

Handling Missing Values
Data Transformation – Joining, slicing, indexing
Pandas
NumPy
Perform Data Cleansing
Use Data Processing Techniques
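A minimal sketch of two of these wrangling tasks, median imputation of missing values and an inner join on a shared key, is shown below in plain Python so that it runs without extra packages. In practice an analyst would more likely reach for pandas (`fillna`, `merge`); the tables and column names here are hypothetical:

```python
from statistics import median

# Hypothetical tables: orders (one amount is missing) and a customer lookup.
orders = [
    {"order_id": 1, "customer_id": "a", "amount": 120.0},
    {"order_id": 2, "customer_id": "b", "amount": None},
    {"order_id": 3, "customer_id": "a", "amount": 80.0},
    {"order_id": 4, "customer_id": "c", "amount": 100.0},
]
customers = {"a": "London", "b": "Paris"}  # customer_id -> city

# 1. Handle missing values: replace None with the median of the observed amounts.
observed = [o["amount"] for o in orders if o["amount"] is not None]
fill_value = median(observed)
cleaned = [
    {**o, "amount": o["amount"] if o["amount"] is not None else fill_value}
    for o in orders
]

# 2. Inner join: keep only orders whose customer_id exists in the lookup table.
joined = [
    {**o, "city": customers[o["customer_id"]]}
    for o in cleaned
    if o["customer_id"] in customers
]

for row in joined:
    print(row)
```

The median is used rather than the mean because it is less sensitive to outliers. Note that the inner join silently drops order 4, whose customer is missing from the lookup; whether that is acceptable depends on the analysis.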

Exploratory Data Analysis and Storytelling -⏳25% of overall time

Studying Data Distribution
Univariate and Multivariate Analysis
Data Visualization (Matplotlib, Seaborn, Plotly)
Building Dashboards (Excel, Tableau, Jupyter…)
Thinking Analytically
Report Analysis Results
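As an illustration of univariate and multivariate analysis, the sketch below summarises one variable’s distribution and then measures how strongly two variables move together using Pearson’s correlation coefficient. The ad-spend and sales figures are made-up sample data:

```python
from math import sqrt
from statistics import mean, median, stdev

# Made-up weekly figures for illustration.
ad_spend = [10, 12, 15, 18, 20, 25]
sales = [100, 110, 130, 150, 160, 200]

# Univariate: summarise the distribution of a single variable.
print({"mean": round(mean(sales), 2), "median": median(sales), "stdev": round(stdev(sales), 2)})

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Bivariate: how strongly do the two variables move together?
r = pearson(ad_spend, sales)
print(round(r, 3))
```

A coefficient close to 1 indicates a strong positive linear relationship, which is worth visualising and reporting, but on its own it says nothing about causation.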

Statistics and Mathematics -⏳15% of overall time

Execute Analytical Mathematical Calculations
Descriptive – Mean, Median, Mode, Std
Inferential – Hypothesis, A/B Testing, CI, P Value
Probability – Conditional, Bayes Theorem
Sampling, Data Distributions, T-tests
Linear Algebra
Single and Multivariate Calculus
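To make the inferential items concrete, here is a minimal sketch of an A/B test on conversion rates using a two-proportion z-test, computed with the standard library only. The visitor and conversion counts are hypothetical, and a real analysis would fix the significance level and sample size before looking at the data:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z statistic, two-sided p-value) for H0: both conversion rates are equal."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled conversion rate under the null hypothesis.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical experiment: variant B converts 240/2000 visitors vs 200/2000 for A.
z, p = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=240, n_b=2000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these numbers the p-value falls between 0.04 and 0.05, so the lift would count as significant at the conventional 5% level but not at a stricter 1% level.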

#2 Data Engineer

Data engineers develop and maintain the data infrastructure, which determines how a company collects and stores data. They build data pipelines that transform raw, unstructured data into formats that data scientists and analysts can use.

Main Skills

Software Engineering -⌛70% of overall time

Develop Data Processing Applications
Data Structures
Conditionals
List/Dict Comprehensions
Scripting
Python
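A minimal sketch of how these basics combine in a small data processing step: raw event records (with hypothetical fields `user`, `event` and `ms`) are validated with conditionals, normalised with a list comprehension and aggregated into a dictionary. Real pipelines would often run logic like this inside a scheduling or streaming framework, which is out of scope here:

```python
from collections import defaultdict

# Hypothetical raw events as they might arrive from an upstream source.
raw_events = [
    {"user": " Alice ", "event": "click", "ms": "120"},
    {"user": "bob", "event": "view", "ms": "not-a-number"},
    {"user": "alice", "event": "view", "ms": "300"},
    {"user": "", "event": "click", "ms": "50"},
]

def is_valid(e):
    """Conditional checks: a record needs a non-empty user and a numeric duration."""
    return bool(e["user"].strip()) and e["ms"].isdigit()

# List comprehension: keep valid records and normalise their fields.
clean = [
    {"user": e["user"].strip().lower(), "event": e["event"], "ms": int(e["ms"])}
    for e in raw_events
    if is_valid(e)
]

# Aggregate total time per user with a dictionary.
totals = defaultdict(int)
for e in clean:
    totals[e["user"]] += e["ms"]

print(dict(totals))  # {'alice': 420}
```

Normalising the user name before aggregating is what lets “ Alice ” and “alice” be counted as the same person; the two malformed records are dropped rather than allowed to corrupt the totals.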

Data Extraction and Wrangling -⌛30% of overall time

Handling Missing Values
Data Transformation – Joining, slicing, indexing
Perform Data Cleansing
Use Data Processing Techniques

#3 Data Scientist

Data scientists perform analytics and build machine learning models. Their tasks help companies develop new business strategies and determine long-term goals. Data scientists also build internal data products, which can help a company better understand its workforce, processes, and customers.

Main Skills

Software Engineering -⌛10% of overall time

Data Structures
Scripting
Python
R

Data Extraction and Wrangling -⌛20% of overall time

Handling Missing Values
Data Transformation – Joining, slicing, indexing
Pandas
NumPy
Perform Data Cleansing

Exploratory Data Analysis and Storytelling -⏳10% of overall time

Studying Data Distribution
Univariate and Multivariate Analysis
Data Visualization (Matplotlib, Seaborn, Plotly)
Thinking Analytically
Report Analysis Results

Statistics and Mathematics -⏳30% of overall time

Execute Analytical Mathematical Calculations
Descriptive – Mean, Median, Mode, Std
Inferential – Hypothesis, A/B Testing, CI, P Value
Experiment Design
Probability – Conditional, Bayes Theorem
Sampling, Data Distributions, T-tests
Linear Algebra
Single and Multivariate Calculus
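Experiment design usually starts by asking how much data a test needs. A minimal sketch, assuming a two-sided two-proportion test and a common normal-approximation formula, estimates the visitors needed per group to detect a lift from a hypothetical 10% baseline conversion rate to 12%:

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(p1, p2, alpha=0.05, power=0.8):
    """Approximate visitors per group for a two-sided two-proportion test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2
    return ceil(n)  # round up, since a fractional visitor is not possible

print(sample_size_per_group(0.10, 0.12))
```

The result is a little under 4,000 visitors per group. Halving the detectable lift roughly quadruples the required sample, which is why the minimum effect worth detecting should be agreed before an experiment starts.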

Machine Learning -⏳30% of overall time

Supervised – Classification, Regression, Neural Networks
Unsupervised – Clustering, Dimensionality Reduction
Reinforcement Learning – TF-Agents, Optimising Rewards
Performance Metrics – RMSE, Accuracy, Confusion Matrix, AUC-ROC, etc.
Hyperparameter Tuning
Statistical ML – KNN, Decision Trees, Bagging, Boosting
Ensemble Models – Random Forests, Voting Classifiers, AdaBoost
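Several of the performance metrics listed above are simple to compute by hand. A minimal sketch with made-up labels and predictions shows a binary confusion matrix and accuracy for classification, and root mean squared error (RMSE) for regression:

```python
from math import sqrt

def confusion_matrix(y_true, y_pred):
    """Counts for a binary classifier with labels 0 (negative) and 1 (positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn}

def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error for regression predictions."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Made-up classification results.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
preds = [1, 0, 0, 1, 0, 1, 1, 0]
print(confusion_matrix(labels, preds))  # {'tp': 3, 'tn': 3, 'fp': 1, 'fn': 1}
print(accuracy(labels, preds))          # 0.75

# Made-up regression results.
print(rmse([3.0, 5.0, 2.0], [2.0, 5.0, 4.0]))
```

Accuracy alone can mislead on imbalanced data, which is why the confusion matrix (and metrics derived from it, such as AUC-ROC) is usually reported alongside it.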

#4 Machine Learning Engineer

Machine learning engineers work with vast quantities of data and perform complex data modelling. They design self-running software that learns from historical data to improve its own performance. Machine learning engineers also run machine learning tests, check data quality and collaborate with other members of a data science team, such as data scientists, data analysts and administrators.

Main Skills

Software Engineering -⌛40% of overall time

Fundamental Algorithms
Data Structures
Conditionals
List/Dict Comprehensions
Scripting
Develop Data Processing Applications
Python

Data Extraction and Wrangling -⌛20% of overall time

Handling Missing Values
Data Transformation – Joining, slicing, indexing
Pandas
NumPy
Perform Data Cleansing
Use Data Processing Techniques

Machine Learning -⏳40% of overall time

Supervised – Classification, Regression, Neural Networks
Unsupervised – Clustering, Dimensionality Reduction
Reinforcement Learning – TF-Agents, Optimising Rewards
Performance Metrics – RMSE, Accuracy, Confusion Matrix, AUC-ROC, etc.
Hyperparameter Tuning
Statistical ML – KNN, Decision Trees, Bagging, Boosting
Ensemble Models – Random Forests, Voting Classifiers, AdaBoost
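Hyperparameter tuning can be illustrated with a minimal grid search: try several values of k for a k-nearest-neighbours (KNN) classifier and keep the one with the best accuracy on a held-out validation set. The tiny one-dimensional dataset is made up, and real projects would usually use cross-validation and a library such as scikit-learn rather than hand-rolled code:

```python
from collections import Counter

def knn_predict(train, x, k):
    """Classify x by majority vote among the k nearest training points (1-D features)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def validation_accuracy(train, data, k):
    """Share of validation points that KNN with this k classifies correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)

# Made-up (feature, label) pairs; the "b" at 1.8 is a deliberately noisy point.
train = [(1.0, "a"), (1.4, "a"), (1.8, "b"), (2.0, "a"), (3.0, "b"), (3.4, "b"), (4.0, "b")]
validation = [(1.75, "a"), (2.8, "b"), (3.8, "b"), (1.1, "a")]

# Grid search: evaluate each candidate k and keep the best one.
scores = {k: validation_accuracy(train, validation, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)  # ties go to the first (smallest) k tried
print(scores, "best k:", best_k)
```

With k = 1 the noisy training point misclassifies one validation example, while larger k values vote it down; the search therefore selects k = 3, the smallest value that reaches the best validation score.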

#5 Research Scientist, ML

A newly formalized role, the research scientist in machine learning is responsible for researching and developing new methods, algorithms and approaches to data science. ML research scientists are generally part of an organization’s Research and Development (R&D) division, where they look for innovative approaches to data processing and analysis; their work often leads to publications.


Main Skills

Data Extraction and Wrangling -⌛20% of overall time

Handling Missing Values
Data Transformation – Joining, slicing, indexing
Pandas
NumPy
Perform Data Cleansing
Use Data Processing Techniques

Exploratory Data Analysis and Storytelling -⏳10% of overall time

Studying Data Distribution
Univariate and Multivariate Analysis
Data Visualization (Matplotlib, Seaborn, Plotly)
Thinking Analytically
Report Analysis Results

Statistics and Mathematics -⏳30% of overall time

Execute Analytical Mathematical Calculations
Descriptive – Mean, Median, Mode, Std
Inferential – Hypothesis, A/B Testing, CI, P Value
Experiment Design
Probability – Conditional, Bayes Theorem
Sampling, Data Distributions, T-tests
Linear Algebra
Single and Multivariate Calculus

Machine Learning -⏳40% of overall time

Supervised – Classification, Regression, Neural Networks
Unsupervised – Clustering, Dimensionality Reduction
Reinforcement Learning – TF-Agents, Optimising Rewards
Performance Metrics – RMSE, Accuracy, Confusion Matrix, AUC-ROC, etc.
Hyperparameter Tuning
Statistical ML – KNN, Decision Trees, Bagging, Boosting
Ensemble Models – Random Forests, Voting Classifiers, AdaBoost

As the industry and our understanding of it evolve, we will update roles, add new ones and get more granular with skills. If you found this guide useful, bookmark it and refer back to it when needed.
