Data science career paths have seen a big boost recently. If you’ve heard of DALL-E, GPT-3, LaMDA, AlphaFold, Copilot, Gopher, etc., you know that data and AI are gaining incredible momentum. But in the grand scheme of things, the field is still in its infancy. Just 10 years ago, you couldn’t even talk about data science career paths – ‘Data Scientist’ was an obscure job title and educational degrees were few and far between.

We’ve come a long way in a short time. Data science careers are now some of the hottest in the market. From innovative startups that are reimagining the world to massive enterprises that keep it structured, data is everywhere, and organisations know that converting data into insight means power.

This increased demand for people, compounded by the increasing breadth of applications across industries, leads to an interesting phenomenon: everyone has a different opinion about what a data science career means. We’ve launched this resource to shine a light on the exciting world of data science and explore the nature and evolution of roles in the space. This is a living document that we will keep updating as things change in the industry.

From studying large scale projects and conducting interviews, we have identified 3 core roles. We have also developed a skills framework to categorise and group the main skills. We then use the skills framework to dissect the core roles and explore two other, rapidly emerging ones.

Core Data Science Roles

Data engineering, data analysis and data science are the three core data roles. They all imply different responsibilities and skills, and each of them plays a distinct role in shaping how data is used within an organization.

A data engineer works continuously on the backend to improve data pipelines and ensure that the data the organization relies on is accurate and available. Data engineers use a range of tools to ensure the data is processed correctly and that the right data is available to anyone who needs it.

A data analyst extracts data sets, for example through a custom API built by the data engineer, and identifies trends in the data. Analysts summarize their results, often using visualization methods, and communicate them to teammates and other stakeholders to support decision making.

Finally, the data scientist is responsible for extracting insights from data in response to problems. They often work with analysts to build on their initial findings. Whether by creating algorithms, training machine learning models, or running advanced statistical analyses, the data scientist turns raw data into meaningful information to improve processes and decisions.

Data Science Skills Framework

The skills required for each role vary depending on the company and the industry. Our skills framework focuses on five categories of skills that are required across the data lifecycle. The framework is designed to be industry-agnostic. We use it to highlight the difference between roles by mapping the skills and the extent to which they are used as a % of time.

A mind map of skills and categories in data science

Skills Categories

Programming

Most data science jobs require some level of programming expertise in either a programming language (e.g. Python, R, JavaScript) or a query language (e.g. SQL).

Data Extraction & Wrangling

A big part of data science work is finding data that can help you solve your problem. In the “real world”, data is rarely clean or formatted for use. It’s vital that you spot any errors in your data before investing too much time in analysis.
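To make the idea of spotting errors early more concrete, here is a minimal sketch of a data quality check in plain Python. The sample records, the 0 to 120 age range, and the function name `find_issues` are all assumptions made up for illustration. In practice this kind of check is often done with a library such as Pandas.

```python
from typing import Any

# Hypothetical sample records; None or "" marks a missing value.
records = [
    {"id": 1, "age": 34, "country": "DE"},
    {"id": 2, "age": None, "country": "FR"},
    {"id": 3, "age": 212, "country": ""},
    {"id": 3, "age": 41, "country": "ES"},
]


def find_issues(rows: list[dict[str, Any]]) -> list[str]:
    """Return readable descriptions of basic data quality problems."""
    issues = []
    seen_ids = set()
    for i, row in enumerate(rows):
        # Missing values: None or empty strings in any field.
        for key, value in row.items():
            if value is None or value == "":
                issues.append(f"row {i}: missing {key}")
        # Implausible values: the 0-120 range is an assumed rule.
        age = row.get("age")
        if age is not None and not 0 <= age <= 120:
            issues.append(f"row {i}: implausible age {age}")
        # Duplicate identifiers.
        if row["id"] in seen_ids:
            issues.append(f"row {i}: duplicate id {row['id']}")
        seen_ids.add(row["id"])
    return issues


issues = find_issues(records)
for issue in issues:
    print(issue)
```

Running a check like this before any analysis gives you a short list of rows to fix or drop, rather than discovering the problems halfway through a report.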

Exploratory Data Analysis & Storytelling

Drawing insights from data and communicating them to stakeholders – often visually and in simple terms – is a core competency for any data scientist.

Statistics & Mathematics

Statistical and mathematical methods are a central part of data science. Why? Because it is difficult to build algorithms, perform analysis, and uncover insights unless you have a solid grasp of things like linear algebra, calculus, and probability.

Machine Learning

Machine learning is a subset of artificial intelligence. With ML you leverage data to train a model to perform some set of tasks (e.g. classification, prediction). Due to its meteoric rise, we have included it as a standalone category.
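As a rough illustration of what "using data to train a model" means for a classification task, here is a minimal sketch of a nearest-centroid classifier in plain Python. The technique was picked for brevity and is not one this guide names elsewhere. The toy data, the labels, and the function names `fit` and `predict` are all assumptions for illustration.

```python
import math
from collections import defaultdict


def fit(samples, labels):
    """'Train' the model: compute the mean point (centroid) of each class."""
    grouped = defaultdict(list)
    for features, label in zip(samples, labels):
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(column) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(model, features):
    """Classify a point by the label of the closest centroid."""
    return min(model, key=lambda label: math.dist(model[label], features))


# Hypothetical two-feature training data with two classes.
X = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (4.0, 4.0), (4.2, 3.8), (3.8, 4.2)]
y = ["low", "low", "low", "high", "high", "high"]

model = fit(X, y)
print(predict(model, (1.1, 0.9)))
print(predict(model, (3.5, 4.5)))
```

The split between `fit` (learn from data) and `predict` (apply to new data) mirrors the interface that many machine learning libraries use, which is why it is a useful pattern to recognise early.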

#1 Data Analyst

Data analysts are responsible for collecting, processing, and interpreting data, and for performing statistical analysis on it. They primarily use programming languages and frameworks to review data and make inferences. Then, they present results so that management teams can make better decisions.

data science career paths: #1 data analyst skills map

Main Skills

Data Extraction and Wrangling -⌛50% of overall time
  • Handling Missing Values
  • Data Transformation – Joining, slicing, indexing
  • Pandas
  • NumPy
  • Perform Data Cleansing
  • Use Data Processing Techniques
Exploratory Data Analysis and Storytelling -⏳25% of overall time
  • Studying Data Distribution
  • Univariate and Multivariate Analysis
  • Data Visualization (Matplotlib, Seaborn, Plotly)
  • Building Dashboards (Excel, Tableau, Jupyter…)
  • Thinking Analytically
  • Report Analysis Results
Statistics and Mathematics -⏳15% of overall time
  • Execute Analytical Mathematical Calculations
  • Descriptive – Mean, Median, Mode, Std
  • Inferential – Hypothesis, A/B Testing, CI, P Value
  • Probability – Conditional, Bayes Theorem
  • Sampling, Data Distributions, T-tests
  • Linear Algebra
  • Single and Multivariate Calculus
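The descriptive and inferential items in the list above can be sketched with Python's standard library alone. This is a minimal example under stated assumptions: the order counts and conversion numbers are invented, `two_proportion_z_test` is a hypothetical helper name, and the two-proportion z-test is just one common way to evaluate an A/B test.

```python
import math
import statistics

# Hypothetical daily order counts, invented for illustration.
orders = [12, 15, 15, 18, 20]

# Descriptive statistics: mean, median, mode, standard deviation.
summary = {
    "mean": statistics.mean(orders),
    "median": statistics.median(orders),
    "mode": statistics.mode(orders),
    "std": statistics.stdev(orders),  # sample standard deviation
}


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants are equal.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    std_error = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical A/B test: 100 of 1000 visitors convert on A, 130 of 1000 on B.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)

print(summary)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

A small p-value suggests the difference between the variants is unlikely to be due to chance alone. The threshold you compare it against (0.05 is a common convention) should be chosen before the test is run.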

#2 Data Engineer

Data engineers develop and maintain the data infrastructure, which determines how a company collects and stores data. They build data pipelines that transform raw, unstructured data into usable formats that data scientists and analysts can use.

data science career paths: #2 data engineer skills map

Main Skills

Software Engineering -⌛70% of overall time
  • Develop Data Processing Applications
  • Data Structures
  • Conditional
  • List/Dict Comprehensions
  • Scripting
  • Python
Data Extraction and Wrangling -⌛30% of overall time
  • Handling Missing Values
  • Data Transformation – Joining, slicing, indexing
  • Perform Data Cleansing
  • Use Data Processing Techniques
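To show how a few of the skills above fit together (scripting, list and dict comprehensions, and data cleansing), here is a minimal sketch of one cleaning step in a pipeline. The raw rows, the field names, and the `clean` helper are hypothetical. A production pipeline would also need logging, error handling, and a real data source.

```python
# Hypothetical raw rows, as they might arrive from an upstream export.
raw_rows = [
    {"user": " Alice ", "amount": "10.50", "currency": "eur"},
    {"user": "bob", "amount": "", "currency": "USD"},
    {"user": "Carol", "amount": "7", "currency": "usd"},
]


def clean(row):
    """Normalise one row: trim names, parse amounts, upper-case currencies."""
    return {
        "user": row["user"].strip().lower(),
        "amount": float(row["amount"]),
        "currency": row["currency"].upper(),
    }


# List comprehension: drop rows with a missing amount, clean the rest.
cleaned = [clean(row) for row in raw_rows if row["amount"].strip()]

# Set and dict comprehensions: total amount per currency.
currencies = sorted({row["currency"] for row in cleaned})
totals = {
    currency: sum(row["amount"] for row in cleaned if row["currency"] == currency)
    for currency in currencies
}

print(cleaned)
print(totals)  # {'EUR': 10.5, 'USD': 7.0}
```

Here rows with a missing amount are dropped. Whether to drop, default, or flag such rows is a design decision that depends on who consumes the data downstream.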

#3 Data Scientist

Data scientists perform analytics and build machine learning models. Their tasks help companies develop new business strategies and determine long-term goals. Data scientists also build internal data products, which can help a company better understand its workforce, processes, and customers.

data science career paths: #3 data scientist skills map

Main Skills

Software Engineering -⌛10% of overall time
  • Data Structures
  • Scripting
  • Python
  • R
Data Extraction and Wrangling -⌛20% of overall time
  • Handling Missing Values
  • Data Transformation – Joining, slicing, indexing
  • Pandas
  • NumPy
  • Perform Data Cleansing
Exploratory Data Analysis and Storytelling -⏳10% of overall time
  • Studying Data Distribution
  • Univariate and Multivariate Analysis
  • Data Visualization (Matplotlib, Seaborn, Plotly)
  • Thinking Analytically
  • Report Analysis Results
Statistics and Mathematics -⏳30% of overall time
  • Execute Analytical Mathematical Calculations
  • Descriptive – Mean, Median, Mode, Std
  • Inferential – Hypothesis, A/B Testing, CI, P Value
  • Experiment Design
  • Probability – Conditional, Bayes Theorem
  • Sampling, Data Distributions, T-tests
  • Linear Algebra
  • Single and Multivariate Calculus
Machine Learning -⏳30% of overall time
  • Supervised – Classification, Regression, Neural Network
  • Unsupervised – Clustering, Dimensionality Reduction
  • Reinforcement Learning – TF-Agents, Optimising Rewards
  • Performance Metrics – RMSE, Accuracy, Confusion Matrix, AUC-ROC, etc.
  • Hyperparameter Tuning
  • Statistical ML – KNN, Decision Trees, Bagging, Boosting
  • Ensemble Models – Random Forests, Voting Classifiers, AdaBoost
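Several of the performance metrics listed above are simple enough to compute by hand, which can help build intuition before reaching for a library. The following is a minimal sketch in plain Python covering accuracy, a confusion matrix, and root-mean-square error. The labels and numbers are invented, and the function names are assumptions for illustration.

```python
import math
from collections import Counter


def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_matrix(y_true, y_pred):
    """Count each (true label, predicted label) pair."""
    return Counter(zip(y_true, y_pred))


def rmse(y_true, y_pred):
    """Root-mean-square error for numeric (regression) predictions."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical classification results.
labels_true = ["spam", "spam", "ham", "ham", "ham"]
labels_pred = ["spam", "ham", "ham", "ham", "spam"]

acc = accuracy(labels_true, labels_pred)
matrix = confusion_matrix(labels_true, labels_pred)

# Hypothetical regression results.
error = rmse([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])

print(f"accuracy = {acc}")
print(dict(matrix))
print(f"rmse = {error:.3f}")
```

Accuracy alone can be misleading when classes are imbalanced, which is one reason the confusion matrix is usually inspected alongside it.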

#4 Machine Learning Engineer

Machine learning engineers work with vast quantities of data and perform complex data modelling. They design self-running software that uses previous data to improve the program’s functionality. Machine learning engineers also perform machine learning tests, check data quality, and collaborate with other members of a data science team, such as data scientists, data analysts and administrators.

data science career paths: #4 machine learning engineer skills map

Main Skills

Software Engineering -⌛40% of overall time
  • Fundamental Algorithms
  • Data Structures
  • Conditional
  • List/Dict Comprehensions
  • Scripting
  • Develop Data Processing Applications
  • Python
Data Extraction and Wrangling -⌛20% of overall time
  • Handling Missing Values
  • Data Transformation – Joining, slicing, indexing
  • Pandas
  • NumPy
  • Perform Data Cleansing
  • Use Data Processing Techniques
Machine Learning -⏳40% of overall time
  • Supervised – Classification, Regression, Neural Network
  • Unsupervised – Clustering, Dimensionality Reduction
  • Reinforcement Learning – TF-Agents, Optimising Rewards
  • Performance Metrics – RMSE, Accuracy, Confusion Matrix, AUC-ROC, etc.
  • Hyperparameter Tuning
  • Statistical ML – KNN, Decision Trees, Bagging, Boosting
  • Ensemble Models – Random Forests, Voting Classifiers, AdaBoost

#5 Research Scientist, ML

A research scientist in machine learning, a newly formalized role, is responsible for researching and developing new methods, algorithms, and approaches to data science. ML scientists are generally part of the Research and Development (R&D) division of an organization. They are in charge of finding innovative data processing and analysis approaches, often leading to published work.

data science career paths: #5 machine learning scientist skills map

Main Skills

Data Extraction and Wrangling -⌛20% of overall time
  • Handling Missing Values
  • Data Transformation – Joining, slicing, indexing
  • Pandas
  • NumPy
  • Perform Data Cleansing
  • Use Data Processing Techniques
Exploratory Data Analysis and Storytelling -⏳10% of overall time
  • Studying Data Distribution
  • Univariate and Multivariate Analysis
  • Data Visualization (Matplotlib, Seaborn, Plotly)
  • Thinking Analytically
  • Report Analysis Results
Statistics and Mathematics -⏳30% of overall time
  • Execute Analytical Mathematical Calculations
  • Descriptive – Mean, Median, Mode, Std
  • Inferential – Hypothesis, A/B Testing, CI, P Value
  • Experiment Design
  • Probability – Conditional, Bayes Theorem
  • Sampling, Data Distributions, T-tests
  • Linear Algebra
  • Single and Multivariate Calculus
Machine Learning -⏳40% of overall time
  • Supervised – Classification, Regression, Neural Network
  • Unsupervised – Clustering, Dimensionality Reduction
  • Reinforcement Learning – TF-Agents, Optimising Rewards
  • Performance Metrics – RMSE, Accuracy, Confusion Matrix, AUC-ROC, etc.
  • Hyperparameter Tuning
  • Statistical ML – KNN, Decision Trees, Bagging, Boosting
  • Ensemble Models – Random Forests, Voting Classifiers, AdaBoost

As the industry and our understanding of it evolve, we will update roles, add more, and get more granular with skills. If you found this guide useful, add it to your bookmarks and refer back to it when needed.