This study of in-demand skills for 2021, based on over 15,000 Data Scientist job listings, should give you a good idea of which programming languages and software tools are growing in importance and which are declining, whether you are preparing for a career as a data scientist or looking to upskill in your current role.
To begin, I’d like to acknowledge that this analysis is heavily influenced by Jeff Hale’s articles from 2018 and 2019. I wrote it to get a more up-to-date picture of which skills are in demand today, and I’m sharing it in the hope that others will find an updated 2021 version useful as well.
Take what you want from this report, but keep in mind that data gathered by scraping job listings is not a perfect measure of the most in-demand data science skills. However, I believe it provides a clear indicator of which general skills you should concentrate on and which you should avoid.
With that said, I hope you enjoy it, and now let’s get started!
I scraped and compiled over 15,000 job listings from Indeed, Monster, and SimplyHired for this study. I left out LinkedIn because CAPTCHA challenges prevented me from scraping it.
After that, I counted how many job listings contained each term I was looking for. The terms were as follows (if you want to see other skills, please let me know in the comments so I can include them in next year’s analysis!):
- Scikit-learn, Pandas, NumPy, SciPy
- Matplotlib, Looker, Tableau
- TensorFlow, PyTorch, Keras
- Spark, Hadoop, AWS, GCP, Hive, Azure, Google Cloud, MongoDB, BigQuery
- Docker, Kubernetes, Airflow
- NoSQL, MySQL, PostgreSQL
- Caffe, Alteryx, Perl, Cassandra, Linux
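The counting step can be sketched roughly as follows. This is a minimal illustration, not the script used for this study: the function name `count_listings_with_terms` and the sample listings are made up, and it assumes each listing is available as a plain text string. Each listing is counted at most once per term, and the pattern avoids matching a term inside a longer word.

```python
import re
from collections import Counter

def count_listings_with_terms(listings, terms):
    """Count, for each term, how many listings mention it at least once."""
    # Hypothetical helper: match the term case-insensitively, but not when it
    # is embedded in a longer alphanumeric word (e.g. "Spark" in "sparkling").
    patterns = {
        term: re.compile(
            r"(?<![A-Za-z0-9])" + re.escape(term) + r"(?![A-Za-z0-9])",
            re.IGNORECASE,
        )
        for term in terms
    }
    counts = Counter({term: 0 for term in terms})
    for text in listings:
        for term, pattern in patterns.items():
            if pattern.search(text):
                counts[term] += 1  # one count per listing, however many mentions
    return counts

# Made-up listings, for illustration only.
sample_listings = [
    "We need Pandas, NumPy and AWS experience. Pandas is a must.",
    "Experience with Spark and AWS required.",
    "Looking for a Tableau developer; sparkling personality a plus.",
]
term_counts = count_listings_with_terms(
    sample_listings, ["Pandas", "AWS", "Spark", "Tableau"]
)
print(dict(term_counts))
```

Plain substring matching would be simpler, but short terms such as "Perl" or "Hive" could then match inside unrelated words, which is why the sketch checks the characters on either side of the term.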
I added up the counts from each source and divided the sum by the total number of data scientist job postings to arrive at a percentage. Python, for example, has a value of 0.77, indicating that Python appeared in 77% of job postings.
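As a rough sketch of that calculation, assuming the per-source counts are kept in dictionaries: the function name `share_of_listings` and all the numbers below are invented for illustration (chosen so that Python lands on 0.77), and are not the actual counts from the scrape.

```python
def share_of_listings(counts_by_source, totals_by_source):
    """Combine per-source counts and divide by the total number of listings."""
    total_listings = sum(totals_by_source.values())
    combined = {}
    for counts in counts_by_source.values():
        for skill, n in counts.items():
            combined[skill] = combined.get(skill, 0) + n
    return {skill: n / total_listings for skill, n in combined.items()}

# Invented numbers, for illustration only.
counts_by_source = {
    "Indeed": {"Python": 80, "SQL": 60},
    "Monster": {"Python": 45, "SQL": 35},
    "SimplyHired": {"Python": 29, "SQL": 25},
}
totals_by_source = {"Indeed": 100, "Monster": 60, "SimplyHired": 40}

shares = share_of_listings(counts_by_source, totals_by_source)
print({skill: round(value, 2) for skill, value in shares.items()})
```

Note that pooling the sources this way weights each site by how many listings it contributed, which is different from averaging the three per-site percentages.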
Finally, I calculated the percentage change from 2019 to 2021 by comparing the findings to Jeff Hale’s report from 2019.
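A change between two shares can be expressed in two ways, and the sketch below shows both since either could reasonably be meant here: the relative change (new share compared with the old one) and the difference in percentage points. The function names and the example shares are hypothetical and do not come from either report.

```python
def relative_change(old_share, new_share):
    """Relative change, e.g. 0.50 -> 0.75 is a 50% increase."""
    if old_share == 0:
        raise ValueError("relative change is undefined when the old share is 0")
    return (new_share - old_share) / old_share

def point_change(old_share, new_share):
    """Difference in percentage points, e.g. 0.50 -> 0.75 is +25 points."""
    return (new_share - old_share) * 100

# Hypothetical shares for a single skill in 2019 and 2021.
old_share, new_share = 0.50, 0.75
print(f"relative change: {relative_change(old_share, new_share):+.0%}")
print(f"point change: {point_change(old_share, new_share):+.0f} points")
```

The two measures can tell different stories: a skill that goes from 2% to 4% of listings doubles in relative terms but gains only two percentage points.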
From highest to lowest, here are the top 25 most in-demand data science skills in 2021:
Top Programming Languages
The chart below displays the top programming languages for data scientists at a more granular level:
Python, SQL, and R are the top three programming languages, which comes as no surprise.
Personally, I believe that you should be familiar with either Python or R, as well as SQL. Python is where I started, and it’s where I’ll probably stay for the rest of my life. It’s way ahead of the game in terms of open source contributions, and it’s simple to pick up. SQL is arguably the most valuable skill to learn for any data-related role, whether data scientist, data engineer, data analyst, business analyst, or similar.
Top Python Libraries
Similarly, the following graph depicts the most common Python libraries for data scientists:
TensorFlow takes first place; it is one of the most widely used Python deep learning libraries. PyTorch is a good alternative, as shown by its ranking.
Scikit-learn is perhaps the most powerful machine learning library in Python. Scikit-learn is used to create machine learning models after cleaning and manipulating the data with Pandas and/or NumPy. It has a lot of tools for predictive modelling and analysis.
Despite their showing in the chart above, I consider Pandas, NumPy, and SciPy important for data scientists as well.
Skills with the Fastest Growth and the Fastest Decline
The graphs below show the fastest-growing and fastest-declining skills from 2019 to 2021:
Here are some key takeaways from the two graphs:
- There has been a significant rise in cloud-related skills, such as AWS and GCP.
- Similarly, skills related to deep learning, such as PyTorch and TensorFlow, have seen a significant rise.
- Demand for SQL and Python is growing, while demand for R is holding steady.
- Apache products such as Hadoop, Hive, and Spark are losing popularity.