I have been mentoring aspiring data scientists for quite some time, and the question I hear most often is: how do I transition from my existing non-data-science role to become a successful data scientist?
To make a successful switch, one needs a basic grounding in statistics, such as median, standard deviation, correlation and linear regression, before diving deep into modelling.
Statisticians and mathematicians trained in data analysis techniques such as graphing, plotting, analysis and hypothesis testing often find it easy to transition to a data scientist role.
Here’s how you can beef up on basic data analyst skills:
1. Learn hypothesis generation and analysis through plots, graphs and reasoning
2. Get up to speed with statistical reasoning
3. Data science is also about products. Learn how to leverage data to identify product features and enhancements
4. Data munging is the art of cleaning data. It is a time-consuming job that entails dealing with missing data and changing schemas. Before diving into large datasets, practice cleaning data
5. While Kaggle is hailed as a stepping stone for honing machine learning and data analysis skills, its competitions feature curated datasets that are already anonymized and cleaned, so they offer little practice in data munging
6. Understand the business domain you wish to work in. Data analysis problems are solved according to business needs, so domain understanding is a must
7. Data science is a long learning process. Switching to data engineering and learning statistics on your own is one path towards deeper expertise
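As a small illustration of the data munging described in item 4, here is a minimal sketch that cleans a handful of made-up customer records using only the Python standard library. The field names (`age`, `balance`, `acct_balance`) and the choice to fill gaps with the median are assumptions for illustration, not a recommended recipe for every dataset.

```python
from statistics import median

# Hypothetical raw records: missing values appear as None or empty strings,
# and one record uses an older schema ("acct_balance" instead of "balance").
raw_records = [
    {"customer_id": 1, "age": "34", "balance": "1200.50"},
    {"customer_id": 2, "age": "", "balance": "310.00"},
    {"customer_id": 3, "age": "51", "acct_balance": "875.25"},
    {"customer_id": 4, "age": "29", "balance": None},
]


def to_float(value):
    """Convert a raw field to float, returning None when it is missing."""
    if value is None or str(value).strip() == "":
        return None
    return float(value)


def clean(records):
    """Normalize the schema, parse numbers, and fill gaps with the median."""
    parsed = []
    for rec in records:
        # Handle the schema change by accepting either column name.
        balance = rec.get("balance", rec.get("acct_balance"))
        parsed.append({
            "customer_id": rec["customer_id"],
            "age": to_float(rec.get("age")),
            "balance": to_float(balance),
        })

    # Fill each missing value with the median of the known values in that field.
    for field in ("age", "balance"):
        known = [r[field] for r in parsed if r[field] is not None]
        fill = median(known)
        for r in parsed:
            if r[field] is None:
                r[field] = fill
    return parsed


cleaned = clean(raw_records)
for row in cleaned:
    print(row)
```

Median imputation is only one option. Depending on the problem, dropping incomplete rows or flagging them for review may be the better call.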
Switching from Software to Data Analytics is definitely a tricky job
Sharpening business domain understanding is essential
For example, say an analyst is asked to reduce customer churn at a retail bank. In that case, the analyst would need to know the types of products that exist within a bank and how customers engage with those products. This shows why it’s always important to understand what problem you’re trying to solve.
Analytics techniques are a means to an end, and understanding the most important challenges for a business is vital to selecting appropriate techniques.
For example, a missed prediction of fraud is significantly more expensive to a financial firm than a corresponding wrong prediction of email open rates is to a marketing department.
Understanding this will help you make suitable trade-offs between the complexity, effectiveness and cost of implementing various analytical techniques. It’s not always a question of picking the most cutting-edge solution.
Good decision making calls for picking the right tool/technique for the job.
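To make the trade-off concrete, the sketch below compares two hypothetical models under two different cost structures. All error counts and per-error costs are invented for illustration. The only point is that the same mistakes can make a different model the better choice once business costs are attached.

```python
def total_cost(false_negatives, false_positives, cost_fn, cost_fp):
    """Total cost of a model's mistakes, given a cost per error of each type."""
    return false_negatives * cost_fn + false_positives * cost_fp


# Hypothetical error counts for two candidate models on the same test set.
# Model A misses fewer positive cases but raises many more false alarms.
model_a = {"false_negatives": 20, "false_positives": 300}
model_b = {"false_negatives": 35, "false_positives": 50}

# Assumed per-error costs (arbitrary currency units, for illustration only).
fraud_costs = {"cost_fn": 5000, "cost_fp": 10}  # a missed fraud is very expensive
email_costs = {"cost_fn": 1, "cost_fp": 1}      # either mistake is cheap

for name, costs in [("fraud", fraud_costs), ("email", email_costs)]:
    cost_a = total_cost(**model_a, **costs)
    cost_b = total_cost(**model_b, **costs)
    better = "A" if cost_a < cost_b else "B"
    print(f"{name}: model A costs {cost_a}, model B costs {cost_b}, prefer {better}")
```

With these made-up numbers, model A is cheaper in the fraud setting and model B is cheaper in the email setting, even though the error counts are identical in both.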
Shoring up Mathematical/Statistical knowledge
I believe statistics, probability and mathematics in general are often the most daunting prospect for people entering data science and analytics.
Yet the statistical knowledge needed to do effective analysis doesn’t take an advanced degree to master. Linear algebra, matrices, statistical tests, distributions, likelihood estimators, regression, Bayes’ theorem and conditional probability are all you need to get started.
There isn’t a whole lot of merit in learning very advanced statistics until you start working with lots of data, and hit a ceiling in terms of efficacy of your models.
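As a taste of how approachable these basics are, here is a short worked example of Bayes’ theorem and conditional probability. The churn and complaint percentages are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(A | B) via Bayes' theorem.

    prior               = P(A)
    likelihood          = P(B | A)
    false_positive_rate = P(B | not A)
    """
    # Total probability of observing B, with or without A.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical figures: 10% of customers churn, 60% of churners filed a
# complaint, and 20% of non-churners filed a complaint.
p_churn_given_complaint = posterior(
    prior=0.10, likelihood=0.60, false_positive_rate=0.20
)
print(round(p_churn_given_complaint, 2))  # 0.25
```

Under these assumptions, a complaint raises the estimated chance of churn from 10% to 25%, because 0.06 / (0.06 + 0.18) = 0.25.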
Mathematics and statistics power the algorithms used to quantify the impact of the variables under analysis, identify hidden patterns and make predictions or recommendations.
In the customer churn case, the analyst may use logistic regression to predict which customers are likely to churn by statistically quantifying the impact of various factors (such as account balance, number of credit cards etc.) that have led to customer churn in the past.
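The churn example can be sketched in a few lines of plain Python. This is a from-scratch illustration of logistic regression fitted by gradient descent, not production code. The customer data, the feature scaling and the learning rate are all assumptions, and in practice one would typically reach for an established implementation such as R’s `glm` or scikit-learn’s `LogisticRegression`.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(bias, weights, x):
    """Predicted probability of churn for one customer's feature list."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent.

    Returns (bias, weights). Features should be on similar scales.
    """
    n_features = len(rows[0])
    bias, weights = 0.0, [0.0] * n_features
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * n_features
        for x, y in zip(rows, labels):
            error = predict_proba(bias, weights, x) - y
            grad_b += error
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
        # Step against the average gradient of the log-loss.
        bias -= lr * grad_b / len(rows)
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
    return bias, weights


# Hypothetical customers: [account balance in units of 10,000, number of credit cards]
customers = [
    [0.2, 0], [0.5, 1], [0.3, 0], [0.8, 1],
    [3.0, 2], [4.5, 3], [2.5, 2], [5.0, 3],
]
churned = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = churned in the past, 0 = stayed

bias, weights = fit_logistic(customers, churned)

# Score two new hypothetical customers.
low_engagement = predict_proba(bias, weights, [0.4, 0])
high_engagement = predict_proba(bias, weights, [4.0, 3])
print(f"low engagement: {low_engagement:.2f}, high engagement: {high_engagement:.2f}")
```

The fitted weights are what "quantifying the impact" refers to: the sign and size of each weight indicate how that factor shifts the predicted odds of churn in this toy dataset.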
Building up technical know-how
Since organizations deal with millions of data points, they need technological solutions that help them apply algorithms at scale. Here, tools like R and Python become extremely important. The analyst in this case would use a tool like R to apply logistic regression to all the customers in the bank’s database to identify potential churners.
I believe competition platforms like Kaggle are a good place to get started with working on large datasets.
Capstone case studies also help you practice tackling real-world problems.
1. Focus on how to make the pivot to analytics without losing the wealth of experience you have
2. Show your depth by writing blogs and articles on your own site, as well as on forums like LinkedIn
3. Work on projects. The course you choose should include a capstone project with industry.
You may also refer to sites like Kaggle
One must show, not just tell, that they are competent in analytics. “When competing with thousands of people who all claim to know the same things you do, it is hard to distinguish yourself if all you have to say is ‘I know x, y and z’.
You need to be able to show that you have learned to work with data, written code, cleaned and processed datasets and tuned models to improve their effectiveness.”
Kaggle challenges are a great place to build this portfolio of work, as are HackerEarth, Analytics Vidhya and DataCamp challenges.
Even if you’re not working on these formal challenges, you’d be well served to upload your datasets, code snippets and model outputs (as well as a brief description of what you did and why) to GitHub. More often than not, smart data science teams ask for your GitHub account to evaluate your proficiency.
Finally, it may take patience, networking and showcasing the cases and datasets you have worked on to convince people of your depth and strength in analytics.
Below is my advice on how to improve your networking opportunities:
• The course you decide to pursue should have industry experts and networking opportunities
• Update your CV and cover letter to highlight your analytics profile
• Leverage and grow your network in analytics by showcasing your knowledge
Career transitions aren’t easy.
Remember, you don’t have to unlearn everything. “In any transition, you’re more likely to succeed if you build on your existing knowledge. So, if you’re proficient as a programmer, then transitioning into data engineering roles allows you to use your proficiency rather than start from scratch.
Similarly, if you’re proficient in databases and data warehouses, then you’re well positioned to move into data architecture roles.”
Here is a recap of the technical and non-technical skills to build to get a head start in this data-intensive field: dealing with unstructured data, familiarity with the Hadoop platform, proficiency in modelling languages such as R and Python and query languages such as Pig, Hive and SQL, and lastly statistics. Communication skills and an innate curiosity will also go a long way in optimizing products and services.