Must Know Questions for Data Science and ML Interviews

Computer scientist and ML expert Santiago Valdarrama (@svpino on Twitter) recently tweeted a list of 20 fundamental questions that you need to ace before getting a machine learning job, claiming:

“Almost every company will ask these to weed out non-prepared candidates. You don’t want to show up unless you are comfortable having a discussion about all of these.”

Santiago Valdarrama — @svpino

1. Explain the difference between Supervised and Unsupervised methods.

  • Example: If we’re building a classifier to tell whether an animal is a cat or a dog, we would train the model on a dataset of dog and cat images correctly labeled as such. Then we can get predictions on new, unlabeled images! In other words, supervised learning uses previously labeled examples to learn a mapping from inputs to known outputs.
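To make the supervised workflow concrete, here is a minimal sketch that trains on labeled examples and then predicts on new, unlabeled ones. It is only an illustration: the two-number feature vectors stand in for images, the values are made up, and the nearest-centroid rule is just one simple classifier chosen for brevity, not a method named in the tweet.

```python
from collections import defaultdict
from math import dist


def fit_centroids(features, labels):
    """Supervised step: average the feature vectors of each labeled class."""
    grouped = defaultdict(list)
    for x, y in zip(features, labels):
        grouped[y].append(x)
    return {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, x):
    """Assign a new, unlabeled point to the class with the nearest centroid."""
    return min(centroids, key=lambda label: dist(centroids[label], x))


# Hypothetical toy features (weight in kg, ear length in cm), invented for illustration.
train_features = [(4.0, 6.5), (3.5, 6.0), (20.0, 10.0), (25.0, 12.0)]
train_labels = ["cat", "cat", "dog", "dog"]

centroids = fit_centroids(train_features, train_labels)
print(predict(centroids, (4.2, 6.2)))
print(predict(centroids, (22.0, 11.0)))
```

The labels are what make this supervised: without `train_labels`, the same points could only be grouped by similarity, which is the unsupervised setting.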


The Problems with Handwriting OCRs and Our Solution

www.storysquad.education

StorySquad is a “gaming” platform that turns your child into a reader, writer, and illustrator. It encourages creation and is an alternative to Fortnite and YouTube that children enjoy and parents trust. Each week we send our users a short story to read. Then kids write their own story based on the characters they just read about and draw an illustration (on actual paper to reduce screen time and encourage sustained focus). …


Filtering spam with Multinomial Naive Bayes (From Scratch)

In the first half of 2020, more than 50% of all email traffic on the planet was spam. Spammers typically receive 1 reply for every 12,500,000 emails sent, which doesn’t sound like much until you realize that more than 15 billion spam emails are sent every day. At that rate, spammers collect more than 1,200 replies a day. Spam costs businesses 20–200 billion dollars per year, and that number is only expected to grow.


What can we do to save ourselves from spam?

Naive Bayes Classifiers

In probability theory and statistics, Bayes’ theorem (alternatively Bayes’s theorem, Bayes’s law or Bayes’s rule) describes the probability of an event, based on prior knowledge of conditions that might be related to the event. …
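As a rough illustration of how Bayes’ theorem becomes a spam filter, the sketch below scores each class with log P(class) plus the sum of log P(word | class) over the words in a message, then picks the higher score. This is not the article’s own implementation: the function names, the four toy messages, and the add-one (Laplace) smoothing value `alpha=1.0` are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def train(docs, labels, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log likelihoods per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}

    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        # Denominator: words seen in class c plus alpha for every vocabulary word.
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return log_prior, log_likelihood


def predict(model, doc):
    """Return the class with the highest log posterior score."""
    log_prior, log_likelihood = model
    scores = {}
    for c in log_prior:
        # Words never seen during training are skipped.
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][w]
            for w in doc.lower().split()
            if w in log_likelihood[c]
        )
    return max(scores, key=scores.get)


# Hypothetical toy messages, invented for illustration only.
docs = [
    "win free money now",
    "free prize claim now",
    "meeting at noon tomorrow",
    "lunch tomorrow at noon",
]
labels = ["spam", "spam", "ham", "ham"]

model = train(docs, labels)
print(predict(model, "free money"))
print(predict(model, "lunch at noon"))
```

Working in log space avoids multiplying many small probabilities together, and the smoothing keeps a word that never appeared in one class from forcing that class’s probability to zero.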


Dengue viruses are spread to people through the bite of an infected Aedes species (Ae. aegypti or Ae. albopictus) mosquito. Dengue is common in more than 100 countries around the world. Forty percent of the world’s population, about 3 billion people, live in areas with a risk of dengue. Dengue is often a leading cause of illness in these areas.


I’ll be using data from San Juan, Puerto Rico, and Iquitos, Peru, to predict the total number of dengue fever cases for each week. Let’s start by looking at the total cases of dengue plotted over time.



Research Question:

Why has there been an increase in traffic fatalities over the last 10 years while the rate of seat belt use has gone up and car safety features have improved? Why does it seem that driving is safer than it was 10 years ago but your chances of dying are higher? Is it us?

About

Jack Ross

Lambda Endorsed Data Scientist, snowboarder, coffee lover, and useless robot designer.
