A decade ago, there was an extreme push to earn a degree from an accredited college and land a job in the field you studied. I was a product of that mentality myself. I went to a private school for four years and earned a Bachelor of Science degree in Sports Management. I didn’t end up using that degree for sports specifically, though I gained useful knowledge about the business side. I worked in a few sales positions before getting let go due to the pandemic. I’m not using this as a platform for sympathy; I just want to express my opinion that a four-year college education isn’t the ‘best’ way to become a successful individual in this world. There are many, many other routes to take.
For instance, while I was sitting around the house, jobless and quarantining, I found myself constantly wasting time scrolling through social media on my phone and letting the hours pass by. As I lay around feeling bad about myself and pretty lost about what to do next (I thought I had my career path set with the previous job), I came across the mentor platform Career Karma. This is a totally free, app-based business that relies on mentors who are either about to complete their course work in a computer science specialty (UI/UX, Software Engineering, Data Science, Cybersecurity) or already working in one of those fields. I started their #21DayCKChallenge by opening the app daily and working through the suggested steps for becoming familiar with how to code. One of the first steps was to download other free apps such as Grasshopper, Sololearn, and Codecademy Go. These apps guide you through starter code so you can begin learning software engineering basics and get a feel for what the computer science world is like. I kept up with the check-ins and even had 1-on-1 time with my mentor, who was about to finish her self-paced bootcamp in Software Engineering.
After a few conversations in which she got to know me, my professional background, and my degree, she suggested I try learning Data Science instead of SE, since I liked math growing up and had a business background (helpful for relating to an employer on the non-technical side).
There are many ways to build a recommendation system, but here I will focus solely on the Surprise library and how I used it in my project. This is a continuation and deeper exploration of my previous post here.
What is a recommendation system? The idea is pretty straightforward. Think of what you use every day: products like Amazon/Prime, Netflix, and Hulu, or any website that discloses that it uses cookies when you browse. Systems like these use machine learning and a variety of algorithms to market to you specifically, based on what you do on your device. These software companies learn your consumer habits and the topics you view most often, then cater to those interests by advertising products similar to what you have previously viewed or purchased. Amazon is one of the most efficient at this when you think about it. I’m always looking for that ‘thing’ on their app or website. I look at what’s similar to item ‘x’, what others have purchased with this item, and what others viewed after viewing it. It’s crazy to think about how software has evolved into almost becoming a mind reader! I hear jokes like, “I was thinking of buying this speaker the other day and when I went onto Amazon, there it was! I hadn’t even searched for anything yet and it suggested the speaker as an item I might like.”
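To make the "others have purchased with this item" idea concrete, here is a minimal sketch in plain Python. It is not how Amazon or any real service works internally; it only counts how often two items show up in the same order and suggests the most frequent companions. The order data and the function names (`build_co_purchase_counts`, `also_bought`) are made up for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_co_purchase_counts(orders):
    """Count how often each pair of items appears in the same order."""
    counts = defaultdict(Counter)
    for order in orders:
        # set() ignores duplicate items within one order
        for a, b in combinations(sorted(set(order)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def also_bought(counts, item, top_n=3):
    """Return the items most often bought together with `item`."""
    return [other for other, _ in counts[item].most_common(top_n)]


# Hypothetical order history, invented for this example
orders = [
    ["speaker", "aux cable"],
    ["speaker", "aux cable", "phone case"],
    ["speaker", "phone case"],
    ["speaker", "aux cable"],
    ["phone case", "charger"],
]

counts = build_co_purchase_counts(orders)
suggestions = also_bought(counts, "speaker", top_n=2)
print(suggestions)
```

Real recommendation systems go well beyond raw counts (ratings, viewing history, learned similarity between users and items), but the intuition of "people who bought this also bought that" starts from the same kind of tally.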
There will be numerous times when you come across an imbalanced dataset. If you check only the accuracy score of your model, you will likely see that it’s very high. Think of this: let’s say you are building a predictive model to determine whether or not your customer will continue services with your company based on x-y-z factors. You have a dataset with an imbalanced count of those who continue services and those who terminate them: 90% of your customers continue their membership while 10% cancel. When you run your accuracy scores, you will most likely get a high number (0.90), because the model can predict “continues” almost every time and still be right about 9 times out of 10. That’s awesome! You think, “Wow! This is a super accurate model.” Then you check the recall score of that same model. Turns out your score is 0.45. Oh no! The model is missing more than half of the cases it was supposed to catch. When new information enters the model, it hasn’t learned which features to look for. Examples of features you might use are the age of the account, how often the account is used on a monthly basis, or the number of customer service calls it has had during its lifespan. These are just a few of the customer retention features you might see in your next dataset. Your model won’t reliably learn that, say, an older account is more likely to continue services, or that a customer with more than five service calls has a higher chance of canceling the subscription. That’s why it’s so important to balance the weight of the yes’s and no’s for your target, “customer churn”. When the weight distribution is the same, the model you end up choosing can fairly assess both outcomes and learn the features that go into a customer staying or leaving.
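Here is a small sketch of that 90/10 scenario in plain Python. It assumes the laziest possible model, one that always predicts "continues", so its recall comes out at 0.0 rather than the 0.45 from my example above. It then balances the classes with random oversampling, which is only one of several ways to rebalance. The helper names and the toy customer rows are made up for illustration.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model caught."""
    true_pos = sum(t == positive and p == positive
                   for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0


def oversample_minority(rows, labels, seed=0):
    """Duplicate random rows of smaller classes until all classes match."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# 1 = customer cancels (churn), 0 = customer continues
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100  # assumed baseline: always predict "continues"

baseline_accuracy = accuracy(y_true, y_pred)
baseline_recall = recall(y_true, y_pred)
print(baseline_accuracy, baseline_recall)

# Hypothetical customer rows, one per label
rows = [{"customer_id": i} for i in range(100)]
balanced_rows, balanced_labels = oversample_minority(rows, y_true)
print(balanced_labels.count(0), balanced_labels.count(1))
```

One caution if you try this on a real project: oversample only the training split, after you have set aside your test data, so that copies of the same customer don't end up on both sides of the split.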