Software Advice product pages have a new reviews facet that features the top pros and cons culled from product reviews. These are currently hand-picked for each product.
There are thousands of products in our database and over 800,000 reviews.
Hand-picking is for the birds. We should have machines pick these for us.
The scope here would be to assist the process by presenting a more manageable list to the associate responsible for culling the best product reviews. If the Gartner brands grow at the rate we would like them to, employing artificial intelligence would help us scale with that growth. If we show success with this work, we could try out all sorts of ideas to help pair software with users.
Gartner announced they were going to host a company-wide hackathon to be held in May of 2019. I proposed the machine learning proof-of-concept above to address the low quality of the pros and cons featured on our new reviews page.
Leadership
I recruited hard for our project. It paid off: at 13 members, our team was the largest hackathon team across Gartner. Most of us signed up because we were interested in machine learning but had no direct experience with it.
I spent some time ramping up on machine learning, its practice, and a few common algorithms. I also worked out the easiest way to set up a local environment, including which language and framework would be quickest to get up and running.
I held a meeting the week before to ramp everyone else up on how to machine learn. I thought it was important that everyone be able to pursue whatever approach they saw fit to learn the skills they wanted.
Here was my proposal for how to do that:
- get everyone on the same page:
  - install machine learning tools locally or set up some cloud instance
  - train up on how to use a model, feed data into it, and get and interpret results
- split into teams (and compete):
  - can come up with your own hypothesis and go about trying to implement a custom algorithm
  - can employ different models on another team's algorithm
  - can write model customizations to fine-tune a promising algorithm
  - could split based on machine learning experience or framework preference
  - could have some rotating volunteer(s) available to help solve problems or share useful info across teams
- come back together:
  - if all good: compare results
  - if still problems: solve lingering issues together
  - summarize what we've learned
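That middle step — feeding data into a model and interpreting its results — is the one that intimidates newcomers most, so here's a minimal sketch of it in plain Python: a tiny logistic-regression classifier trained from scratch on made-up "is this a top review?" data. The features and labels are purely illustrative, not anything from our actual dataset.

```python
import math

def train_logistic(samples, labels, epochs=500, lr=0.5):
    """Fit a tiny logistic-regression model with plain gradient descent."""
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            pred = 1.0 / (1.0 + math.exp(-z))  # sigmoid squashes to (0, 1)
            err = pred - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

def predict(weights, bias, x):
    """Probability that a review belongs to the positive class."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Toy features: [normalized review length, normalized rating].
# Label 1 = review was hand-picked as a top review, 0 = not.
X = [[0.9, 1.0], [0.8, 0.9], [0.7, 0.8], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
y = [1, 1, 1, 0, 0, 0]

weights, bias = train_logistic(X, y)
print(predict(weights, bias, [0.85, 0.9]))  # a long, high-rated review
print(predict(weights, bias, [0.15, 0.2]))  # a short, low-rated review
```

Interpreting results is then just reading the probabilities: anything well above 0.5 is a candidate worth surfacing. Real frameworks like Keras wrap this same loop in `compile`/`fit`/`predict` calls.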
Teams
One member of our team, well-seasoned in Amazon's suite of products, decided to pursue a solution using Comprehend, an AWS tool for natural language processing and text analytics. This project quickly became the standard to beat. Several of our hackathon team members ~~defected~~ volunteered to work on it.
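To give a flavor of the Comprehend approach: its sentiment API returns a dominant sentiment plus per-class confidence scores, which map naturally onto pros and cons. The sketch below buckets review sentences using responses shaped like Comprehend's `detect_sentiment` output — the sentences and scores are canned here so it runs offline; in production you'd call `boto3.client("comprehend").detect_sentiment(Text=..., LanguageCode="en")` per sentence.

```python
# Canned (sentence, response) pairs shaped like Comprehend's
# detect_sentiment output; scores are invented for illustration.
reviews = [
    ("Setup was painless and support is fantastic.",
     {"Sentiment": "POSITIVE",
      "SentimentScore": {"Positive": 0.97, "Negative": 0.01, "Neutral": 0.01, "Mixed": 0.01}}),
    ("The reporting module crashes constantly.",
     {"Sentiment": "NEGATIVE",
      "SentimentScore": {"Positive": 0.02, "Negative": 0.95, "Neutral": 0.02, "Mixed": 0.01}}),
    ("It has a dashboard.",
     {"Sentiment": "NEUTRAL",
      "SentimentScore": {"Positive": 0.10, "Negative": 0.05, "Neutral": 0.84, "Mixed": 0.01}}),
]

def bucket(reviews, min_confidence=0.9):
    """Split sentences into pros and cons by dominant sentiment score."""
    pros, cons = [], []
    for text, resp in reviews:
        score = resp["SentimentScore"]
        if resp["Sentiment"] == "POSITIVE" and score["Positive"] >= min_confidence:
            pros.append((score["Positive"], text))
        elif resp["Sentiment"] == "NEGATIVE" and score["Negative"] >= min_confidence:
            cons.append((score["Negative"], text))
    # Highest-confidence candidates first, for the associate to review.
    return sorted(pros, reverse=True), sorted(cons, reverse=True)

pros, cons = bucket(reviews)
print([t for _, t in pros])  # candidate pros
print([t for _, t in cons])  # candidate cons
```

Neutral and low-confidence sentences fall out entirely, which is exactly the filtering that makes the hand-picker's list manageable.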
I created a local Docker environment pre-built with Python's Keras framework and shared it with everyone who wanted to get their hands dirty with actual machine learning 🙂 and try to beat the AWS team's results. I wrote up documentation covering how to use the container I'd created; how to pull reviews data from our API, transform it, and feed it into a Jupyter notebook; and how to implement a handful of artificial intelligence models.
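The transform step in that pipeline looked roughly like the sketch below: flatten raw review records into labeled samples a model can consume. The field names here (`pros`, `cons`, `rating`) are illustrative stand-ins, not our actual API schema.

```python
import json

# Stand-in for the JSON our reviews API might return; the schema
# shown here is invented for illustration.
raw = json.loads("""[
  {"title": "Great CRM", "pros": "Easy to learn", "cons": "Pricey", "rating": 5},
  {"title": "Meh", "pros": "Cheap", "cons": "Clunky UI", "rating": 2}
]""")

def to_samples(records):
    """One (text, label) pair per snippet; label 1 = pro, 0 = con."""
    samples = []
    for rec in records:
        samples.append((rec["pros"], 1))
        samples.append((rec["cons"], 0))
    return samples

samples = to_samples(raw)
print(samples)
```

From there, the samples drop straight into a notebook cell as training data for whichever model a team wanted to try.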
One team decided to apply a PageRank-style algorithm to the review text, similar to how Google ranks sites in search results. Another team used a supervised learning scheme trained on review completeness scores (calculated when a review is submitted) to determine the best reviews. My team employed a topic modeling scheme to find the major themes across reviews and pull representative reviews from each theme.
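The PageRank-style approach can be sketched in a few lines of plain Python: treat each review as a node, connect reviews by word overlap, and iterate the rank update until it settles. The similarity measure and damping factor below are illustrative choices, not necessarily what that team used.

```python
def similarity(a, b):
    """Jaccard word-overlap similarity between two texts."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def text_rank(reviews, damping=0.85, iterations=50):
    """Rank reviews with a PageRank-style iteration over a similarity graph."""
    n = len(reviews)
    sims = [[similarity(reviews[i], reviews[j]) if i != j else 0.0
             for j in range(n)] for i in range(n)]
    ranks = [1.0 / n] * n
    for _ in range(iterations):
        new_ranks = []
        for i in range(n):
            incoming = 0.0
            for j in range(n):
                total = sum(sims[j])  # node j's total outgoing weight
                if sims[j][i] > 0 and total > 0:
                    incoming += ranks[j] * sims[j][i] / total
            new_ranks.append((1 - damping) / n + damping * incoming)
        ranks = new_ranks
    return sorted(zip(ranks, reviews), reverse=True)

reviews = [
    "Great support and easy setup",
    "Easy setup, great support team",
    "The mobile app needs work",
]
for rank, text in text_rank(reviews):
    print(f"{rank:.3f}  {text}")
```

Reviews that echo what many other reviews say accumulate rank, so the top of the list tends to be the most representative sentiments — a decent proxy for "best" when you have no labels.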
Results
In the end, we all mostly got results of one kind or another. We learned a few things along the way. And everyone lost to the AWS team 🙂
The AWS team captain and I shared the spotlight in presenting our project and findings in a short company-wide presentation. Afterward, we went on to implement the AWS Comprehend solution in production.