ML Scoring

At Georgian, a venture capital firm, we used machine learning (ML) to sift through a vast number of startups and find companies worth investigating manually. I worked with data engineers, data scientists, and software engineers to build data pipelines to ingest data, ML infrastructure to train and run inference on firmographic data, ML models that learn what an interesting company looks like, and integrations with the tools the business already used, so the models’ predictions surfaced where people already worked.

There were many interesting challenges that we encountered:

  • How do we keep the recommendation algorithm from falling into an echo chamber and recommending only a narrow set of companies? For example, if the recommender learns that cybersecurity companies are interesting and keeps recommending them, we may miss exciting up-and-coming companies in other industries.
  • How often do we need to retrain the model? Training too often is unnecessarily costly; training too rarely means the system may miss market trends.
  • How do we gain the trust of our business users? The recommender’s output is just one score per company. What supplemental information do they need in order to trust that score?
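One common way to address the echo-chamber problem is to mix exploration into the ranking. The sketch below is not Georgian’s actual system; it is a minimal epsilon-greedy selection, assuming companies arrive as `(company_id, score)` pairs, where most surfaced companies are top-scored but a fixed fraction is sampled at random from the rest:

```python
import random

def select_for_review(scored_companies, k=10, epsilon=0.2, seed=None):
    """Pick k companies to surface for manual review.

    Mostly exploit (take the top-scored companies), but reserve an
    epsilon fraction of slots for companies sampled at random from
    the remainder, so lower-scored industries still get seen.

    scored_companies: list of (company_id, score) tuples.
    """
    rng = random.Random(seed)
    ranked = sorted(scored_companies, key=lambda cs: cs[1], reverse=True)
    n_explore = int(round(k * epsilon))
    n_exploit = k - n_explore
    exploit = ranked[:n_exploit]
    # Sample exploration slots from everything outside the top picks.
    rest = ranked[n_exploit:]
    explore = rng.sample(rest, min(n_explore, len(rest)))
    return exploit + explore
```

Tuning `epsilon` trades off short-term precision against coverage of industries the model currently under-scores.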
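For the retraining-cadence question, one option is to retrain on a trigger rather than a fixed schedule. As a hedged illustration (not the approach the text commits to), the sketch below flags retraining when the mean predicted score over a recent window drifts from a reference window by more than a threshold; the windows and threshold are assumptions for the example:

```python
def should_retrain(reference_scores, recent_scores, threshold=0.1):
    """Return True when recent model scores have drifted from a
    reference window by more than `threshold`.

    A mean-shift check is the simplest drift signal; production
    systems often use richer statistics over input features too.
    """
    ref_mean = sum(reference_scores) / len(reference_scores)
    recent_mean = sum(recent_scores) / len(recent_scores)
    return abs(recent_mean - ref_mean) > threshold
```

A trigger like this caps cost when the market is stable while still reacting quickly when score distributions move.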
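On the trust question, one widely used supplement to a bare score is a per-feature breakdown of how the score was produced. The sketch below assumes a linear scoring model (an assumption, not a claim about the real system), where each feature’s contribution is simply its weight times its value:

```python
def explain_linear_score(weights, features):
    """For a linear model, score = sum(weight * value), so each
    feature's contribution is weight * value. Returning them sorted
    by magnitude gives a simple 'why this score' breakdown to show
    business users alongside the score itself.

    weights, features: dicts keyed by feature name.
    """
    contribs = {name: weights[name] * features[name] for name in weights}
    return sorted(contribs.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

For non-linear models the same idea is typically served by model-agnostic attribution methods instead of raw weights.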