The Architecture Used at LinkedIn to Improve Feature Management in Machine Learning Models [Jesus Rodriguez@Medium]

The scale of the machine learning problems that an organization like LinkedIn deals with is incomprehensible to most data scientists. Building and maintaining a single, effective machine learning model is hard enough; now imagine coordinating the execution of thousands of machine learning programs to achieve a cohesive experience. Feature engineering is one of the key elements that enables rapid experimentation with machine learning programs. For instance, let’s assume that a LinkedIn member is described using 100 features and that the content feed rendered on the member’s homepage is powered by 50+ machine learning models. Assuming that every second there are tens of thousands of people loading their LinkedIn pages, the number of feature computations required looks something like the following:

(number of features) × (number of concurrent LinkedIn members) × (number of machine learning models) > 100 × 10,000 × 50 = 50,000,000 per second
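The back-of-the-envelope estimate above can be reproduced directly; the figures are the illustrative ones from the article, not measured LinkedIn numbers:

```python
# Lower-bound estimate of per-second feature computations,
# using the illustrative figures from the article.
features_per_member = 100    # features describing a member
concurrent_members = 10_000  # "tens of thousands" of page loads per second
models_per_request = 50      # 50+ models powering the feed

computations_per_second = (
    features_per_member * concurrent_members * models_per_request
)
print(computations_per_second)  # 50000000
```

Since both the member count and the model count are lower bounds, the real figure is strictly larger than this product.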

That number is simply unfathomable to most organizations starting their machine learning journey. Such a scale requires representing features in a flexible and easy-to-interpret manner that can be reused across different infrastructures such as Spark, Hadoop, and database systems. The latest version of LinkedIn’s feature architecture introduces the concept of typed features to represent features in an expressive and reusable format. This idea arose from challenges with the previous version of LinkedIn’s machine learning inference and training architecture.
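The article does not show the representation itself, but the idea of a typed feature can be sketched as a value paired with an explicit type that any consumer (a Spark training job, an online serving system, a database) interprets identically. The class and field names below are hypothetical illustrations, not LinkedIn’s actual API:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, List

# Hypothetical sketch of a "typed feature": the value travels with an
# explicit type, so training and serving systems agree on its meaning.

class ValueType(Enum):
    INT = "int"
    FLOAT = "float"
    CATEGORICAL = "categorical"

@dataclass(frozen=True)
class FeatureType:
    value_type: ValueType
    dimensions: List[int]  # [] for a scalar, e.g. [300] for an embedding

@dataclass(frozen=True)
class TypedFeature:
    name: str
    feature_type: FeatureType
    value: Any

# A scalar feature and a dense embedding share one reusable schema:
years_of_experience = TypedFeature(
    "yearsOfExperience", FeatureType(ValueType.INT, []), 7
)
title_embedding = TypedFeature(
    "titleEmbedding", FeatureType(ValueType.FLOAT, [300]), [0.0] * 300
)
```

Because the type metadata is explicit rather than implied by each pipeline’s code, the same feature definition can be reused across infrastructures without per-system interpretation logic.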
