Andrew Ng introduces a new (free) course on Vector Databases by Sebastian Witalec

Update 2023-11-16: I took the course and, unfortunately, am not convinced. It is rather superficial: the beginning covers many basics that most people in the IR and RecSys communities probably already know, and the latter part feels a bit like an advertisement for the Weaviate database.

Vector databases gained popularity with the rise of LLMs, but they were already useful before that, for example in recommender systems. Now Sebastian Witalec, Head of DevRel at Weaviate, has introduced a new one-hour course, ‘Vector Databases: from Embeddings to Applications’. The course is available on DeepLearning.ai and backed by Andrew Ng, as seen in the introductory video. It is currently free, but according to the website this is a limited-time offer.

For those who do not know what a vector database is, here is a summary of an article from Microsoft.

A vector database is a special kind of database that keeps data in the form of vectors. Think of these vectors as a way to describe different features or qualities of data, like a list of numbers that represent various aspects of an item, be it a photo, text, or even a sound clip. The number of these features in a vector can be quite large, sometimes even in the thousands.

What’s special about a vector database is that it’s really good at finding things that are similar. Instead of searching for exact matches like in traditional databases, it looks for things that are alike in meaning or context. For example, if you have a picture, the database can find other pictures that look similar. Or, if you have a document, it can find other documents that talk about similar topics or have a similar tone.

To do this, you use something called a “query vector.” This is like a search request, but instead of words, it’s in the form of a vector. This query can be made from the same kind of data you’re searching through, like using a photo to find other photos. It can also be different, like using text to search for photos. The database then measures how similar the search query is to the data it has, using math to figure out how close or far apart they are in terms of their features.
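To make the "math" a bit more concrete, here is a minimal Python sketch of one common similarity measure, cosine similarity. It is only an illustration: the function name cosine_similarity and the tiny three-number vectors are made up for this example, real embeddings come from a model and have far more dimensions, and a given vector database may use a different measure (such as Euclidean distance or dot product).

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors.

    The result is close to 1.0 when the vectors point in the same
    direction and 0.0 when they are orthogonal (nothing in common).
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors, invented purely for illustration.
query = [1.0, 0.0, 1.0]
similar = [0.9, 0.1, 0.8]    # points in nearly the same direction as the query
different = [0.0, 1.0, 0.0]  # orthogonal to the query

sim_close = cosine_similarity(query, similar)
sim_far = cosine_similarity(query, different)
print(sim_close > sim_far)
```

The item whose vector points in almost the same direction as the query gets the higher score, which is what "closer in terms of features" means here.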

The results you get back are sorted by how similar they are to your query. So, you get a list of items, like images or documents, that are the closest match to what you’re looking for. You can then see the actual data, like the photos or text, that these vectors represent. This makes vector databases really powerful for finding things that are similar in a deep, meaningful way, not just because they have the same words or numbers.
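The ranking step can be sketched in a few lines of Python. This is a toy, brute-force version under stated assumptions: the item names, the three-number vectors, and the helper names cosine_similarity and search are all hypothetical, and it compares the query against every stored vector, whereas real vector databases typically rely on approximate nearest-neighbor indexes to avoid doing that at scale.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query, items, top_k=2):
    """Return the top_k (score, name) pairs, most similar first.

    `items` maps an item name to its vector. Every item is scored
    against the query (brute force), then sorted by score.
    """
    scored = [(cosine_similarity(query, vec), name) for name, vec in items.items()]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:top_k]


# Hypothetical stored items with made-up vectors.
items = {
    "cat_photo": [0.9, 0.1, 0.0],
    "dog_photo": [0.8, 0.3, 0.1],
    "invoice_pdf": [0.0, 0.1, 0.9],
}

results = search([1.0, 0.2, 0.0], items, top_k=2)
for score, name in results:
    print(f"{name}: {score:.3f}")
```

With these made-up numbers, the two photo vectors rank above the invoice, and the names in the result are what you would use to look up the actual photos or documents.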
