Andrew Ng introduces a new (free) course on Vector Databases by Sebastian Witalec
Update 2023-11-16: I attended the course and, unfortunately, I am not very convinced by it. The course is rather superficial: the beginning covers many basics that most people from the IR and RecSys community probably already know, and the latter part feels a bit like an advertisement for the Weaviate database.
Vector databases gained popularity with the rise of LLMs, but they were already useful before that, for example in recommender systems. Now, Sebastian Witalec, Head of Developer Relations at Weaviate, has introduced a new one-hour course 'Vector Databases: from Embeddings to Applications'. The course is available on DeepLearning.ai and backed by Andrew Ng, as seen in the following introductory video. The course is currently free, but according to the website this is a limited-time offer.
For those who do not know what a vector database is, here is a summary of an article from Microsoft.
A vector database is a special kind of database that keeps data in the form of vectors. Think of these vectors as a way to describe different features or qualities of data, like a list of numbers that represent various aspects of an item, be it a photo, text, or even a sound clip. The number of these features in a vector can be quite large, sometimes even in the thousands.
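To make this concrete, here is a tiny, hypothetical sketch (plain NumPy, not taken from the course or from any particular database): each item is stored as a short, made-up feature vector; real embeddings typically have hundreds or thousands of dimensions.

```python
import numpy as np

# Hypothetical toy collection: each item (photo, text, sound clip, ...) is
# represented by a fixed-length vector of numbers describing its features.
# Only 4 dimensions are used here for readability; real embedding models
# produce vectors with hundreds or thousands of dimensions.
items = {
    "cat_photo.jpg": np.array([0.90, 0.10, 0.30, 0.00]),
    "dog_photo.jpg": np.array([0.80, 0.20, 0.40, 0.10]),
    "invoice.txt":   np.array([0.05, 0.90, 0.10, 0.70]),
}
```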
What’s special about a vector database is that it’s really good at finding things that are similar. Instead of searching for exact matches like in traditional databases, it looks for things that are alike in meaning or context. For example, if you have a picture, the database can find other pictures that look similar. Or, if you have a document, it can find other documents that talk about similar topics or have a similar tone.
To do this, you use something called a “query vector.” This is like a search request, but instead of words, it’s in the form of a vector. This query can be made from the same kind of data you’re searching through, like using a photo to find other photos. It can also be different, like using text to search for photos. The database then measures how similar the search query is to the data it has, using math to figure out how close or far apart they are in terms of their features.
The results you get back are sorted by how similar they are to your query. So, you get a list of items, like images or documents, that are the closest match to what you’re looking for. You can then see the actual data, like the photos or text, that these vectors represent. This makes vector databases really powerful for finding things that are similar in a deep, meaningful way, not just because they have the same words or numbers.
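Putting the above together, a minimal brute-force sketch of such a similarity search might look as follows (again plain NumPy with made-up vectors, not the API of any particular vector database; real systems typically use approximate nearest-neighbour indexes such as HNSW rather than comparing the query against every single item):

```python
import numpy as np

# Toy collection: item name -> hypothetical embedding vector.
items = {
    "cat_photo.jpg": np.array([0.90, 0.10, 0.30, 0.00]),
    "dog_photo.jpg": np.array([0.80, 0.20, 0.40, 0.10]),
    "invoice.txt":   np.array([0.05, 0.90, 0.10, 0.70]),
}

# Query vector, e.g. the embedding of the photo or text we are searching with.
query = np.array([0.85, 0.15, 0.35, 0.05])

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """One common way to measure how close two vectors are (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Score every item against the query and sort the results by similarity,
# most similar first -- the ranked list described above.
ranked = sorted(
    ((name, cosine_similarity(query, vec)) for name, vec in items.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, score in ranked:
    print(f"{name}: similarity = {score:.3f}")
```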