After acquiring GitHub last year, Microsoft has been leaning further towards the open source community. Just after open-sourcing its quantum computing development tools last week, the company is now revealing one of its important secrets to the community.
According to an official blog post by Charlie Waldburger, Microsoft has open-sourced a key algorithm that lets Bing search services quickly retrieve and show results to users. By making this technology available to everyone, the company hopes that developers will build similar experiences in other domains too.
In this age of abundant data, there are many use cases where users search through vast troves of data, retail among them, so there is no shortage of areas that can be improved. Microsoft has open-sourced a library that it developed to make better use of all the data the company has collected and the AI models it built for Bing.
“Only a few years ago, web search was simple. Users typed a few words and waded through pages of results. Today, those same users may instead snap a picture on a phone and drop it into a search box or use an intelligent assistant to ask a question without physically touching a device at all. They may also type a question and expect an actual reply, not a list of pages with likely answers,” the spokesperson said in the announcement.
The open-sourced Python library runs the Space Partition Tree and Graph (SPTAG) algorithm at its core, and that is what enables Microsoft to search through billions of pieces of information in milliseconds.
I know vector search isn’t something new, but the company has applied the concept to deep learning models to make it more effective. To describe the process briefly, the team first takes a model and encodes data into vectors, where each vector represents a pixel or word. It then generates a vector index using the SPTAG library. As queries come in, the deep learning model translates the text or image into a vector, and the library finds the most closely related vectors in that index.
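The encode, index, and query steps described above can be illustrated with a minimal sketch. This is not SPTAG and does not use its API: the `encode` function is a toy bag-of-words stand-in for a deep learning model, and `search` is a brute-force cosine-similarity scan standing in for the approximate nearest-neighbor index that SPTAG would provide. All function names and sample documents here are hypothetical.

```python
import math
from collections import Counter


def encode(text, vocabulary):
    """Toy encoder (assumption): a bag-of-words count vector, not a real model."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(documents, vocabulary):
    """Step 1 and 2: encode every document and store the vectors."""
    return [(doc, encode(doc, vocabulary)) for doc in documents]


def search(index, query, vocabulary, top_k=2):
    """Step 3: encode the query and return the top_k most similar documents.

    This scans every vector; a real index avoids that exhaustive scan.
    """
    query_vector = encode(query, vocabulary)
    scored = [(cosine_similarity(query_vector, vec), doc) for doc, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


# Hypothetical sample data for illustration only.
documents = [
    "how tall is the eiffel tower",
    "best pizza recipe with fresh basil",
    "eiffel tower opening hours",
]
vocabulary = sorted({word for doc in documents for word in doc.split()})
index = build_index(documents, vocabulary)
results = search(index, "eiffel tower height", vocabulary)
print(results)
```

The pizza document shares no words with the query, so it scores zero and is left out of the top two. A linear scan like this is only workable for tiny collections; at the scale the article describes, a dedicated index structure is what makes millisecond lookups plausible.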
According to Microsoft, “With Bing search, the vectorizing effort has extended to over 150 billion pieces of data indexed by the search engine to bring improvement over traditional keyword matching.” The company added, “These include single words, characters, web page snippets, full queries, and other media. Later, Bing can scan the indexed vectors on every search and deliver the best result.”
If you are interested, the library is now available under the MIT license with all the tools required to build and search these vector indexes. For more details about using this library and sample applications, visit here.