The Evolution of Search
Posted on Mon, Jul 12, 2010

Prior to joining Searchandise Commerce, John Puopolo served as director of software engineering at FAST (acquired by Microsoft), a world leader in enterprise search software. As such, he brings a unique perspective to the company as we build new solutions within the larger search market. Following are his insights on the most significant changes in the market since he left FAST four years ago.
The fact that businesses of all shapes and sizes now have the ability to create, collect and publish “facts” and “data” at a blinding rate has created, ironically, an information deficit. Pure data is, in effect, a commodity. Anyone with a computer and an Internet connection has the ability to access all sorts of facts. The problem is that we are awash in facts, and our ability to extract information and useful linkages is, I believe, decreasing at an increasing rate.
To address this paradoxical situation, pure search engines have tended to evolve into platforms that offer several new features including:
- Scoped search
- Active intelligence
- Post-discovery
In a nutshell, scoped search is the ability of a search engine to filter and rank documents based on a subset of a document rather than the entire document. For example, in searching the plays of William Shakespeare, we might want to retrieve all plays where Romeo is not only mentioned, but is also a speaker in a given act. Scoped search makes these types of specific, semantically rich searches possible. This enables us to reduce “fact noise” and retrieve precisely what we need, making it easier to extract useful information from a smaller set of highly relevant content.
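To make the Romeo example concrete, here is a minimal Python sketch of the difference between matching anywhere in a record and matching within a named scope. The `Speech` record, the field names and the tiny corpus are assumptions made up for illustration (the text fields are placeholders, not quotations), and a real engine would index scopes rather than scan them.

```python
from dataclasses import dataclass


@dataclass
class Speech:
    """One speech in a play; a hypothetical record layout for illustration."""
    play: str
    act: int
    speaker: str
    text: str


# Hand-made placeholder corpus; the text is invented, not quoted from the plays.
SPEECHES = [
    Speech("Romeo and Juliet", 1, "Romeo", "placeholder line spoken by the hero"),
    Speech("Romeo and Juliet", 2, "Juliet", "placeholder line that mentions Romeo"),
    Speech("Hamlet", 1, "Horatio", "placeholder line that mentions Romeo in passing"),
]


def full_text_search(term):
    """Unscoped search: the term may appear anywhere in the record."""
    term = term.lower()
    return sorted(
        {s.play for s in SPEECHES
         if term in s.text.lower() or term in s.speaker.lower()}
    )


def scoped_search(term, scope, act=None):
    """Scoped search: the term must appear in one named field, optionally in one act."""
    term = term.lower()
    hits = set()
    for s in SPEECHES:
        if act is not None and s.act != act:
            continue  # outside the requested act
        if term in getattr(s, scope).lower():
            hits.add(s.play)
    return sorted(hits)
```

With this toy data, `full_text_search("romeo")` returns both plays, while `scoped_search("romeo", "speaker", act=1)` returns only the play in which Romeo actually speaks in Act 1. That narrowing is the reduction in “fact noise” described above.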
Active intelligence, a term I associate with Attivio, is the combining of disparate information silos into a cohesive, searchable and mineable form. Oftentimes, information is the product of synthesis – recognizing patterns and linkages among facts. The first step in being able to do this at scale is to ensure that all of your data – from well-structured data, to semi-structured and unstructured data – is cross-correlated and accessible en masse.
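As a rough illustration of that first step, the sketch below folds a structured silo and an unstructured one into a single inverted index so that one query spans both. This is not Attivio's implementation; the silo names, record layouts and sample values are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical silos: structured CRM rows and unstructured support emails.
CRM_ROWS = [
    {"id": "crm-1", "customer": "Acme Corp", "status": "renewal due"},
]
SUPPORT_EMAILS = [
    {"id": "mail-7", "body": "Acme Corp reports slow search results after upgrade."},
]


def tokens(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(*silos):
    """Build one inverted index: token -> set of (silo name, record id).

    Each silo is a (name, records, fields_to_index) tuple.
    """
    index = defaultdict(set)
    for name, records, fields in silos:
        for record in records:
            for field in fields:
                for tok in tokens(str(record[field])):
                    index[tok].add((name, record["id"]))
    return index


def search(index, query):
    """Return records, from any silo, that contain every query token."""
    postings = [index.get(tok, set()) for tok in tokens(query)]
    return sorted(set.intersection(*postings)) if postings else []


INDEX = build_index(
    ("crm", CRM_ROWS, ["customer", "status"]),
    ("support", SUPPORT_EMAILS, ["body"]),
)
```

Here `search(INDEX, "acme corp")` returns hits from both the CRM and the support mailbox, which is the kind of cross-silo linkage the paragraph describes. A production system would add schema mapping, entity resolution and relevance ranking on top.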
Discovery is a term used in the world of Business Intelligence (BI) that encompasses the “slicing and dicing” of interrelated data for the purposes of identifying and using relationships. This has proven a fairly effective way to mine for information; however, there are limitations with this approach. To be effective, BI data must be structured against a predefined schema. This means that semi-structured and unstructured data are often missing from the BI equation. In contrast, post-discovery is a term used by some to recognize the evolution of BI constructs from highly structured to loosely organized data whose relationships can be navigated via intelligent search.
In the next several years, I think we’ll see products and platforms that help to dramatically reduce the information deficit by applying and combining precise search, active intelligence, post-discovery and semantic mining. This is no easy task, but we’re approaching an age where ubiquitous, high-speed connectivity, coupled with near-limitless compute power in the form of cloud computing, will enable new models of turning data into actionable information.