Content, there's a lot of it about. Sifting through everything that is "content" is quite the daunting undertaking. And not just sifting for sifting's sake, but sifting and comprehending the value of the content.
Up until the advent of the Internet, content was primarily physical. Sure, IBM machines were storing and tabulating records as far back as the 1940s, but nearly everything they held before the Internet was business data, hence the aptly chosen name: International Business Machines. Everything else before the Internet was printed, painted, pressed, or simply "known".
Indexing helps sift through existing content, letting a keyword search drag out responses that share similar terms. However, this functionality has a very large problem. Indexing doesn't return "the most valuable content"; it only returns content that (1) likely covers what you searched for because of shared terms, (2) is open to misinterpreting what you actually meant, and (3) leaves you with a shitload more content that would take years, if not millennia, to search through. Imagine that: searching through the content derived from a previous search attempt.
Google, for example, is a platform that employs spiders to continuously seek out, understand, and index content available on the World Wide Web. However, the simple experiment of a keyword search for "baseball" retrieves about 549,000,000 results. While these results only took 0.73 seconds to generate, it's still 549,000,000 results. If you considered one result every second, it would take about 17.5 years to look at every single one. Naturally, within the first few minutes of looking you would have a fair understanding of the term "baseball", but what are you really looking to understand? Was the search term "baseball" valuable to begin with? Did the search produce results that answered the question or insight you came to the search function to understand? Did you get beyond even the third page of results before finding what you wanted to know? How can we fix search so that your first query always gives you exactly what you are looking for? The goal is to make looking any further unnecessary: no daunting pile of results to consider, no sense of data overload.
The powerful tool of indexing, then, isn't the problem; the problem is how we "talk" to computers.
I propose a new way of communicating with computers. Start with a term, and have the computer ask a tree of questions to narrow down its umpteen results. Through this process, the computer "works" with the user to determine the source, data, and knowledge the user is seeking. This conversation is possible; we have the technology. I've seen it in action with the 20Q game, in which you think of something and the computer asks a series of questions to ultimately zoom in on what you thought of, usually within 20 questions and often far fewer. It's quite remarkable; check it out.
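The core of the 20Q idea can be sketched in a few lines: each yes/no answer discards every candidate inconsistent with it, so a set of N candidates needs only about log2(N) well-chosen questions. The items, attributes, and the `ask` helper below are all invented for illustration, not any real search API.

```python
# A minimal sketch of 20Q-style narrowing. Each answer filters the
# candidate set; the items and attributes here are hypothetical.
candidates = {
    "baseball (sport)": {"is_sport": True,  "is_equipment": False},
    "baseball (ball)":  {"is_sport": False, "is_equipment": True},
    "baseball card":    {"is_sport": False, "is_equipment": False},
}

def ask(remaining, attribute, answer):
    """Keep only candidates whose attribute matches the user's answer."""
    return {name: attrs for name, attrs in remaining.items()
            if attrs[attribute] == answer}

# One question ("Is it a sport?") already isolates a single sense.
remaining = ask(candidates, "is_sport", True)
```

With three candidates, a single well-chosen question settles the matter; with millions of indexed documents, the same halving logic is what makes a five-question budget plausible.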
For some searches in this new functionality, I imagine the path from questions to answers would be rather short, while for others it may be rather long. Given human attention spans, though, I'd recommend targeting no more than five questions.
Outside of the 20Q game, we see similar questioning in practice when doctors decipher a patient's symptoms. A well-practiced tree of questioning might look like this:
- How long have you had this problem?
- Have you experienced changes over time?
- Do you have any idea what may trigger changes in symptoms?
- Do you have any family history with this ailment?
- etc.
This entire line of questioning is meant to develop context: to rule out known issues and focus in on those that remain suspect. Afterward, the doctor is in a better position to run targeted diagnostic testing that would otherwise have been shots in the dark. Computers can be trained to do just this for any topic, and all the more easily with the advent of machine learning, since trends in search requests would let computers quickly adapt based on the answers other users have given.
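The doctor's strategy above can also be sketched in code. One simple, hedged interpretation: at each step, ask the question whose yes/no split of the remaining possibilities is closest to 50/50, since that rules out the most candidates whichever way the patient answers. The conditions and symptoms below are entirely invented for illustration.

```python
# A sketch of elimination-by-questioning. At each step we pick the
# question that splits the remaining candidates most evenly.
# Conditions and attributes are hypothetical, not medical advice.
conditions = {
    "flu":     {"fever": True,  "chronic": False, "family_history": False},
    "allergy": {"fever": False, "chronic": True,  "family_history": True},
    "cold":    {"fever": False, "chronic": False, "family_history": False},
    "asthma":  {"fever": False, "chronic": True,  "family_history": True},
}

def best_question(remaining):
    """Choose the attribute whose yes/no split is closest to 50/50."""
    questions = next(iter(remaining.values())).keys()
    def imbalance(q):
        yes = sum(attrs[q] for attrs in remaining.values())
        return abs(yes - (len(remaining) - yes))
    return min(questions, key=imbalance)

def apply_answer(remaining, question, reply):
    """Discard every candidate inconsistent with the patient's answer."""
    return {c: a for c, a in remaining.items() if a[question] == reply}

remaining = dict(conditions)
q = best_question(remaining)              # "chronic" splits the four 2 vs 2
remaining = apply_answer(remaining, q, False)
```

Picking the most even split is the same greedy heuristic behind decision-tree learning; a trained system could refine it further by weighting questions with the answer trends the paragraph above describes.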
A product, then, could be something like Google Converse. Pairing the power of Google's search engines with a defined set of questions, running incremental searches over the previously generated results, and layering on trending content would be a leap forward in the engagement users have with the Internet, the company, and the content.
-Steven Janke