I outlined earlier the inherent difference between the web now and the web to be. That is, the difference between Web 2.0 and Web 3.0. A big question that comes out of that description is how the current use of technology will change to allow this Web 3.0 to emerge.
I argued that Web 2.0 and 3.0 aren’t defined by their technology per se. They’re defined by their usage scenarios, which imply, but don’t necessitate, innovation in software. Web 2.0 didn’t do anything new; it just did it with new people in a new space. JavaScript has been around for nearly as long as Netscape, and XML was created by 161 people a decade ago. Web 2.0 introduced the notion that everything you do on a computer could be done with a browser and that the web could become as valid a medium for communication as any other - including real-space.
Web 3.0 extends that: it pushes the boundary and posits that not only is the Web as good as anything else, it may, in fact, be better for a tonne of things. As our democratic ideals start catching up to technology, we want to develop systems whereby everyone can make a meaningful contribution, where discussion really is discussion and not a room full of murmuring or an epistemic oligarchy. Through the web, we can enable what is intractable in reality - the promulgation of the ideas, thoughts and discourse of hundreds of thousands, millions or even billions of people in a meaningful way.
The key distinction between the web now and Web 3.0 is that word ‘meaningful’. The web already integrates everybody with everybody else to some degree, but it makes it a devil of a task to discover content, people and ideas that are meaningful and relevant - especially for humans.
Enter the semantic web. The semantic web, as the name implies, will give machines a facilitated comprehension of meaning - to aid us in discovering new ideas, people and content by the association of meaning rather than by hypertext references.
Web 1.0 and even 2.0 have a single-dimensional association between content (web pages/sites): the hyperlink. The context in which the hyperlink appears on a page lets a human reader know what that association actually means. A machine cannot understand that, and even if it could, it couldn’t expand upon it through entailment. The semantic web allows an unlimited dimensionality of associations between content - even ones that are emergent rather than ascribed.
How will this look? What kinds of tools and purposes will we have for this kind of association? We see some fledgling attempts at semantic association by way of “tagging”. Tagging works best as a controlled vocabulary - that is, for any given subject attribute, there is one and only one word to describe it. Tags tend to break down as semantic data because people want to be comprehensive in their tagging and thus enter semantically redundant tags. Tagging is a step below controlled vocabulary, which is arguably the least semantic of the semantic methods of content organization.
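To make the gap between free-form tagging and a controlled vocabulary concrete, here is a minimal sketch in Python. The vocabulary table and the `normalize_tags` helper are my own inventions for illustration, not part of any real tagging system; the point is only that redundant tags collapse onto one preferred term each.

```python
# Hypothetical controlled vocabulary: every variant spelling maps to
# exactly one preferred term. The entries are made up for illustration.
CONTROLLED_VOCABULARY = {
    "semantic web": "semantic-web",
    "semanticweb": "semantic-web",
    "semweb": "semantic-web",
    "web 3.0": "web3",
    "web3.0": "web3",
    "web3": "web3",
}


def normalize_tags(raw_tags):
    """Collapse free-form tags onto preferred terms, keeping first-seen order."""
    normalized = []
    for tag in raw_tags:
        key = tag.strip().lower()
        # Unknown tags pass through unchanged (lower-cased).
        preferred = CONTROLLED_VOCABULARY.get(key, key)
        if preferred not in normalized:
            normalized.append(preferred)
    return normalized


free_form = ["SemWeb", "semantic web", "Web 3.0", "web3", "ontology"]
print(normalize_tags(free_form))
```

Five tags go in and three distinct concepts come out, which is roughly what a human tagger meant in the first place. Note that the machine still knows nothing about how those three concepts relate to one another.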
The next tier of semantic technology, which provides more information as to the meaning of content, is the taxonomy. We’re all familiar with taxonomies even if the term isn’t familiar. The traditional way of categorizing the animal kingdom is a taxonomy (Domain, Kingdom, Phylum, Class etc.), as are the Dewey Decimal Classification and Library of Congress Classification systems. Once described to a machine, these provide a certain kind of semantic understanding - that of generality and abstraction. What is true of a parent element must also be true of its children, e.g. “What’s true of mammals is also true of apes.” This is significantly richer than the rattle-bag of attributes that tagging introduces, but it’s also more constrictive - typically an entity can only be in one place on a taxonomy, and what if the taxonomy isn’t fully understood?
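The "what’s true of the parent is true of the child" rule can be sketched in a few lines of Python. The tiny taxonomy and the facts attached to it below are simplified assumptions of mine, not a real biological classification; each entry has exactly one parent, which is the constraint described above.

```python
# Hypothetical single-parent taxonomy: child -> parent.
PARENT = {
    "ape": "primate",
    "primate": "mammal",
    "mammal": "animal",
    "lizard": "reptile",
    "reptile": "animal",
}

# Facts asserted directly at one level of the hierarchy (illustrative only).
FACTS = {
    "animal": {"is alive"},
    "mammal": {"has hair", "produces milk"},
    "primate": {"has grasping hands"},
}


def ancestors(term):
    """Return the chain of parents from the nearest up to the root."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def inherited_facts(term):
    """Collect facts stated on the term itself and on every ancestor."""
    facts = set(FACTS.get(term, set()))
    for ancestor in ancestors(term):
        facts |= FACTS.get(ancestor, set())
    return facts


print(ancestors("ape"))
print(sorted(inherited_facts("ape")))
```

Nothing was ever said about apes directly, yet the machine can report that an ape has hair, simply by walking up the tree. The single-parent dictionary is also where the constriction shows: an entity that belongs in two places has nowhere to go.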
The third semantic system available, the thesaurus, combines aspects of tagging and taxonomy - only in this case it’s the tags that exist within the taxonomy rather than the entities. Thesauri allow machines to understand equivalence between tags and homographic relationships, while still having the hierarchy and classification inherent to taxonomies.
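A rough sketch of what a thesaurus adds, again in Python and again with made-up data: an equivalence table that points variant tags at a preferred term, a list of homographs that one spelling could mean, and a "broader term" hierarchy over the tags themselves. The table names and the `resolve` and `broader_terms` helpers are hypothetical.

```python
# Equivalence: non-preferred tag -> preferred tag (illustrative entries).
USE = {"automobile": "car", "motorcar": "car", "lorry": "truck"}

# Homographs: one spelling, several distinct qualified terms.
HOMOGRAPHS = {"jaguar": ["jaguar (animal)", "jaguar (car)"]}

# Hierarchy over the tags themselves: term -> broader term.
BROADER = {
    "car": "motor vehicle",
    "truck": "motor vehicle",
    "motor vehicle": "vehicle",
    "jaguar (car)": "car",
    "jaguar (animal)": "big cat",
}


def resolve(tag):
    """Map a raw tag to its candidate preferred terms (several if ambiguous)."""
    key = tag.strip().lower()
    if key in HOMOGRAPHS:
        return list(HOMOGRAPHS[key])
    return [USE.get(key, key)]


def broader_terms(term):
    """Walk up the broader-term chain for a preferred term."""
    chain = []
    while term in BROADER:
        term = BROADER[term]
        chain.append(term)
    return chain


print(resolve("Automobile"))
print(resolve("jaguar"))
print(broader_terms("jaguar (car)"))
```

The machine can now tell that "automobile" and "car" are the same tag, and that "jaguar" needs disambiguating before it can be placed in the hierarchy. What it still cannot do is reason about contradictions between tags.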
The fourth and final system is the ontology. The ontology takes the basic framework of the thesaurus and applies logical rules to it. The thesaurus has no notion of mutual exclusion or entailment; the ontology builds both in. In a biological ontology one need only specify that an ape is a mammal, rather than that it is both a mammal and warm-blooded, because the ontology understands that although these are different things, one entails the other - it would also understand that it’s impossible for an ape to be cold-blooded.
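The ape example can be turned into a toy Python sketch. This is not how a real ontology reasoner works, and the rule tables and function names are assumptions of mine; it only shows the two ideas named above, entailment (being a mammal implies being warm-blooded) and mutual exclusion (warm-blooded and cold-blooded cannot both hold).

```python
# Hypothetical class hierarchy: class -> superclass.
SUBCLASS_OF = {"ape": "mammal", "mammal": "animal"}

# Entailment rules: membership in a class implies these properties.
ENTAILS = {"mammal": {"warm-blooded"}}

# Mutual exclusion: no entity may hold every property in one of these sets.
DISJOINT = [frozenset({"warm-blooded", "cold-blooded"})]


def entailed_properties(cls):
    """Collect properties implied by the class and all of its superclasses."""
    properties = set()
    current = cls
    while current is not None:
        properties |= ENTAILS.get(current, set())
        current = SUBCLASS_OF.get(current)
    return properties


def is_consistent(cls, asserted_properties):
    """Check asserted properties against what the class already entails."""
    everything = entailed_properties(cls) | set(asserted_properties)
    return not any(pair <= everything for pair in DISJOINT)


print(entailed_properties("ape"))
print(is_consistent("ape", {"cold-blooded"}))
```

Saying only that something is an ape is enough for the sketch to conclude it is warm-blooded, and asserting that an ape is cold-blooded is rejected as inconsistent. A thesaurus has no place to record either rule.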
I’ve described these systems in deliberately abstract terms because the actual applications are impossible to foresee. However, I envision our migration to the semantic web as essentially climbing these steps of increasing semantic robustness. Each step gets further and further away from the kind of usage scenarios in which we typically engage with existing technology.
I don’t believe we’ll all hop head-first into developing personal ontologies and deploying ontological tools, just as we haven’t gone head-first into the social networking tools. The pace is lightning fast but not instantaneous.
As users become accustomed to the workflow inherent to each one, and as automation makes those workflows almost redundant, we’ll see a steady progression.
The technology underpinning all of them, RDF, much like XHTML for the remainder of the web, is robust and simple enough to remain viable throughout. The more advanced technologies, GRDDL and OWL in particular, I predict won’t see adoption until we start moving up the semantic system ladder.
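For a sense of how simple the RDF model is, here is a minimal Python sketch that treats statements as subject-predicate-object triples and writes them out in N-Triples form. It deliberately avoids any RDF library and handles only IRIs, not literals. The `rdf:type` and `rdfs:subClassOf` IRIs are the standard ones; everything under `http://example.org/`, along with the `to_ntriples` and `match` helpers, is a placeholder of my own.

```python
# Standard vocabulary IRIs from the RDF and RDF Schema namespaces.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

# Placeholder namespace for the example; not a real vocabulary.
EX = "http://example.org/"

# Each statement is a (subject, predicate, object) triple of IRIs.
triples = [
    (EX + "koko", RDF_TYPE, EX + "Ape"),
    (EX + "Ape", RDFS_SUBCLASS_OF, EX + "Mammal"),
]


def to_ntriples(statements):
    """Serialize IRI-only triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in statements)


def match(statements, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in statements
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


print(to_ntriples(triples))
print(match(triples, p=RDFS_SUBCLASS_OF))
```

The same three-part statement can carry a tag, a taxonomy link or an ontological rule, which is why I expect the one format to survive every rung of the ladder.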