The Pathetic Fallacy of RDF

David Karger and

mc schraefel
Position Paper for SWUI06

1. Introduction


The most popular visualization of RDF---the underlying language for representing the Semantic Web---is the Big Fat Graph (BFG).  By graph, we mean representations with nodes and edges to model the relationships within the space represented. Take a look at any of a variety of user interfaces to RDF. Frodo shows ontologies as BFGs.  RDF Gravity goes further, displaying instances as well as classes in the BFG. IsaViz and RDFAuthor use a BFG not only to present but also to author or edit RDF data.  A preexisting graph visualizer, GViz, has been redeployed on RDF (pdf).

The above is just a light sampling of RDF-based visualizations that focus on Big Fat Graphs as the default mechanism for representation. Why are they the default representation?  An implicit argument seems to be that since the Semantic Web is a graph, ipso facto we use graphs to represent it. This notion that data should be presented to the user as it is represented in the computer is what we label the pathetic fallacy of RDF.  In the arts, the pathetic fallacy is the act of ascribing human feelings to inanimate objects.  We argue that an analogous fallacy is turning up on the Semantic Web---that researchers are (perhaps subconsciously) allowing the computer's internal representation of data to influence the way their tools present information to users, when instead they should be developing interfaces that are based on users' needs, independent of the computer's particular information representation.

In the following discussion, we will look at some examples of the pathetic fallacy and the interaction challenges they raise.

From this context, we investigate two questions:  Are graphs the right default representation for the Semantic Web?  And if not, how might we think about default presentations for the Semantic Web in order to make accessible its promised benefits for knowledge building and sharing?

2. Graph Visualization Exemplars: What are They Doing?


In the following section we consider three tools that present RDF using Big Fat Graph representations, and explore what they communicate and how they do so.   We ask what is helped by the particular attributes of each visualization.  In particular, we ask what task is being supported, and whether a graph is really the best representation for that task.

2.1 RDF Gravity

RDF Gravity renders ontologies. The figure shows the tool rendering part of an ontology of wines.  The main attributes of RDF Gravity are:

When is this set of features useful? How does it differ from existing multiple selection? One answer is that the graph is heterogeneous; one may be able to select wineTypes and regions and vintages together. Typical UIs only support selection of a single type of object---a group of wineTypes (or regions, or vintages, as per Endeca's demonstrator).


Building heterogeneous groups is extremely powerful: we do it all the time with file systems, not worrying about file type, in order to reflect what we need for a certain task, collecting it in a directory if we're organized. But we do not get to do this for things that are not files, because we cannot refer to them outside their applications. This is one of the major wins of the Semantic Web: a format for reference to anything.  Note, however, that building groups requires only this ability to reference items, with no need to represent distinct relations in the model or the user interface.
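To make the point concrete, here is a minimal sketch, using Python's rdflib and entirely made-up URIs, of such a task-oriented heterogeneous group: the only statements needed are references from the group to its members, whatever their types.

```python
# A minimal sketch (rdflib, made-up URIs) of a heterogeneous "task group":
# the group records nothing but references to its members, regardless of
# the members' types -- no further relations required.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")   # hypothetical namespace
g = Graph()

trip = EX["trip-planning"]              # the task-oriented group
members = [
    EX["wine/chianti-1997"],            # a wine
    EX["region/tuscany"],               # a region
    EX["email/msg-4711"],               # an email message
]
for m in members:
    g.add((trip, EX.hasMember, m))      # membership is just a reference

# The group can be listed back without knowing anything about the
# members' types or other properties.
for _, _, member in g.triples((trip, EX.hasMember, None)):
    print(member)
```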


Having pinned down features of this graph presentation, we need to ask: is it really the optimal representation (the documents around RDF Gravity do not attempt to argue this point, apparently taking it for granted)? For instance, one of the obvious assets of such graphs is the ability to see clusters. If discovering clusters is important, however, why not build a tool that clusters and reports on what the clusters are (as Scatter/Gather did long ago)? If clusters can be reported, is the graph still relevant?  As for heterogeneous grouping and selection, we have just observed that this can be done using the familiar folder interface for filesystems. Even for ontology viewing, is a graph the best tool for examining an ontology? What user studies support such a claim, where such graphs have been placed head-to-head against other interaction models?


2.2 Graph visualizers redeployed on RDF

The GViz graph visualizer has been used to visualize the schema of the art collection in the Rijksmuseum (rather than the art itself).  The authors claim that its major contribution, relative to other graph visualizers, is the extensive customizability of the shapes, colors, and layouts of the edges and nodes making up the graph.  Let us consider the paper's Figure 9.5, shown here along with its accompanying text from the paper:


"In the above picture the edges with the label rdf:type are depicted in blue.  There are two red nodes to which these blue edges connect, one with the label rdfs:Class and the other with the label rdf:Property, shown near the nodes as balloon pop-up texts. We chose to depict the property nodes (laid out in a large circular arc around the upper-left red node) in orange and the class nodes (laid out in a smaller circle arc around the lower-right red node) in green. As it can be noticed from the picture there are a lot of orange nodes which is in accordance with the property-centric approach for defining RDFS schemas. In order to express richer domain models we extended the RDFS primitives with the cardinality of properties, the inverse of properties, and a media type system. These extensions are showed in yellow edges (see also below) and yellow spheres (positioned at the right end of the image). The yellow edges that connect to orange nodes represent the inverse of a property. The yellow edges that connect an orange node with the yellow rectangle labeled 'multiple' (positioned at the middle of the figure bottom) state that this property has cardinality one-to-many. The default cardinality is one-to-one. Note that there are not many-to-many properties as we had previously decomposed these properties in two one-to-many properties. The three yellow spheres represent the media types: String, Integer, and Image. The light gray thin edges denote the domain and the range of properties. Note that only range edges can have a media node at one of its ends."




Was the graph visualizer used to discover the things the text is telling us? Rather, it seems those things had to be discovered first, and then the graph layout customized to indicate the discoveries.  Is this graph useful? Does it, in other words, tell us anything that the accompanying text is not telling us better? Indeed, would the graph make any sense at all, were the text not there to explain it? If not, then the graph seems a poor choice for exploring the data in the first place, or even for presenting the results of the exploration to others.


The paper discussing GViz lists many questions that the authors argue can be answered using their tool---questions like "what are the most referenced concepts", "what is the relation connecting two instances", and "what are the attributes of a given instance".  There are many alternative, standard interfaces for posing these questions---such as a tabular view, a simple form showing the attributes of an instance, or a query-building wizard---that must be held up for comparison, if one wishes to make a good argument for the value of the graph visualization.


Interestingly, a related group, CWI in the Netherlands, has come up with a Semantic Web explorer for building up knowledge about the same collection. The application is called "Topia" [pdf]. Not a graph in sight.

2.3 RDF Graphs with IsaViz


IsaViz is another highly customizable graph-visualizer for RDF.  Working with the W3C working papers as a data set, its creator observes that "plain" graph presentations of the data are "not fully satisfying and have their own problems: diagrams can quickly become big and over-cluttered, and some editing tasks can be more difficult to achieve when dealing with a visual representation of the model."  He proposes Graph Style Sheets (GSS) as a way to improve the presentation, controlling color, shape and layout of the graph much as GViz did.  With GSS, IsaViz is better able to show that the typical model for the drafts is mainly linear with occasional branching---in other words, a tree.


Once this model is established, however, what good is the graph visualization?  Is there anything it could be used for afterward?  Or is the graph only a starting point, with what has been learned used to choose an appropriate interaction model and user interface (such as the well-known tree widget from file directory browsing) for effectively exploring the tree structures of the working drafts, or alternatively to encourage other kinds of desired behaviors? What would such a tool look like?

The GSS rendering does seem to be a good way to provide visual support for assertions about the tree structure of the data, but (as we also asked regarding GViz), could it have been created before the creator knew about the tree structure?  A tool that makes comprehensible pictures after the user understands the data might be useful as a publishing tool, but can it make a good interface for interacting with the data to understand it in the first place?

GSS itself lends additional weight to our question about the usefulness of graph presentations, because it offers styling instructions that begin to make IsaViz much less of a pure graph visualizer.  As we can see in the figure to the right, GSS can be used to define specialized, type-specific icons, and even to specify that certain attributes of objects should be laid out in tabular forms instead of as circles and arrows.  In other words, IsaViz tries to make its graph visualizations better by making them less like graphs, incorporating other kinds of UI.  But why assume that graphs are a good starting point?





3. Aren't Graphs the Natural Choice?

So what's wrong with these Big Fat Graphs? After all, RDF is a graph.


That is true, but so is the Web, and we do not commonly see people exploring the Web via its bow-tie-shaped graph. After all, "everything" can be represented by a graph (just as database researchers assert that everything can be represented in a database---RDF's graph model has equivalent power), and yet we do not use graphs to display "everything." The idea that data should be presented the way it is stored would lead us to present (bitmap) images to a user as sequences of red-green-blue triples, and file directories as lists of (file name, inode) and (inode, file information) pairs.


Indeed, the fact that the Semantic Web is so clearly a graph might be seen as a bit of a historical accident. The SW has two critical concepts: the naming of individual items, and the expression of relations between them. But who says relations have to be binary? The database community has been working with arbitrary-arity relationships for decades, and there are many relations that are naturally of high arity, where we have to do contortions with reification to represent them in RDF. So we might consider a parallel world, where the database people looked at Tim Berners-Lee's Web and said "good idea, let's do it for databases!" They would have needed to recognize the importance of a universal format for naming entities. But once they named them, they could announce a standard tuple format for specifying arbitrary-arity relations on these named entities.
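A small sketch of that contrast, with made-up data and vocabulary: a naturally three-ary relation (a sale linking seller, buyer, and price) expressed once as a single arbitrary-arity tuple and once via the intermediate-node workaround that binary RDF triples require.

```python
# A sketch of the arity point with made-up data: one tuple in the
# hypothetical parallel world versus the intermediate-node workaround
# needed to express the same fact in binary RDF triples.
from rdflib import Graph, Namespace, Literal

EX = Namespace("http://example.org/")

# Parallel world: one arbitrary-arity tuple over named entities.
sale_tuple = ("sale", EX.alice, EX.bob, Literal(100))
print("tuple:", sale_tuple)

# RDF today: mint an intermediate "sale" node and attach each role to it.
g = Graph()
sale = EX["sale-42"]
g.add((sale, EX.seller, EX.alice))
g.add((sale, EX.buyer,  EX.bob))
g.add((sale, EX.price,  Literal(100)))

print(len(g), "triples to say what one tuple says")   # 3
```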


This world could have pretty much resulted in the same Semantic Web as we have now. But the exposure of high-arity tuples from the start might have prevented the growth of all these graph visualizers.  The lack of an "obvious" visual presentation of the high-arity tuples might have gotten us to a place where we were more focused on what we want to DO with that data as information, rather than on the easy correlation we currently have of RDF = BFG. Which brings us back to the issue of why such graphs are problematic.  (Alternatively, we could have ended up with a horde of user interfaces based on SQL, which would have strangled the Semantic Web effort even worse than graph visualizers.)

 
So, it seems that the Semantic Web community tends towards graph visualizations because there is such an easy correspondence between (a) knowledge that the model structure is a graph and (b) easy access to algorithms to render graphs.  Interestingly, the database community had something to say about this long ago.  An early paper by Codd discussed the issue of data dependence---the fact that database applications could get in trouble if the developer of the application "knew too much" about the way the database represented its data, and assumed that representation would not change.  The software community has since internalized that argument, recognizing the importance of an API to separate developers from the details of the applications they use.   But in a way, graph views show another kind of data dependence.  End users of interfaces have some abstract notion of the information objects they are working with---people, addresses, songs, and so on---and are unlikely to care how the computer represents those objects.   To show them the graph that the computer uses to represent their information is to force the computer's view of the data onto the end user, rather than using the computer to support the end user's sense of the information.

There is a difference between the system or machine model (the abstract model of the information that is physically realized in the computer) and what has been referred to as the user's mental model---how the user conceptualizes the application (not just the data, but what they can do with the data in the application). Though the "mental model" has fallen out of favour in the HCI research community, the design community still uses the concept to express how critical it is for UI designs to support the users' model and expectations of the system. Indeed, suites of evaluation heuristics have been developed (and tested), such as cognitive walkthroughs, which test interfaces largely on whether the flow of interface steps needed to complete a given task accords with the user's model of the interaction.


We can also turn the above argument, that graphs don't have to be displayed as graphs, on its head.  If graph visualizers are as useful as they are currently being made out to be, then why are so many of them only appearing now?  In many cases, the data existed prior to the Semantic Web, in assorted database or proprietary formats.  And just as we argue above, the fact that the data was not represented in RDF need have been no deterrent to visualizing it as a graph, if that were useful for the task at hand.  If graph visualizations are useful now, then they would have been equally useful on those prior representations, but they were not developed.  What has changed?  Only the fact that the underlying representation is now a graph, so we must ascribe some of the motivation for the new UIs to that pathetic fallacy.


In summary, we suggest that if there is to be a place for graphs as user interfaces, it must be because of their properties as interfaces rather than because of any particular connection to the data representation---we should be motivated to use them even if the data we are working with is not represented internally as a graph.  In other words, they must tell us something we want to know, or let us manipulate something we want to change. Let us ask ourselves, therefore, what graph visualizations such as these actually do.

3.1 BFGs: What are they Good for?

Cool Factor.  It is undeniable that graphs make wonderful pictures to put in an article or on the cover of a book, to prove one's data is complicated.  A proper tour-guide script (like the one we quote from GViz) can help users understand the message being conveyed by a particular graph presentation.  But such scripting can only happen after someone understands the data.  For a user interface, which will often be used for exploring new data, we cannot count on such a guiding script.

Density and Clusters. BFGs are known to address two things that are interesting but limited. They show the shape and density of a given data space, so that we can say things like "Oh, that's really big" or "there's a lot of activity going on down there in that part of the graph, but not much up here." It is not clear, however, when graphs are the best way to communicate information about densities and clusters, as opposed to having the computer "see" them for us, for instance, and present those findings in ways that are more meaningful and manipulable.
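As a hedged illustration of that alternative, the following sketch (toy graph, made-up names) lets the computer detect clusters and report them as lists rather than asking a person to spot them in a picture; the networkx community-detection routine used here is just one stand-in for any clustering algorithm.

```python
# A sketch of letting the computer "see" clusters and report them as
# lists instead of drawing a Big Fat Graph. Toy data; any clustering
# routine could stand in for the one used here.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

G = nx.Graph()
G.add_edges_from([
    ("merlot", "cabernet"), ("cabernet", "pinot"), ("pinot", "merlot"),  # one dense pocket
    ("tuscany", "piedmont"), ("piedmont", "veneto"),                     # another
    ("merlot", "tuscany"),                                               # a weak bridge
])

for i, cluster in enumerate(greedy_modularity_communities(G), 1):
    print(f"cluster {i}: {sorted(cluster)}")
```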


Transitive Closures.  Some commonly used graph visualizations are flow charts (for showing computations), PERT charts (for showing schedule dependencies in a large project), org charts (for showing reporting relations), and UML class and package diagrams (for showing subclass and package inheritance relationships).  It is interesting that all of these representations involve directed, acyclic graphs (except for flow charts, which have a limited amount of cycling), and generally present only one relationship.  As for operations, we may be interested in finding the path between arbitrary pairs of nodes (e.g., the critical paths in a PERT chart) or in (say) the least common ancestor of two packages in a UML diagram or two entities in an org chart.  Graphs are an excellent representation for these kinds of activities, which require a 10000-foot view of the graph but not much detail about the individual nodes.  They allow path-finding without any further user interface actions (such as clicking on links to follow them).  Of course, this breaks down once the graph is too large to fit in view.
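A brief sketch of those operations done computationally, on a made-up PERT-like dependency DAG: once the question is "what is the chain from A to B," the answer can simply be computed and reported, with or without a picture.

```python
# A sketch of path questions over a made-up dependency DAG: the answers
# can be computed and reported directly, no 10000-foot picture required.
import networkx as nx

deps = nx.DiGraph()
deps.add_edges_from([
    ("spec", "design"), ("design", "build"),
    ("build", "test"), ("design", "docs"), ("docs", "test"),
])

print(nx.shortest_path(deps, "spec", "test"))   # one dependency chain
print(nx.dag_longest_path(deps))                 # the longest chain (an unweighted critical path)
print(list(nx.topological_sort(deps)))           # a valid ordering of the work
```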


The Big Picture.  More generally, the authors of GViz argue that "text-based displays are not effective for data understanding, i.e., making sense of a given (large) dataset of which the global structure is unknown to the user." It is not clear that this assertion is correct. Work on text-based representations of data structures, from semantic zooming to faceted browsing to list categorization, has shown strong value for understanding large datasets.

3.2 What are they Not so Good for?

The typical graph approach is problematic in terms of the usability and usefulness of the interactions that graphs generally allow. We list a few of these problems below:


Another problem with graphs is that it is very easy to give too much weight to accidents of the graph layout algorithm.  Just as people look at random sets of points, see clusters (because random points will always have some regions of higher density), and conclude that the data is not random, people may see meaningful structure in a graph layout that is not really there.  Consider, for example, the figure shown here.  It was laid out using Graphviz, a graph drawing package used in many RDF visualizers (such as Frodo, mentioned above).  It appears to exhibit a "central core" of densely connected nodes, with a ring of less-connected nodes at the periphery.  This is a false conclusion, however.  The graph being presented is the three-dimensional version of the torus---a cubical lattice (a bunch of cubes packed together, like a salt crystal) wrapped around to meet itself at the ends.  It is therefore completely symmetrical---no node is more central (or more densely connected) than any other node.  The apparent asymmetry of the nodes arises from the particulars of the layout algorithm, not the structure of the graph.
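This claim is easy to check computationally; the following sketch builds the same kind of periodic cubic lattice with networkx and confirms that every node has identical degree, so any apparent "core" can only be a layout artifact.

```python
# A sketch checking the symmetry claim: a periodic cubic lattice (the
# three-dimensional torus) is vertex-symmetric, so every node has the
# same degree -- any apparent "core" comes from the layout, not the data.
import networkx as nx

torus = nx.grid_graph(dim=[4, 4, 4], periodic=True)
degrees = {d for _, d in torus.degree()}
print(degrees)   # {6}: every node has exactly six neighbours
```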


If we can accept that, while graphs may be easy to map algorithmically onto RDF, they are not necessarily the most appropriate interface representation for most things people may wish to do with RDF data, then we may want to ask what new things are enabled by the RDF model and how UIs should take advantage of them.

4. Moving towards the RDF Un-Graph

A fundamental question from the HCI perspective, for any interaction design problem, is: what question, task, or need is a given representation---graph or other visualization---answering? To put the question another way, what visualizations, representations, and interactions would best support the specified tasks?

To move from the general question to the more Semantic Web oriented question, we can push further and ask what, if anything, is special about what the Semantic Web enables such that existing UI paradigms do not suffice. This question in itself breaks into two distinct, in some sense orthogonal, questions:

We've suggested that Big Fat Graphs are not appropriate as a de facto way of presenting the Semantic Web because the tasks they support are limited. This limitation is not in itself a bad thing---every UI is limited. Rather, our point is that there is not a strong match between what a BFG of RDF nodes provides and the kinds of information support people who use the Web have come to expect: try doing email or buying a book with a BFG. There is even some resistance within the RDF community itself towards graph-based ontology editing tools---that they are cool to look at but not great to use. This apparently was one of the motivations behind a non-graph RDF editor, SWOOP.


So, the question, to repurpose Freud somewhat, may be not "why are BFGs so poor for the Semantic Web," but "what does a (Semantic Web) user want?"


Another way of putting the question of what we as Semantic Web users want may be: "what are we trying to do?" Ben Shneiderman, HCI guru at the University of Maryland, has more recently been framing the question as "what do you want to know?" Effectively, Shneiderman has said: forget trying to show everything, since we can never see all of everything at once anyway, and focus on the kinds of things that are of interest to the explorer. Much of Shneiderman's work with his students, from Spotfire to the more recent hierarchical clustering, has indeed focused on enabling researchers to concentrate on the kinds of questions of interest to them---such as being able to look at the results of a variety of functions applied to sets of data, and thus to see, for instance, under which conditions there are outliers, or which conditions match particular patterns.

The advantage of keeping the question as "what do we want to do" rather than "what do we want to know," however, is that it may more explicitly capture one particular attribute the Semantic Web has in common with many Web 2.0 applications: the desire to DO something on the Web with the data itself---to tag it, edit it, share it; to push it into new or other representations. These edit/tag/share capabilities are possible with Web 2.0 applications, which break with one part of the pre-Web-2.0 model, where the Web is interactively read-only. The specific affordances and constraints, to use Don Norman's terms, of the Semantic Web may take us beyond even these relatively new ways of interacting with information on the Web.

4.1 If it ain't broke...

One possible future for semantic web user interfaces is that they will look exactly the same as the traditional (not necessarily Web-based) UIs they are replacing.  After all, we have argued that end users don't care about how the computer stores the data. To the (limited) extent that we have worked out the "right" user interfaces for doing specific information management tasks such as checking email, managing appointments, finding an airline ticket, and so on, we should continue to use those same "right" interfaces when the data they are managing is semantic web data. 


A case in point is the AKTive Futures system, designed for the task of exploring information about world oil production.  AKTive Futures uses a Cartesian mapping---a graph, but not in our sense---as one facet of its interface presentation. The core interaction of the UI is to select countries for one axis and ranges of years for the other, to look at trends in oil production in those places and times. By clicking on a spot on a line on the graph, the stories associated with those confluences are presented in a secondary window. In this case, the use of a particular kind of (Cartesian) graph is appropriate for the task the designers of the application wish to support. Date and output data are, as numeric data, represented in a numerically relevant fashion---not as static tables but on a Cartesian graph where, in Shneiderman's parlance, the person using the service is not presented with all data for all time, but is enabled to select the ranges of interest and focus on them in an appropriate format.


So where's the Semantic Web in this application?  Not in the UI, but in the data.  For AKTive Futures, data comes from all over the Web and, where it does not already exist in Semantic Web format (most usually RDF), is converted into it so that it can be rendered appropriately for this kind of explorable user interface.  Indeed, the graph is used to help find trends of interest (not unlike Spotfire) and to use those relations of interest as a way to find the richly associated information (such as articles on oil production related to that moment on the graph) and tease out what may have caused a particular spike.  The role of the Semantic Web?  Simply to be a standard data representation against which it is easy to write the user interface elements.


A similar philosophy underpins Haystack, a tool focused on supporting flexible personal information management.  Haystack looks like the majority of applications, with its canvas divided into rectangular regions, each containing some relevant information object or objects.   Objects are shown using traditional presentations---a "collection view" that shows a little bit of information about each object in the collection, in tiles or tabular form, and more detailed views that show more information about individual objects.  Specific layouts of the properties of an object make it easier for a user to quickly absorb them, even without labels---e.g., the summary of an email message shows the sender followed by the subject and date.  Operations on the objects are accessed by traditional context menus.  Additional arguments to those operations are collected via dialog boxes.  Drag and drop is used to place objects in collections, and more generally to record relations between objects.    As with AKTive Futures, the difference is under the covers---all the information Haystack is presenting comes out of its RDF datastore.   And this does enable new functionalities, which we will discuss below.  But when the task is a traditional one, such as dealing with email, Haystack consciously replicates the traditional interfaces for tackling that task.


As these examples show, there is no need to change the interface just because the data is RDF.  Rather, changes to the interface for Semantic Web applications only make sense if the Semantic Web actually enables new kinds of tasks that need new kinds of interface support.

4.2 New Opportunities for UIs

If our applications look exactly like they did before, it isn't clear what benefit we get from the Semantic Web back-end.  But the Semantic Web does offer new solutions to information management tasks.  The representation of all data in a common form will make it easier to locate, collect, and repurpose data that was previously locked up in separate applications and web sites.  How might users take advantage of this opportunity, with the right UIs?


One such kind of task is enabling rich exploration of a domain across a variety of sources from a user-determined perspective.  This is a new kind of exploration of information spaces. Rather than asking for a specific instance of something---like a phone number for someone in a keyword search---exploratory search (as described by Marchionini in ACM's special issue on the topic, April 06) means that we can come to a domain and explore and compare the relationships in that domain to gain new understanding. An example of this kind of user-determined exploration across heterogeneous sources is supported by the Classical Music Explorer (paper), built on mSpace (demo). While the mSpace framework is general, in this case the Explorer is designed to help make classical music discoverable for people who know nothing about classical music.


This discoverability is enabled by a variety of functions: dimensions of information, such as composer, arrangement, and period, are represented by columns in a view. Columns act as filters: a selection on the left acts as a filter on what appears on the right. Information about these dimensions comes from a variety of sources agglomerated against a model of the domain. A selection in any dimension brings up descriptions of that instance. A person can adjust these dimensions to suit their interest: someone who just wants to find piano music can arrange the columns so that Arrangement is the first column. Most critically, text does very little to help people without musicological knowledge assess music content. Thus, music itself, as information and exploration criterion, is made available at each stage of the exploration. Something we call a Preview Cue is associated with each instance in a column. The cue provides rapid access to samples of music in that area of the domain (like sonatas, or baroque) so that users can assess their search interests quickly and easily. There is an Interests area where things of interest can be saved for reference. That the Semantic Web powers this service is invisible to the user. Where this example goes beyond on-line music explorers and into what makes the Semantic Web interesting is that the browser automatically associates information from different sources with the music in the explorer---choosing "period: baroque" yields a description of that selection.  Critically, the UI makes it easy for people to see and explore the relationships of one part of a domain with another. This is an alternative form of search to the Web-familiar keyword search.


The mSpace explorer in general uses a multicolumn layout similar to that found in NeXT, OS X and, especially apropos for music exploration, the iTunes layout. It enhances this approach to improve previously doable but difficult or cumbersome tasks, and to make it tractable to draw together new and related resources as they become available. The Semantic Web again makes the integration of multiple domain sources tractable. It becomes possible to ask "which artists are from New Orleans" (sketched below)---not a query one would attempt in iTunes. Indeed, pulling together that location information is tedious with current keyword search technology. In the Semantic Web space, it becomes feasible, tractable.

In mSpace, the UI enables people to select dimensions of interest (recordings rather than history) in a domain and to reorganize them. The capability of repositioning information (what mSpace calls rearranging a slice) is a powerful tool for comparative analysis of relations within a space. This feature in itself is not an inherent aspect of the Semantic Web, but the Semantic Web's protocols for making heterogeneous data available in ways that enable this kind of drawing together of sources mean that the information can be compellingly rich: a variety of sources can contribute to a view; previews may come from multiple sources; domain dimensions may be individual sources or agglomerated from multiple sources.   Because of the use of heterogeneous sources, new dimensions can be added to the domain as they become known; musicological data may be supplemented with technical recording data or historical data. Currently, this domain growth happens on the system side, discovering new information; work is under way to make new dimensions discoverable and addable by the people exploring the data resources themselves.

The UI also makes it possible (to use spreadsheet language) to pivot from one domain to another on a related term---so one moves from Beethoven in the context of music to Beethoven in the context of history. Yes, one can do these pivots with databases and spreadsheets; indeed, George Robertson's Polyarchy work, "Visual Pivot" (pdf), has shown exactly such pivoting in very interesting ways from one database table to another. One may suggest, however, that the Semantic Web has the potential to break from database scale to greater, messier, heterogeneous Web scale. Indeed, the generalized nature of the mSpace framework means that it can be applied to any domain (there is a sample of such domains on the mSpace project site).
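Here is a hedged sketch of the "which artists are from New Orleans" question, using rdflib and entirely made-up vocabulary and data, posed as a single query over RDF aggregated from (imagined) separate catalogue and biography sources.

```python
# A sketch (made-up vocabulary and data) of the "which artists are from
# New Orleans" question as a query over aggregated RDF -- trivial once
# heterogeneous sources share a data model.
from rdflib import Graph, Namespace, Literal, RDF

EX = Namespace("http://example.org/")
g = Graph()
# Imagine these triples were merged from a music catalogue and a
# separately published biography source.
g.add((EX.louis, RDF.type, EX.Artist))
g.add((EX.louis, EX.name, Literal("Louis Armstrong")))
g.add((EX.louis, EX.birthplace, EX.new_orleans))
g.add((EX.miles, RDF.type, EX.Artist))
g.add((EX.miles, EX.name, Literal("Miles Davis")))
g.add((EX.miles, EX.birthplace, EX.alton))

q = """
PREFIX ex: <http://example.org/>
SELECT ?name WHERE {
  ?artist a ex:Artist ;
          ex:name ?name ;
          ex:birthplace ex:new_orleans .
}
"""
for row in g.query(q):
    print(row.name)
```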


Within the realm of task management, we can see similar advantages that the Semantic Web affords for improving and evolving work. Many information management tasks require us to gather information from several heterogeneous domains, manipulate it, and record it for later use.  The traditional partition of information over multiple incompatible applications (and more recently, web sites) placed substantial cognitive load on the end user, who had to keep track of their place among all these tools and manually shift information between them.  The Semantic Web holds out the possibility of aggregating all this necessary information into a single, task-appropriate application.  For example, the "Inbox" in the Haystack email client discussed above contains incoming stories from RSS news feeds, and can also include any other items---such as appointments or todo items---that the user wants to be reminded about as they skim their incoming mail, while email messages can be moved from the Inbox into the user's calendar as an easy way to schedule the time they want to deal with that message.


As a richer example, consider the work of Bellotti et al. on Taskmaster.   They found many people trying to twist their email clients into task management software---keeping mail around to remind themselves of the state and their commitments for uncompleted projects, even going so far as to send email to themselves as reminders about their own tasks.  Bellotti et al. extended a standard email client to incorporate data types (projects, deadlines, events, and so on) and UI elements appropriate for this information management activity, and test subjects loved it.  By taking information previously managed by multiple, non-coordinated applications, and fitting it into a single domain-specific application, Bellotti et al. made it easier for users to do their job.  But doing so took a team of programmers to translate among the data types stored by the relevant applications, and to modify the UI to present those new data types.  If all our data gets unified on the semantic web, repurposing our applications for new tasks like this promises to get a lot easier.  Indeed, do we need programmers at all?   It should become easy for an end user to say "I wish I could drag my email messages into my calendar, so I could establish appropriate times for answering them." 


Haystack explores this idea of application development by end users.  Haystack lets end users choose the information objects from their RDF store (or from other Semantic Web repositories) that matter for their task, decide how each should look (which properties of the objects matter, and how they should be formatted and laid out), and lay them out on a canvas.  Once they have done so, they have access (as discussed above) to traditional application behaviors like context menus and drag-and-drop for manipulating their information; changes get fed back into the RDF store.  The figure to the right shows such an application, purpose-built for the task of writing a particular neuroscience research paper.   The data for the paper, related publications, coauthors, to-do items for the task, and relevant email messages are all aggregated into a single task-specific workspace.


Such end-user application development would be impossible in a traditional environment: merging the data, stored in multiple, incompatible formats, requires computational solutions beyond the capabilities of end users.  But once the data is in a single format like RDF, the task of selecting, formatting, and laying out items becomes more like word processing and window management---well within the capabilities of many users.


One of the interesting features of the Semantic Web that is not harnessed by simple BFGs of RDF, and that makes both Haystack and mSpace possible, is that Semantic Web data makes it possible to break the paradigm of the page (called for in "You've Got Hypertext"). In the current Web, the smallest meaningful unit of information is generally the Web page. The page means that data served to a page is largely in a fixed, pre-delivered format. The paradigm of the Web has been to support this model of page-based presentation. With the Semantic Web, data is captured in smaller units AND is made available via RDF stores to be repurposed.  The Piggy Bank (web browser plugin) and Semantic Bank (server) highlight this capability, letting users extract individual structured items from web pages as they browse, and store and annotate them in a shared repository for access by others.


Such fine-grained data access enables people to choose a variety of representations for the information out there, depending again on what they want to do with it. Offering multiple different views over the same data is also technically possible with any database-backed web site, though we cannot find an example that really demonstrates it. Some Web 2.0 applications have started to push against the page paradigm. Google Maps mash-ups are perhaps the best-known example, where location data is merged with other public data sources. Such mash-ups are promising but limited compared with what the vast interconnectivity of the Semantic Web can afford; still, they do make the case for the value of being able to repurpose data. Such applications go beyond the ideology of databases, which has been to maintain and almost protect the data source; the Semantic Web's ideology is to publish and share, making repurposing---and just as readily re-presenting---the discovered data a raison d'être of the paradigm.

Likewise, the possibility of automatically repurposing one set of data with another is a remarkable and still largely untapped affordance (and possibly the central opportunity) of the Semantic Web. This capacity is enabled by that same RDF that wraps up and makes communicable the semantics of the data in relation to itself and to other data. Just as the schema of a database makes visualizations like Spotfire possible, the RDF of the Semantic Web will make richer mechanisms for engaging with data possible.


We see some of this page-breaking, cross-Web, context-sensitive, flexible repurposing of data in Semantic Web applications like Haystack, Piggy Bank, AKTive Futures, and /facet (pronounced "slash facet"), and in Semantic Web/Web 2.0 hybrid applications like mSpace and mSpace Mobile.


4.3 UIs for New Opportunities

If new UIs are to help people take advantage of the seamless integration of data offered by the semantic web, we are going to have to deal with a number of challenges posed by that integration. 

One challenge is the late filling-in of the data model.  Traditionally, we build applications by first designing the data model and then developing an interface for interacting with that model.  With the Semantic Web, however, one capability we will want to support is letting users incorporate any type of data into some kind of working space as they decide it is relevant to their task.  How can UI designers predict this and make a UI that will visualize that arbitrary data the right way?  The likely answer is, we can't.  Instead, the user who is pulling in the data will also have to think about (or go find, somewhere on the Semantic Web) ways to present that data as well.   In this vein, the Fresnel project has developed "a generic ontology for describing how to render RDF in a human-friendly manner."  The idea is to write lenses that describe, in RDF, which properties of a given type of object "matter" in presenting a given object.  Someone encountering a new type of data, or trying to incorporate it into one of their applications, can seek out (or create) and plug in lenses that will embed visualizations of the new data type in the old application.
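The following is only a rough sketch of the lens idea, in Python rather than the actual Fresnel vocabulary: a small table, keyed by type, records which properties "matter" when presenting an object; the types, properties, and data are invented for illustration.

```python
# A sketch of the lens idea: a type-keyed table of the properties that
# "matter" when presenting an object. Mimics the spirit of Fresnel
# lenses without using the Fresnel vocabulary itself; data is made up.
from rdflib import Graph, Namespace, Literal, RDF

EX = Namespace("http://example.org/")

lenses = {
    EX.Person: [EX.name, EX.email],               # show these, in this order
    EX.Email:  [EX.sender, EX.subject, EX.date],
}

def render(g, resource):
    """Render a resource using the lens registered for its type."""
    rtype = g.value(resource, RDF.type)
    for prop in lenses.get(rtype, []):
        value = g.value(resource, prop)
        if value is not None:
            print(f"  {prop.split('/')[-1]}: {value}")

g = Graph()
g.add((EX.alice, RDF.type, EX.Person))
g.add((EX.alice, EX.name, Literal("Alice")))
g.add((EX.alice, EX.email, Literal("alice@example.org")))
render(g, EX.alice)
```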

This capacity to pull in components on demand introduces a concept of applications for application building. Given descriptions of the individual object views, therefore, who is going to assemble applications from them, and how?  A big bucket of view widgets is not necessarily a good starting point for an end user trying to build up such an application, especially if that person is not entirely familiar with the information domain with which they wish to interact.  So what will application-building applications look like? How will these dynamic explorers interact with prefabricated Semantic Web applications?

Another challenge that Semantic Web UIs will have to face is presenting provenance.  Right now, when we navigate the Web, provenance is pretty easy to understand: we look at an address bar, and that tells us which web site stands behind the content we are viewing.  But in the Semantic Web vision, we will be collecting small data fragments from all over the place.  One approach, in CSAKTiveSpace, has been an "under the hood" view of data provenance, where the provenance of any data instance in the interface can be viewed directly, but only by "popping the hood" and switching views from the current exploration to the provenance view. So how do we keep track of what came from where in a persistent view, and do we want to?  Do we color data from our own repositories green, and data from suspected spammers red?  What about data that is not copied wholesale, but instead inferred by reasoning from a collection of other data items, themselves collected from multiple locations?
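One hedged sketch of keeping such provenance alongside the data: store each source's triples in its own named graph (here via rdflib's ConjunctiveGraph, with made-up URIs), so any statement can later be traced back to, and styled by, its source.

```python
# A sketch of provenance kept with the data: each source's triples go
# into their own named graph, so any statement can be traced back to
# where it came from. URIs are made up.
from rdflib import ConjunctiveGraph, Namespace, Literal, URIRef

EX = Namespace("http://example.org/")
store = ConjunctiveGraph()

# Record the same kind of fact from two different sources.
trusted = store.get_context(URIRef("http://example.org/source/our-repo"))
trusted.add((EX.wine42, EX.vintage, Literal(1997)))

dubious = store.get_context(URIRef("http://spam.example/feed"))
dubious.add((EX.wine42, EX.vintage, Literal(2003)))

# Later, ask which source stands behind each statement.
for s, p, o, ctx in store.quads((EX.wine42, EX.vintage, None)):
    print(o, "asserted by", ctx.identifier)
```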

Semantic Web UIs will also be needed to help users cope with one of the unavoidable challenges of the Semantic Web: ontology alignment.  As multiple individuals publish information, they are likely to use different terms to talk about the same things.  Someone collecting this mismatched data will be able to make best use of it only if they can somehow merge the different vocabularies into their own, or potentially select which versions align with theirs.  Even before merging, therefore, they will likely need some way to inspect the ontologies, in order to understand what different terms mean.  Ontology browsing and alignment is one of the popular motivations for graph-based visualizations, but there is no evidence that such UIs are the best ones for the task.  And there are alternatives---for example, a quick glance at the Thunderbird address book shows a primitive "ontology alignment" UI for importing a comma-separated-values file into an address book.   The user checks off properties in the "destination" ontology, and moves exemplar values from the "source" ontology up and down until corresponding properties are lined up the right way.   Again, no graph is in sight, and it is not clear how one would help.
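A minimal sketch of that Thunderbird-style alignment step, with invented column names and properties: a hand-built mapping from source CSV columns to destination vocabulary properties, applied row by row to produce triples.

```python
# A sketch of a simple alignment step: map source CSV columns onto a
# destination vocabulary, then emit triples row by row. Column names
# and properties are made up.
import csv
import io
from rdflib import Graph, Namespace, Literal

EX = Namespace("http://example.org/addressbook/")

alignment = {            # source column  ->  destination property
    "Full Name": EX.name,
    "E-mail":    EX.email,
    "Phone No.": EX.phone,
}

source = io.StringIO("Full Name,E-mail,Phone No.\nAlice,alice@example.org,555-0100\n")
g = Graph()
for i, row in enumerate(csv.DictReader(source)):
    subject = EX[f"contact-{i}"]
    for column, prop in alignment.items():
        g.add((subject, prop, Literal(row[column])))

print(g.serialize(format="turtle"))
```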

An even harder case of ontology alignment arises before the user gets the data, while they are still looking for it.  Traditionally, data was locked up in applications or web sites that had specific models and could therefore offer domain-specific search tools for locating the information they held.  Once ontologies proliferate on the Semantic Web, what UIs can help a user formulate a query for information against ontologies they might not even know?  Even on the Semantic Web, text search is likely to play an important role, because its fuzzy semantics means that users don't have to describe their information in exactly the right way.  This might let them find information without ever figuring out the ontology being used.  It will also be important to provide such fuzzy query semantics on non-textual Semantic Web data---a preliminary effort in this direction is Magnet.

5. Dynamic, Free-Form Semantic Web UI Apps

The opportunity that both Haystack and mSpace demonstrate is that they can unify any and all data into the user interface. Thus, they can bring together all and only the information that a user needs to tackle a particular task, without worrying about which application owns it or how it will be represented. The UI challenge within these kinds of applications is precisely that they can unify any and all data into the UI: application and interaction designers may have no idea what data is going to show up and be necessary for the users' tasks, so there is a challenge in determining how to present data that is not known in advance and therefore cannot be planned for. Haystack and mSpace, each in slightly different ways, are trying to exploit the opportunity and meet the challenge by creating a UI framework that can accept and display arbitrary data, thus offering all the benefits of task-driven data aggregation while still looking and acting like familiar applications, rather than an exposure of the raw source data. For instance, suppose someone is interested in finding jazz music and there is no pre-made mSpace Jazz explorer. Or, more intriguing yet, suppose someone is interested not only in exploring the sounds of jazz but in seeing what was happening historically, both politically and in architecture, at the same time as different trends in music were occurring, in order to explore the question of what was influencing what, when.

The above kind of question means that a person may wish to start exploring from a particular information seed, or set of seeds, from which to build and explore relations (though even how to express these seeds may be challenging---another matter for interaction research innovation: ever know what you want but not the terms to express it so that you can find it on Google?). The above mixing query means that samples of music need to be available so someone can: audition the songs (we do not assume the questioner is a jazz expert) to hear what is of interest; engage historical political period data from different regions; and have this data contextualized not only by location but by time, and readily explorable by time and location visualizations. What is the ideal representation for this information as it is assembled? It is NOT (alone or primarily) a Big Fat Graph.



By way of alternative example, Web founder and Semantic Web co-founder Tim Berners-Lee has been developing an idea called the Tabulator (see the paper in this workshop). The goal of the Tabulator is to provide a means of moving from one RDF source to another for arbitrary exploration of the Semantic Web. It uses a table view of RDF space rather than a graph. Conceptually, one starts with a specific known source of Semantic Web data and then, rather than in a graph, selects cells in a tabular representation of the related RDF, which expand into fresh tables, and so on. The data collected in these expansions can then be re-visioned into either a map, a calendar, or a timeline (note the term "or"). There is considerable potential here.  But currently selecting the source of the data is very geeky; result data is also expressed in RDF-ese triples like "colorPicture is mentioned in TAGmobile road trip BOS-> Amerst:photo."
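To illustrate the interaction style (not the Tabulator's actual implementation), here is a tiny sketch with made-up data: a selected resource expands into a property/value table, and any value that is itself a resource can expand into a further table.

```python
# A sketch of the table-based interaction in miniature: expand a
# resource into a property/value table, then expand a value that is
# itself a resource. Data and URIs are made up; this only illustrates
# the "tables, not graphs" idea, not the Tabulator itself.
from rdflib import Graph, Namespace, Literal

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.trip, EX.title, Literal("TAG mobile road trip")))
g.add((EX.trip, EX.photo, EX.photo17))
g.add((EX.photo17, EX.caption, Literal("Somewhere on the road")))

def expand(resource):
    """Print a two-column table of the resource's properties and values."""
    print(f"-- {resource} --")
    for p, o in g.predicate_objects(resource):
        print(f"{p.split('/')[-1]:10} | {o}")

expand(EX.trip)       # the starting table
expand(EX.photo17)    # "clicking" a cell expands a fresh table
```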


Could this geeky design be because the developers started with themselves as candidate users, rather than aiming from the start to support average Web users who know nothing about RDF? Can someone who knows nothing about the Semantic Web use the Tabulator (one of us has yet to be able to get the demo to run, and is unclear whether this is because of some mis-chosen input, some permissions issue on the demonstrator side, or some other problem)? This application is a great example of where the user interaction design/research community is needed to help Semantic Web experts and researchers communicate the capacity of the Semantic Web, as Apple used to put it, for the rest of us.

5.1 Mix and Match on the Fly

Among the challenges for Semantic Web UI services, besides de-geeking things like the Tabulator, will be supporting data in formats that tell the application which display options may be appropriate for it (dates, map coordinates, contacts). It is not clear what the solution is: microformats are one approach; Fresnel, as described above, is another. It will be interesting to see how these approaches work across heterogeneous data sources and distinct contexts. It will also mean being able to add new data, links, and tags.

This observation about the context in which data is discovered leads back to the earlier observation that UIs for Semantic Web data, like all other human-usable systems, need to respect and support what the human wants to do with that data. Applications or higher-level frameworks that start with this focus in mind have a greater likelihood of producing effective interactions. Being able to establish context for multiple intersecting data domains and data types may be as critical as being able to take advantage of a pre-asserted format for a particular data chunk. For example, consider the Web 2.0 application Live Clipboard, which enables sharing of Web content between Web pages (see the technical information on how it does that). With the Semantic Web, we may want to be able to pull together a variety of data sources and data formats for concurrent representation in appropriate models---data copied from various sources and pasted into new representations to be used in new ways. How would a Semantic Web version of something like Live Clipboard work to deliver a dynamic version of AKTive Futures?

6. Conclusions/Looking Forward


The bottom line in our argument is that Big Fat Graphs have their place, but overall, it's a fairly limited place. One of the dangers of BFGs is that they are generally pretty easy to deploy. The algorithms for pumping data into many graphs are well known. They are the easy hammer to reach for when a Semantic Web researcher wants to deploy a visualization of their data. As we hope we have begun to demonstrate, it is a pathetic fallacy to assert that because the data model is a graph, the data should therefore be displayed as a graph. BFGs introduce their own problems without necessarily solving any that a user may have when engaging with the information so presented. As we have argued, graphs have limited value, even for many of the tasks which they are supposed to support. The harder question coming from this interrogation is "how do we elegantly support the range of possible interactions, both in pre-defined Semantic Web applications and in dynamic explorations of Semantic Web resources?" We have only sketched out some examples of current SW applications that support old tasks better and in new ways enabled by the Semantic Web, and that explore SW-RDF resources more dynamically for user-determined exploration. As is evident, much more innovative work is possible and needs to be done.

People at the coal face of RDF and ontology work may not see it as their mission to consider this more human-oriented approach to representing information spaces for human-usable, human-useful exploration. But why not? If we are presenting our tools or data for other people to use, is it not part of best practice in the engineering process to consider the best methods of supporting the presentation of, and interaction with, those services?  The result may well be a generation of Semantic Web browsers, tools, and applications that enable people both to explore and to contribute to the rich associations possible in the ((increasingly Social and) Semantic) Web.