Master's Thesis - Issues of Saliency and Recognition in the Search for Web Page Bookmarks
1. Introduction
1.1) Keeping found things found
Since its inception 12 years ago, the World Wide Web has grown at a phenomenal rate, far beyond that of any comparable medium. Accurate estimates of the size of the Web are hard to obtain. The largest search engine, Google, claims to hold 3 billion web pages in its database ('Benefits of Google'), yet this can only be a small fraction of the total number of web pages currently in existence (Lawrence & Giles, 1999).
Although the Web serves as the primary information resource for many people, its massively increasing size and complexity have made 'information overload' one of the biggest and most obvious drawbacks of the technological age. Thankfully, finding resources on the Web has been made easier in recent years by modern search engines such as Google, together with more refined search functions within websites themselves. But successfully finding a web page raises a secondary problem: how do you keep it 'found'? (Jones, Bruce & Dumais, 2001).
Users have many different methods of 'keeping' resources found on the Web. They save whole pages to their hard drives, print them out, send URLs to themselves in an email, write them down on a piece of paper or add them to the 'bookmarks' list in their web browser (Jones, Bruce & Dumais, 2001; Cockburn & McKenzie, 2000; Tauscher & Greenberg, 1997). The last method, 'bookmarking', will be the focus of this study.
1.2) Bookmark basics
Bookmarks have existed since the creation of the first World Wide Web browser in 1991 (Cailliau, 2002) and have been adopted by most web browsers as a standard navigation and revisitation tool, although they go by different names for marketing reasons. Netscape Navigator uses the term 'bookmark', while Internet Explorer uses the equivalent term 'favorites', as shown in Figure 1 below. (The term 'bookmark' will be used throughout this paper and is synonymous with 'favorites' and 'links'.)
Figure 1
'bookmarks' and 'favorites' menus.
The text in a bookmark begins its life as the title of a web page, found in the <title> tag in the HTML code that is used to build the page (Figure 2).
Figure 2
The title text comes from the <title> tag as defined in the HTML code.
The content of this tag is extracted and used for various functions in Microsoft Windows. It is used first as the title of the page in the top bar of the web browser and as the label of the icon representing the browser on the Start bar when the browser is minimised. If the user decides to save the web page to their hard drive, the title is used as the filename. The <title> text also appears in the 'History' list of the web browser and, of course, in the bookmark if the user decides to keep the page by that method. Finally, the text appears in the tool tip that pops up when the mouse pointer is held over the bookmark in Internet Explorer (Figure 3).
Figure 3
The <title> tag text is extracted and used for various functions in Windows: it appears in the browser's top bar, as the 'Start' bar icon and in the history list. It is also used for the bookmark and the corresponding tool tip, and to name a saved file.
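The extraction step described above can be illustrated with a short sketch. It does not reproduce how any particular browser is implemented; it simply assumes Python's standard html.parser module, and the names TitleExtractor and extract_title are invented here for illustration.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collects the text found inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased; only the first <title> is used.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        # Collapse runs of whitespace into single spaces.
        return " ".join("".join(self._parts).split())


def extract_title(html):
    """Return the <title> text of an HTML string, or '' if there is none."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return parser.title
```

For a page whose code contains <title>My Home Page</title>, extract_title returns 'My Home Page'. For a page with an empty or missing <title> tag it returns an empty string, whatever headings appear in the body of the page.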
It is important to note that the text in the <title> tag does not actually appear on the web page itself, and is not necessarily the same as the 'title' appearing within the web page, which has to be defined separately by the author (Figure 4).
Figure 4
An example of when the title text does not match the 'real' title of the page.
1.3) Good housekeeping
There are a few basic things a web author should do in order to write an acceptable bookmark, based on the complaints of web users (Kassten, Greenberg, & Edwards, 2002; Cockburn, Greenberg, Jones, McKenzie & Moyle, 2003).
First, they must remember to actually define the <title> tag. If the <title> tag is empty or even missing from the HTML code, the filename and directory path of the page will be shown instead of a meaningful title, for example http://www.hppmusicindex.com/out.asp or http://thezfiles.co.uk/seek_ae5663dc.htm.
If the author is using web publishing software such as Macromedia Dreamweaver and does not define the <title>, the program's default text will be displayed instead. Such pages are frequently encountered on the Web and can be recognised by the title "Untitled".
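The fallback behaviour described above can be sketched as follows. This is a minimal illustration rather than the browser's actual logic: the function names and the set of placeholder strings are assumptions, and only "Untitled" comes from the text.

```python
# Placeholder titles left behind by authoring tools. "untitled" is taken
# from the text; "untitled document" is an assumed additional example.
PLACEHOLDER_TITLES = {"untitled", "untitled document"}


def bookmark_label(title, url):
    """Return the text a bookmark would show: the title, or the URL if empty."""
    cleaned = " ".join(title.split()) if title else ""
    return cleaned or url


def is_placeholder(title):
    """Report whether a title looks like an authoring tool's default text."""
    return " ".join(title.split()).lower() in PLACEHOLDER_TITLES
```

With an empty title, bookmark_label("", "http://www.hppmusicindex.com/out.asp") returns the URL itself, which is the unhelpful label the user ends up with. A check such as is_placeholder could be run by an author before publishing to catch pages that still carry the default text.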
The author must ensure that the <title> tag and the 'title' within the page actually match. Differences between the two have been cited by users as a major annoyance when trying to locate a bookmark (Kassten et al., 2002). Authors should also ensure that each page on their website has a unique title, so that several bookmarked pages from the same site can be told apart.
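An author could check the uniqueness requirement with a small script along these lines. It is a sketch under stated assumptions: the mapping of URLs to titles is supplied by hand, the function name duplicate_titles is hypothetical, and titles are compared without regard to case or spacing, which is a design choice made here rather than anything the cited studies prescribe.

```python
from collections import defaultdict


def duplicate_titles(pages):
    """Map each normalised title shared by two or more pages to their URLs.

    `pages` is a dict of URL -> <title> text for one website.
    """
    by_title = defaultdict(list)
    for url, title in pages.items():
        # Ignore case and extra whitespace when comparing titles.
        key = " ".join(title.split()).lower()
        by_title[key].append(url)
    return {title: urls for title, urls in by_title.items() if len(urls) > 1}
```

Given three pages of which two are both titled 'Products', the function returns a single entry listing those two URLs. An empty result means every page would produce a distinct bookmark.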
Lastly, the author has to make the title fit within the bookmark character length limit. In Windows, the maximum length for a bookmark is 255 characters (including spaces), but on average only the first 65 characters will be visible in the 'favorites' menu in Internet Explorer, although all 255 characters should appear in the tool tip (see Figure 5 below). Only an average can be given because the number of characters visible depends on the width of the letters used; if Windows used a monospaced font for menus, the number of visible characters would be the same every time.
Figure 5
On the 'favorites' menu in Internet Explorer, the tool tip displays up to 255 characters, while the bookmark displays only 65 characters on average.
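The reason only an average can be quoted becomes clearer with a rough model of a proportional font. The sketch below is illustrative only: the 255-character limit comes from the text, but the glyph widths, the menu width of 65 units and the function names are assumptions, and a real menu would use the actual font metrics.

```python
MAX_BOOKMARK_CHARS = 255  # storage limit stated in the text

# Hypothetical relative glyph widths for a proportional font.
NARROW = set("iljtf.,;:'! ")
WIDE = set("mwMW@")


def glyph_width(ch):
    """Return an assumed width for one character, in average-character units."""
    if ch in NARROW:
        return 0.5
    if ch in WIDE:
        return 1.5
    return 1.0


def visible_prefix(title, menu_width=65.0):
    """Return the part of a title that fits in a menu of the given width."""
    stored = title[:MAX_BOOKMARK_CHARS]  # anything beyond the limit is lost
    used = 0.0
    count = 0
    for ch in stored:
        width = glyph_width(ch)
        if used + width > menu_width:
            break
        used += width
        count += 1
    return stored[:count]
```

Under these assumed widths, a title made entirely of the letter 'a' shows 65 characters, one made of 'i' shows 130 and one made of 'W' shows only 43. With a monospaced font every glyph would have the same width and the count would be constant.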
