Understanding the World Wide Web

The World Wide Web is a system of Internet servers that supports hypertext to access several Internet protocols on a single interface. The World Wide Web is often abbreviated as the Web, WWW, or W3.

The World Wide Web was developed in 1989 by Tim Berners-Lee of the European Particle Physics Lab (CERN) in Switzerland. The initial purpose of the Web was to use networked hypertext to facilitate communication among the lab's members, who were located in several countries. Word soon spread beyond CERN, and a rapid growth in the number of both developers and users ensued. In addition to hypertext, the Web began to incorporate graphics, video, and sound. Over the past few years, the use of the Web has reached global proportions.

Almost every protocol type available on the Internet is accessible on the Web. Internet protocols are sets of rules that allow for intermachine communication on the Internet. The following major protocols are accessible on the Web:

E-mail (Simple Mail Transfer Protocol or SMTP)

Distributes electronic messages and files to one or more electronic mailboxes

Telnet (Telnet Protocol)

Facilitates login to a computer host to execute commands

FTP (File Transfer Protocol)

Transfers text or binary files among computer hosts

Usenet (Network News Transfer Protocol or NNTP)

Distributes Usenet news articles derived from topical discussions on newsgroups

Gopher (Gopher Protocol)

Displays information on a system of menus and documents (mostly obsolete)

HTTP (HyperText Transfer Protocol)

Transmits hypertext over networks. This is the protocol of the WWW.

The World Wide Web provides a single interface for accessing all these protocols. This creates a convenient and user-friendly environment. It is no longer necessary to be conversant in these protocols within separate, command-level environments. The Web gathers together these protocols into a single system. Because of this feature, and because of the Web's ability to work with multimedia and advanced programming languages, the World Wide Web is the fastest-growing component of the Internet.
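One way to picture this single interface is as a lookup from the scheme at the front of an address to the protocol that handles it. The sketch below is illustrative only: the PROTOCOLS table and the protocol_for function are hypothetical names, and pairing the mailto and news schemes with SMTP and NNTP is an assumption about how a typical browser behaves.

```python
from urllib.parse import urlsplit

# Hypothetical table: URL scheme -> protocol from the list above.
PROTOCOLS = {
    "mailto": "SMTP",
    "telnet": "Telnet",
    "ftp": "FTP",
    "news": "NNTP",
    "gopher": "Gopher",
    "http": "HTTP",
}

def protocol_for(url):
    """Return the protocol a browser would use for this address, or None."""
    scheme = urlsplit(url).scheme.lower()
    return PROTOCOLS.get(scheme)

for address in ("http://www.census.gov/", "ftp://bongo.cc.utexas.edu/microlib"):
    print(address, "->", protocol_for(address))
```

The user only ever supplies an address; choosing the protocol is left to the software.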


HYPERTEXT: THE MOTION OF THE WEB

The operation of the Web relies primarily on hypertext as its means of information retrieval. Hypertext is a document containing words that connect to other documents. These words are called links and are selectable by the user. A single hypertext document can contain links to many documents. In the context of the Web, words or graphics may serve as links to other documents, images, video, and sound. Links may or may not follow a logical path, as each connection is programmed by the creator of the source document. Overall, the WWW contains a complex virtual web of connections among a vast number of documents, graphics, videos, and sounds.

Producing hypertext for the Web is accomplished by creating documents with a language called HyperText Markup Language, or HTML. With HTML, tags are placed within the text to accomplish document formatting, visual features such as font size, italics and bold, and the creation of hypertext links. Graphics may also be incorporated into an HTML document. HTML is an evolving language, with new tags being added as each upgrade of the language is developed and released. The World Wide Web Consortium, led by Tim Berners-Lee, coordinates the efforts of standardizing HTML.
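To make the idea of tags concrete, here is a minimal sketch that reads a small HTML document and lists the links it contains. The page text is made up for illustration, and LinkCollector is a hypothetical helper name; the parser itself comes from Python's standard library.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the target of every <a href="..."> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# A made-up page: bold and italic formatting plus one hypertext link.
PAGE = """<html><body>
<p>See the <b>latest</b> <i>news</i> from the
<a href="http://www.census.gov/pubinfo/www/news.html">Census Bureau</a>.</p>
</body></html>"""

collector = LinkCollector()
collector.feed(PAGE)
print(collector.links)
```

The b and i tags control formatting, while the a tag with its href attribute creates the hypertext link.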


HOME PAGES ON THE WEB

The World Wide Web consists of files, called pages or home pages, containing links to resources throughout the Internet.

Web pages can be created by user activity. For example, if you visit an Internet search engine and enter keywords on the topic of your choice, a page will be created containing the results of your search.

Access to Web pages may be accomplished by:

Entering an Internet address and retrieving a page directly

Browsing through pages and selecting links to move from one page to another

Searching through subject directories linked to organized collections of Web pages

Entering a search statement at a search engine to retrieve pages on the topic of your choice


RETRIEVING DOCUMENTS ON THE WEB: THE URL

URL stands for Uniform Resource Locator. The URL specifies the Internet address of a file stored on a host computer connected to the Internet. Every file on the Internet, no matter what its access protocol, has a unique URL. Web software programs use the URL to retrieve the file from the host computer and the directory in which it resides. This file is then displayed on the user's computer monitor.

The host name in a URL is translated into a numeric address using the Internet Domain Name System (DNS). The numeric address is the "real" address of the host computer. Since numeric strings are difficult for humans to use, alphanumeric addresses are employed by end users. Once the translation is made, the Web server can be contacted and can send the requested page to the user's Web browser.
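The translation step can be sketched with Python's standard library. The function names host_name and numeric_address are hypothetical; socket.gethostbyname hands the lookup to the operating system's resolver, which normally consults DNS.

```python
import socket
from urllib.parse import urlsplit

def host_name(url):
    """Extract the host name portion of a URL."""
    return urlsplit(url).hostname

def numeric_address(url):
    """Ask the system resolver (normally DNS) for the host's IPv4 address."""
    return socket.gethostbyname(host_name(url))

print(host_name("http://www.census.gov/pubinfo/www/news.html"))
```

Calling numeric_address on the same URL from a machine with network access would return a dotted numeric string. The exact value depends on current DNS records, so none is shown here.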


Anatomy of a URL

This is the format of the URL:

protocol://host/path/filename

For example, this is a URL on the Web site of the U.S. Census Bureau:

http://www.census.gov/pubinfo/www/news.html

This URL is typical of addresses hosted in domains in the United States. Structure of this URL:

Protocol: http

Host computer name: www

Second-level domain name: census

Top-level domain name: gov

Directory name: pubinfo

Directory name: www

File name: news.html
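The same breakdown can be reproduced in code. This is a sketch under simple assumptions: url_parts is a hypothetical helper, it treats the last path segment as the file name, and it assumes a host name with at least two dot-separated labels.

```python
from urllib.parse import urlsplit

def url_parts(url):
    """Split a URL into the pieces named in the breakdown above."""
    parts = urlsplit(url)
    labels = parts.hostname.split(".")            # e.g. ["www", "census", "gov"]
    segments = [s for s in parts.path.split("/") if s]
    return {
        "protocol": parts.scheme,
        "host": labels[0],
        "second_level_domain": labels[-2],
        "top_level_domain": labels[-1],
        "directories": segments[:-1],
        "filename": segments[-1] if segments else "",
    }

for name, value in url_parts("http://www.census.gov/pubinfo/www/news.html").items():
    print(f"{name}: {value}")
```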

Other examples:

telnet://library.albany.edu           the University at Albany library catalog
ftp://bongo.cc.utexas.edu/microlib    a file at an FTP site

Several top-level domains (TLDs) are common in the United States:

com    commercial enterprise
edu    educational institution
gov    U.S. government entity
mil    U.S. military entity
net    network access provider
org    usually nonprofit organizations

In addition, dozens of domain names have been assigned to identify and locate files stored on host computers in countries around the world. These are referred to as two-letter Internet country codes, and have been standardized by the International Organization for Standardization as ISO 3166. For example:

ch    Switzerland
de    Germany
jp    Japan
uk    United Kingdom

It had been proposed that new top-level domains be added to the existing domain names. However, plans came to a halt after a U.S. government report was issued in late January 1998. The report suggests the creation of up to five new TLDs.

An international consortium had originally proposed adding seven new domain names, as follows:

firm     businesses or firms
store    businesses offering goods to purchase
web      entities emphasizing activities related to the WWW
arts     entities emphasizing cultural and entertainment activities
rec      entities emphasizing recreation/entertainment activities
info     entities providing information services
nom      those wishing individual or personal nomenclature

The future of the domain name system continues under active discussion as the U.S. Government takes action to privatize this function.


HOW TO ACCESS THE WORLD WIDE WEB: WEB BROWSERS

To access the World Wide Web, you must use a Web browser. A browser is a software program that allows users to access and navigate the World Wide Web. There are two types of browsers:

Graphical: Text, images, audio, and video are retrievable through a graphical software program such as Netscape Navigator and Internet Explorer. These browsers are available for both Windows-based and Macintosh computers. Navigation is accomplished by pointing and clicking with a mouse on highlighted words and graphics. The current version of Navigator is contained within a suite of programs called Netscape Communicator.

You can install a graphical browser such as Netscape Navigator on your Windows-based or Macintosh machine. Navigator is available for downloading on the Netscape home page: http://home.netscape.com. To use the program to access the Web, you need an Ethernet connection or a dialup connection known as SLIP or PPP. The latter may be obtained from an Internet Service Provider. For more information, see How to Connect to the Internet.

Text: Lynx is a browser that provides access to the Web in text-only mode. Navigation is accomplished by highlighting emphasized words in the screen with the arrow up and down keys, and then pressing the forward arrow (or Enter) key to follow the link. This browser is available through your personal IBM, VAX, or UNIX account on campus. For more information, see Guide to Using Lynx.


Extending the Browser: Helper Applications and Plug-Ins

Software programs may be configured to a Web browser in order to enhance its capabilities. When the browser encounters a sound, image or video file, it hands off the data to other programs, called helper applications, to run or display the file. Working in conjunction with helper applications, browsers can offer a seamless multimedia experience. Many helper applications are available for free.

File formats requiring helper applications are known as MIME types. MIME stands for Multipurpose Internet Mail Extensions, and was originally developed to help e-mail software handle a variety of binary (non-ASCII) file attachments. The use of MIME has expanded to the Web. For example, the basic MIME type handled by Web browsers is text/html, associated with the file extension .html.

A common helper application utilized on the Web is the Adobe Acrobat Reader. The Acrobat Reader allows you to view documents created in Adobe's Portable Document Format. These documents are the MIME type application/pdf and are associated with the file extension .pdf. When the Acrobat Reader has been configured to your browser, the program will open and display the file requested when you click on a hyperlinked file name with the suffix .pdf.
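The link between a file extension, its MIME type, and the program that handles it can be sketched as a pair of lookups. The HELPERS table and handler_for function are hypothetical, and guessing the type from the file name alone is a simplification: a Web server normally states the MIME type when it sends the file.

```python
import mimetypes

# Hypothetical configuration: MIME type -> program that handles it.
HELPERS = {
    "text/html": "the browser itself",
    "application/pdf": "Adobe Acrobat Reader",
}

def handler_for(filename):
    """Guess the MIME type from the file extension and pick a handler."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type, HELPERS.get(mime_type, "no helper configured")

print(handler_for("news.html"))
print(handler_for("report.pdf"))
```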

Plug-ins are software programs that extend the capabilities of a Web browser in a specific way, such as the ability to play audio files or view video movies from within Navigator. Web browsers are often standardized with a small suite of plug-ins. Additional plug-ins may be obtained at the browser's Web site, at special download sites on the Web, or from the home pages of the companies that created the programs. The number of available plug-ins is increasing rapidly. For example, nearly 200 plug-ins are available for downloading at the Netscape site.

Once a plug-in is configured to your browser, it will automatically launch when you choose to access a file type that it uses.

Netscape Communicator can be downloaded with a variety of helper applications and plug-ins configured to the browser, including:

Cosmo Player to view 3D sites created with Virtual Reality Modeling Language (VRML) (file suffixes .wrl, .wrz)

Netscape Media Player for streaming audio metafiles (file suffix .lam)

Live Audio for sound files (file suffixes .au, .aiff, .wav, .midi, .la, .lma)

QuickTime Player for video (file suffix .mov)

NPAI32 Dynamic Link Library for video in Windows (file suffix .avi)


Beyond Plug-Ins: ActiveX

ActiveX is a technology developed by Microsoft which may make plug-ins less necessary. ActiveX offers the opportunity to embed animated objects, data, and computer code on Web pages. A Web browser supporting ActiveX can render most items encountered on a Web page. For example, ActiveX allows users to view three-dimensional VRML worlds in a Web browser without the use of a VRML plug-in. As another example of the power of ActiveX, this technology can allow you to view and edit PowerPoint presentations directly within your Web browser. ActiveX is supported by the Internet Explorer and Netscape Navigator 4.x browsers.


THE EXPERIENCE OF THE WEB

Today's World Wide Web presents an ever-diversified experience of multimedia, programming languages, and real-time communication. There is no question that it is a challenge to keep up with the rapid pace of developments. The following presents a brief description of some of the more important trends to watch.

Multimedia

The Web has become a broadcast medium. It is possible to listen to audio and watch video over the Web, both pre-recorded and live. For example, you can visit the sites of various news organizations and view the same videos shown on the nightly television news. Several plug-ins are available for viewing these videos. For example, Apple's QuickTime Player downloads files with the .mov extension and displays these as "movies" in a small window on your computer screen. QuickTime files can be quite large, and it may take patience to wait for the entire movie to download into your computer before you can view it.

The problem of slow download times has been answered by a revolutionary development in multimedia capability: streaming data. In this case, audio or video files are played as they are downloading, or streaming, into your computer. Only a small wait, called buffering, is necessary before the file begins to play. The RealPlayer plug-in plays streaming audio and video files. Extensive files such as interviews, speeches and hearings work very well with the RealPlayer. The RealPlayer is also ideal for the broadcast of real-time events. These may include press conferences, live radio and television broadcasts, concerts, etc. A list of sites that make use of the RealPlayer is available at http://www.albany.edu/library/internet/net_info/realaudio.html. The Windows Media Player is another streaming media player. A list of sites that make use of this player is available at http://wmg.netcastnetwork.com/. Many sites offer the option to use one player or the other.
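The difference between downloading a whole file and streaming it can be sketched with a small simulation. This is not how RealPlayer or any particular player is implemented; play_stream and its prebuffer parameter are hypothetical names, and each loop pass stands in for one piece of the file arriving over the network.

```python
from collections import deque

def play_stream(chunks, prebuffer=3):
    """Yield chunks for playback while later chunks are still arriving.

    Playback starts once `prebuffer` chunks have been received.
    """
    buffer = deque()
    playing = False
    for chunk in chunks:              # each loop pass models one chunk arriving
        buffer.append(chunk)
        if not playing and len(buffer) >= prebuffer:
            playing = True            # the initial buffering wait is over
        if playing:
            yield buffer.popleft()    # play the oldest buffered chunk
    while buffer:                     # download finished: play what remains
        yield buffer.popleft()

arriving = (f"chunk-{n}" for n in range(6))
for piece in play_stream(arriving):
    print("playing", piece)
```

Playback of the first chunk begins after only three chunks have arrived, rather than after the whole file.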

Shockwave presents another multimedia experience. Shockwave allows for the creation and implementation of an entire multimedia display combining graphics, animation and sound.

Sound files, including music, may also be heard on the Web. It is not uncommon to visit a Web page and hear background music. Sound files are also available for downloading independent of Web page visits. Sound files of many types are supported by the Web with the appropriate helper applications.

Live cams are another aspect of the multimedia experience available on the Web. Live cams are video cameras that send their data in real time to a Web server. These cams may appear in all kinds of locations, both serious and whimsical: an office, on top of a building, a scenic locale, a special event, and so on.


Programming Languages and Functions

The use of existing and new programming languages has extended the capabilities of the Web. Many of the newer languages are in flux, and will experience major changes in the coming months.

What follows is a basic guide to a group of the more common languages and functions in use on the Web today.

CGI, Active Server Pages: CGI (Common Gateway Interface) refers to a specification by which programs can communicate with a Web server. A CGI program, or script, is any program designed to accept and return data that conforms to the CGI specification. The program can be written in any programming language, including C, Perl, and Visual Basic. A common use for a CGI script is to process an interactive form on a Web page. For example, you might fill out a form ordering a book through Interlibrary Loan. The script processes your information and sends it to a designated e-mail address in the Interlibrary Loan department.
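A minimal sketch of a form-processing CGI script follows. The Web server passes request data to the script through environment variables and sends whatever the script prints back to the browser. The respond function and the title form field are hypothetical, and the step of e-mailing the request to a department is left out.

```python
import os
from html import escape
from urllib.parse import parse_qs

def respond(environ):
    """Build a CGI response from the request data the Web server passes in."""
    # For a form submitted with GET, the fields arrive in QUERY_STRING.
    fields = parse_qs(environ.get("QUERY_STRING", ""))
    title = fields.get("title", ["(no title given)"])[0]  # hypothetical field
    body = (
        "<html><body><p>Request received for: "
        + escape(title)
        + "</p></body></html>"
    )
    # A CGI response is a header block, a blank line, then the document.
    return "Content-Type: text/html\n\n" + body

if __name__ == "__main__":
    print(respond(os.environ))
```

Keeping the logic in a function that takes the environment as an argument makes the script easy to try without a Web server, for example respond({"QUERY_STRING": "title=Moby+Dick"}).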

A newer type of dynamically generated Web page is called Active Server Pages (ASP). Developed by Microsoft, ASPs are HTML pages that include scripting and create interactive Web server applications. The scripts run on the server rather than on the Web browser to generate the HTML pages sent to browsers. VBScript (a scripting edition of Visual Basic) and JScript (Microsoft's implementation of JavaScript) are often used for the scripting. ASPs work with only Microsoft's Web servers.

Java/Java Applets: Java is probably the most famous of the new programming languages of the Web. Java is an object-oriented programming language similar to C++. It was developed by Sun Microsystems with the aim of creating programs that are platform independent. A perfect Java program should work equally well on a PC, Macintosh, Unix, and so on, without any additional programming. This goal has yet to be realized. Java can be used to write applications for both Web and non-Web use.

Web-based Java applications are usually in the form of Java applets. These are small Java programs called from an HTML page that can be downloaded from a Web server and run on a Java-compatible Web browser. A few examples include live newsfeeds, moving images with sound, calculators, charts and spreadsheets, and interactive visual displays. Java applets can tend to load slowly, but programming improvements should lead to a shortened loading time.

JavaScript/JScript: JavaScript is a programming language created by Netscape Communications. Small programs written in this language are embedded within an HTML page, or called externally from the page, to enhance the page's functionality. Examples of JavaScript include moving tickers, drop-down menus, real-time calendars and clocks, and mouse-over interactions. JScript is a similar language developed by Microsoft and works with the company's Internet Explorer browser.

VRML: VRML (Virtual Reality Modeling Language) allows for the creation of three-dimensional worlds. These may be linked from Web pages and displayed with a VRML viewer. Netscape Communicator comes with the Cosmo viewer for experiencing these three-dimensional worlds. One of the most interesting aspects of VRML is the option to "enter" the world and control your movements within the world.

XML: XML (eXtensible Markup Language) is a Web page creation language that enables designers to create their own customized tags to provide functionality not available with HTML. XML is in the process of being reviewed by the World Wide Web Consortium. At present, this language is little used as Web browsers are only beginning to support it. Some predict that XML may even replace HTML someday.
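As an illustration of customized tags, the sketch below parses a small XML document whose tag names were invented for this example (loan-request, book, title, status). None of them is part of any standard; the point is that the designer chooses names that describe the data.

```python
import xml.etree.ElementTree as ET

# Made-up document: every tag name here is a custom tag.
DOCUMENT = """<loan-request>
  <book>
    <title>Moby Dick</title>
    <status>requested</status>
  </book>
</loan-request>"""

root = ET.fromstring(DOCUMENT)
for book in root.findall("book"):
    print(book.findtext("title"), "-", book.findtext("status"))
```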


Real-Time Communication

Text, audio and video communication can occur in real time on the Web. This capability allows people to conference and collaborate in real time. In general, the faster the Internet connection, the more successful the experience.

At their simplest, chat programs allow multiple users to type to each other in real time. Internet Relay Chat and America Online's Instant Messenger are prime examples of this type of program. The development of a messaging protocol is underway. Such a protocol would allow for the expansion of this capability throughout the Internet.

More enhanced real-time communication offers an audio and/or video component. CU-SeeMe is one of the most popular software programs of this type. Even more elaborate are programs that allow for true real-time collaboration. Microsoft's NetMeeting and Netscape's Conference (available with Communicator) are good examples of this.

Featured collaboration tools include:

audio: conduct a "telephone" conversation on the Web

video: view your audience

file transfer: send files back and forth among participants

chat: type in real time

whiteboard: draw, mark up, and save images on a shared window or board

document/application sharing: view and use a program on another's desktop machine

collaborative Web browsing: visit Web pages together

Currently no standard exists that will work among all conferencing programs.


Push: Push refers to a technology that sends data to a program without the program's request. This is the opposite of the typical "pull" of the Web, in which the user clicks on a link to request a file from a server. With push, the data is sent automatically. Content is sent through a "channel." The early Web-based implementation of push was commercial. Probably the best known example is PointCast, which sends customized news to users' desktops in the form of a screen saver. Push can also be used to deliver software upgrades to a desktop machine.
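The contrast between pull and push can be sketched in a few lines. This is a generic illustration, not a description of how PointCast works: the Channel class and its method names are hypothetical. The receiving program registers once and then gets each item without asking for it.

```python
class Channel:
    """A named channel that delivers content to every subscriber."""

    def __init__(self, name):
        self.name = name
        self._subscribers = []

    def subscribe(self, callback):
        """Register a function to be called whenever content is pushed."""
        self._subscribers.append(callback)

    def push(self, item):
        """Send an item to all subscribers; none of them requested it."""
        for callback in self._subscribers:
            callback(item)

inbox = []
news = Channel("news")
news.subscribe(inbox.append)   # the desktop program signs up once
news.push("Headline 1")        # the sender decides when data goes out
news.push("Headline 2")
print(inbox)
```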

