← X1: Development Basics | ↑ Table of Contents ↑ | X2: Creating a Web Page → |
The Web is more a social creation than a technical one. I designed it for a social effect — to help people work together — and not as a technical toy. The ultimate goal of the Web is to support and improve our weblike existence in the world. We clump into families, associations, and companies. We develop trust across the miles and distrust around the corner. — Tim Berners-Lee
If this were a traditional science, Berners-Lee would win a Nobel Prize. What he's done is that significant. — Eric Schmidt
To many people, the World Wide Web is the face of computing. It provides access to vast amounts of information, to companies and their products, and to communication tools that connect people with friends and family around the world. Far from being a static repository of information, the Web is a dynamic, growing and evolving medium through which people can access resources and contribute content. With a little knowledge, anyone can develop Web pages and publish them to the world, becoming Web authors and online influencers. The Web has grown exponentially since its inception in 1990, with the total number of Web pages available today estimated to be in the hundreds of trillions. Of course, that volume of material introduces challenges — how do you find relevant material among all the junk on the Web, and how do you get your voice heard when you publish your ideas or products on the Web?
This chapter provides an overview of the Web, which will be expanded on in future chapters. It begins by distinguishing the Web from the Internet, since the two are so often conflated in people's minds. The history of the Web, in particular the contributions of Tim Berners-Lee in creating and championing it, is discussed, as is the role that search engines such as Google play in making the Web usable. Finally, the technical aspects of how the Web works are described in easily understandable terms. This includes the roles of Web browsers and servers, as well as the protocols (HTML and HTTP) that make it possible for people to access and contribute to the vast store of resources that is the Web.
Despite a common misconception, the World Wide Web and the Internet are not the same thing. As you will learn in Chapter C3, the Internet is a vast, international network of computers. In the same way that an interstate highway crosses state borders and links cities, the Internet crosses geographic borders and links computers. The physical connections may vary from high-speed dedicated cables (such as cable-modem connections) to slow but inexpensive phone lines, but the effect is that a person sitting at a computer in Omaha, Nebraska, is able to share information and communicate with a person in Osaka, Japan (FIGURE 1). The Internet traces its roots back to 1969, when the first long-distance network was established to connect computers at four U.S. research universities.
FIGURE 1. Internet users around the world (Ketut Subiyanto/Marcus Aurelius/Ekaterina Bolovtsova/Jopwell/Ekaterina Bolovtsova/Pexels).
Whereas the Internet is made up of hardware (computers and the connections that allow them to communicate), the World Wide Web is a collection of software that spans the Internet and enables the interlinking of documents and resources (FIGURE 2). The basic idea for the Web was proposed in 1989 by Tim Berners-Lee of the European Laboratory for Particle Physics (CERN). To allow distant researchers to share information more easily, Berners-Lee designed a system through which documents, even those containing multimedia elements such as images and sound clips, could be interlinked over the Internet. Using well-defined rules, or protocols, that define how pages are formatted and transmitted, documents could be shared across networks on various types of computers, allowing researchers to disseminate their research broadly. With the introduction of easy-to-use graphical browsers for viewing documents in the mid-1990s, the Web became accessible to a broader public, resulting in the World Wide Web of today.
FIGURE 2. Internet vs. World Wide Web.
✔ QUICK-CHECK 2.1: True or False? The terms "Internet" and "World Wide Web" refer to the same entity.
Although Internet communications among universities and government organizations were common in the 1970s and 1980s, the Internet did not achieve mainstream popularity until the early 1990s, when the World Wide Web was developed. The Web, a multimedia environment in which documents can be seamlessly linked over the Internet, was the brainchild of Tim Berners-Lee (1955-), pictured in FIGURE 3. During the 1980s, Berners-Lee was a researcher at the European Laboratory for Particle Physics (CERN). Because CERN researchers were located across Europe and used different types of computers and software, they found it difficult to share information effectively. To address this problem, Berners-Lee envisioned a system in which researchers could freely exchange documents, regardless of their locations and the types of computers they used. In 1989, he proposed the basic idea for the Web, suggesting that documents stored on Internet-linked computers could be linked so as to make navigating from one to another simple.
FIGURE 3. Tim Berners-Lee (CERN, 1994).
Although Berners-Lee's vision for the Web was revolutionary, his idea was built upon a long-standing practice of linking related documents to enable easy access. Books containing hypertext (interlinked text and media) have existed for millennia. This hypertext might tie a portion of a document to related annotations, as in the Jewish Talmud (first century B.C.), or might provide links to nonsequential alternate story lines, as in the Indian epic Ramayana (third century B.C.). The concept of an electronic hypertext system was first conceived in 1945, when presidential science advisor Vannevar Bush (1890-1974) envisioned designs for a machine that would store textual and graphical information in such a way that any piece of information could be arbitrarily linked to any other piece. The first small-scale computer hypertext systems were developed in the 1960s and culminated in the popular HyperCard system that shipped with Apple Macintosh computers in the late 1980s.
Berners-Lee was innovative, however, in that he combined the key ideas of hypertext with the distributed nature of the Internet. Documents could be stored on computers across the Internet and logically linked regardless of location. His design for the Web relied on two different types of software running on Internet-connected computers. The first kind of software executes on a Web server, a computer that stores documents and "serves" them to other computers that want access. The second kind, called a Web browser, allows users to request and view the documents stored on servers. Using Berners-Lee's system, a person running a Web browser could quickly access and jump between documents, even if the servers storing those documents were thousands of miles apart. Web pages could be linked to other pages based on similar content, regardless of where those linked pages were stored.
In 1990, Berners-Lee produced working prototypes of a Web server and browser. His browser was limited by today's standards, in that it was text-based and offered only limited support for images and other media. This early version of the Web acquired a small but enthusiastic following when Berners-Lee made it freely available over the Internet in 1991.
✔ QUICK-CHECK 2.2: True or False? The original vision of the Web, as well as the first Web browser and server software, are credited to Tim Berners-Lee.
✔ QUICK-CHECK 2.3: True or False? A Web server is a piece of software that requests and displays Web pages.
The Web might have remained an obscure research tool if others had not expanded on Berners-Lee's creation, developing browsers designed to accommodate the average computer user. In 1993, Marc Andreessen (1971-) and Eric Bina (1964-), of the University of Illinois's National Center for Supercomputing Applications (NCSA), wrote the first widely popular graphical browser, which they called Mosaic. Mosaic employed buttons and clickable links as navigational aids, making the Web easier to traverse. The browser also supported the integration of images and media within pages, which enabled developers to create more visually appealing Web documents. The response to Mosaic's release in 1993 was overwhelming. As more and more people learned how easy it was to store and access information using the Web, the number of Web servers on the Internet grew from 50 in 1992 to 3,000 in 1994. In 1994, Andreessen left NCSA to found the Netscape Communications Corporation, which marketed an extension of the Mosaic browser called Netscape Navigator. Originally, Netscape charged a small fee for its browser, although students and educators were exempt from this cost. However, when Microsoft introduced its Internet Explorer browser as free software in 1995, Netscape was forced to follow suit. The availability of free, easy-to-use browsers certainly contributed to the astounding growth of the Web in the mid-1990s.
FIGURE 4 documents the growth of the World Wide Web during the last three decades, as estimated by the Netcraft Web Server Survey. It is interesting to note some of the leaps in growth and the events that contributed to those leaps. For example, the jump from 50 Web sites in 1992 to 3,000 Web sites in 1994 can largely be attributed to the widespread adoption of Mosaic, while the next jump to 300,000 sites in 1996 can be attributed to the free availability of Netscape Navigator and Internet Explorer. Another advance that increased demand for the Web was the release of JavaScript by Brendan Eich (1961-) and his team at Netscape (see Chapter X4). First introduced in late 1995 and standardized in 1997, JavaScript enabled Web pages to behave dynamically, changing their appearance over time and reacting to user actions. With JavaScript, Web pages could be used to search for items, stream music and video, enter data into forms, and initiate actions at the click of a button. This change in the appearance and functionality of the Web, from static to dynamic pages, is sometimes referred to as Web 2.0.
Year | Web Sites |
---|---|
2020 | 1,234,228,567 |
2016 | 1,083,252,900 |
2012 | 676,919,707 |
2008 | 175,480,931 |
2004 | 52,131,889 |
2000 | 18,169,498 |
1998 | 4,279,000 |
1996 | 300,000 |
1994 | 3,000 |
1992 | 50 |
FIGURE 4. Web Growth (Netcraft Web Server Survey).
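The size of each leap in FIGURE 4 can be made concrete by dividing each survey count by the one before it. The short Python sketch below does only that arithmetic, using the numbers copied from the table; the names SITE_COUNTS and growth_factors are illustrative choices, not part of the survey.

```python
# Web site counts from the Netcraft Web Server Survey, as listed in FIGURE 4.
SITE_COUNTS = {
    1992: 50,
    1994: 3_000,
    1996: 300_000,
    1998: 4_279_000,
    2000: 18_169_498,
    2004: 52_131_889,
    2008: 175_480_931,
    2012: 676_919_707,
    2016: 1_083_252_900,
    2020: 1_234_228_567,
}


def growth_factors(counts: dict[int, int]) -> dict[int, float]:
    """For each survey year after the first, return how many times larger
    the site count is than in the previous survey year."""
    years = sorted(counts)
    return {
        later: counts[later] / counts[earlier]
        for earlier, later in zip(years, years[1:])
    }


if __name__ == "__main__":
    for year, factor in growth_factors(SITE_COUNTS).items():
        print(f"{year}: x{factor:,.1f}")
```

The two largest multipliers fall in 1994 (60 times the 1992 count) and 1996 (100 times the 1994 count), the periods the text associates with Mosaic and with free commercial browsers.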
The continued expansion of the Web in the late 1990s and 2000s may be seen as a result of the network effect, a term borrowed from economics that describes a product or service that increases in value as more people use it. When there were only a few Web sites on the World Wide Web, its use was limited to a few special-interest users. As more businesses and organizations added Web pages, people began to see the value of the Web and began to use it as an information source. As more people used the Web to seek out information and products, businesses found it worthwhile to invest in a Web site. This cycle continued, and the number of Web users and sites grew at a rapid rate through the 2000s.
It is important to recognize that the numbers reported in FIGURE 4 refer to Web sites. Each Web site may contain tens or hundreds or even thousands of pages and files (e.g., images, sound clips, video files). According to Google Inside Search, there were at least 130 trillion individual pages on the Web in 2019, although various sources claim the number of Web pages could be somewhere in the hundreds of quadrillions.
The late 1990s were a dynamic time for the Web, during which Netscape and Microsoft released enhanced versions of their browsers and battled for market share. Initially, Netscape was successful in capitalizing on its first-mover advantage: in 1996, approximately 75% of all operational browsers were Netscape products. By 1999, however, Internet Explorer had surpassed Navigator as the most used browser. Finding it difficult to compete with Microsoft, Andreessen relinquished control of Netscape in 1999, selling the company to AOL for $10 billion in stock. Internet Explorer dominated the browser market for the next decade, until it was surpassed in popularity by the Google Chrome browser in 2012. By the end of 2024, Chrome held more than 67% of the browser market, with Apple Safari a distant second at 18%. Other browsers, including Microsoft Edge (5%), Mozilla Firefox (3%), Samsung Internet (2%), and Opera (2%), continue to have small but dedicated user communities [StatCounter.com]. In addition, the software industry offers numerous other browsers to satisfy niche markets. These include text-based browsers for environments that don't support graphics and text-to-speech browsers for vision-impaired users.
The Web's development is now guided by a not-for-profit organization called the World Wide Web (W3) Consortium, which was founded by Tim Berners-Lee in 1994. The W3 Consortium maintains and regulates Web-related standards and oversees the design of Web-based technologies, relying mainly on volunteer labor from technically qualified and interested individuals.
Interesting Web Facts (Forbes Advisor, 2024)
✔ QUICK-CHECK 2.4: True or False? Microsoft marketed the first commercial Web browser.
✔ QUICK-CHECK 2.5: True or False? Google Chrome is the most popular Web browser in the world today.
The initial growth of the Web was largely organic. Companies and organizations purchased computers, installed Web server software, and posted Web sites on their own servers. In the mid-1990s, Internet Service Providers (ISPs) began to provide Web server space to customers, so that individuals could create and post their own Web sites. However, the Web was a bit like the Wild West in that there was little structure or organization to the growth. If you knew the Web address of a page you were interested in, you could enter that address and view the page. However, finding pages without knowing the address was challenging. Sites were eventually developed that catalogued popular sites and so served as a limited index of the Web (or, at least, a small portion of the Web). However, these indexes were dependent on individuals to identify important sites and organize them into an alphabetical index. This approach would not scale as the Web grew.
As the Web increased in size, search engines such as Yahoo Search, Infoseek, and AltaVista were developed, which allowed users to enter a word or phrase and then located pages that contained that word or phrase. These search engines utilized spiders, or Web crawlers, which are programs that traverse the Web, cataloging pages to match against search words. Unfortunately, these search engines did not always produce high-quality results. For example, suppose a user entered the phrase "Cubs game" in the search engine, wishing to know the score of today's baseball game. If the search engine reported every page that contained those words, it would likely report many pages about baby bears and games of all sorts. Finding the score among all those results might prove difficult.
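The crawl-then-match approach can be sketched in Python. The sketch assumes a tiny, made-up Web held in a dictionary (the URLs and page texts are hypothetical) rather than real network requests. A breadth-first "spider" follows links from a starting page and builds an index from each word to the pages containing it, and a naive search then returns every page that contains all of the query words, with no ranking at all.

```python
from collections import deque

# A hypothetical, in-memory "Web": each URL maps to (page text, outgoing links).
TOY_WEB = {
    "http://sports.example/cubs": ("cubs game final score 5 to 3",
                                   ["http://zoo.example/bears"]),
    "http://zoo.example/bears": ("the bear cubs play a game in the zoo",
                                 ["http://games.example/board"]),
    "http://games.example/board": ("a board game for the whole family",
                                   ["http://sports.example/cubs"]),
}


def crawl(start_url: str, web: dict) -> dict[str, set[str]]:
    """Visit every page reachable from start_url (breadth-first) and build
    an index mapping each lowercase word to the set of URLs containing it."""
    index: dict[str, set[str]] = {}
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        text, links = web[url]
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
        for link in links:
            if link in web and link not in seen:  # skip unknown or already-queued pages
                seen.add(link)
                queue.append(link)
    return index


def naive_search(query: str, index: dict[str, set[str]]) -> set[str]:
    """Return every page containing all of the query words (no ranking)."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


if __name__ == "__main__":
    index = crawl("http://games.example/board", TOY_WEB)
    print(sorted(naive_search("Cubs game", index)))
```

Running it returns both the baseball page and the zoo page about bear cubs, since pure word matching cannot tell which meaning the user intended.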
The Google search engine began in 1996 as a research project by graduate students Sergey Brin and Larry Page at Stanford University (FIGURE 5). Their goal was to develop a search engine that was easy to use and returned high-quality results. Brin and Page developed Google's PageRank algorithm, which is used (along with various other techniques) to produce high-quality search results. The PageRank algorithm ranks pages based on their perceived value and trustworthiness. If a page is linked to by many other pages, that suggests that people find its contents valuable and trustworthy. In a circular fashion, the trustworthiness of the linking pages can also be considered, since being linked to by a valuable, trustworthy page means more than being linked to by an unknown page. Brin and Page also revolutionized how search engines made money, selling ads to advertisers based on the search words and charging based on clickthrough. For example, a shoe store might pay to have its ad shown only when a user enters "shoe" or "footwear" as search words, and it would be charged based on how often users clicked on the ad to go to the store's Web site. It is interesting to note that Stanford University allowed Brin and Page to freely take their thesis work and form Google in 1998. Stanford, which held the patent for the PageRank algorithm, licensed it to Google in exchange for stock that the university later sold for $336 million.
FIGURE 5. Larry Page and Sergey Brin (Ehud Kenan/Wikimedia Commons, 2003).
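The circular idea behind PageRank can be illustrated with the commonly taught power-iteration formulation: every page starts with an equal score, and on each round a page passes its score, split evenly, to the pages it links to. The damping factor of 0.85, the iteration count, and the three-page link graph below are assumptions chosen for illustration. As noted above, Google combines PageRank with many other techniques, so this is a sketch of the idea, not Google's ranking system.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Estimate PageRank scores by power iteration.

    links maps each page to the list of pages it links to; every link
    target is assumed to appear as a key as well. A page with no outgoing
    links is treated as linking to every page, so no score is lost.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # every page starts equal
    for _ in range(iterations):
        # Each page gets a small base share (the "random jump" term)...
        new_rank = {page: (1.0 - damping) / n for page in pages}
        # ...plus a share of the score of every page that links to it.
        for page, targets in links.items():
            targets = targets or pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical three-page Web: A links to B and C, B links to C, C links to A.
    toy_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    for page, score in sorted(pagerank(toy_links).items(), key=lambda kv: -kv[1]):
        print(f"{page}: {score:.3f}")
```

Page C ends up with the highest score because both other pages link to it, and A outranks B because A's single inbound link comes, undivided, from the high-scoring page C, while B receives only half of A's score.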
Today, Google is by far the most popular search engine worldwide (FIGURE 6).
Search Engine | Worldwide Market Share |
---|---|
Google | 90.01% |
Bing | 3.95% |
Yandex | 2.34% |
Yahoo! | 1.35% |
Baidu | 0.81% |
DuckDuckGo | 0.65% |
FIGURE 6. Search engine market share, as of September 2024 (StatCounter.com).
Interesting Google Facts (Forbes Advisor, 2024)
✔ QUICK-CHECK 2.6: True or False? The PageRank algorithm is used by the Google search engine to rank search matches by relevance and trustworthiness.
✔ QUICK-CHECK 2.7: What role do spiders (or Web crawlers) play in making search engines work?
A Web page is nothing more than a text document that contains formatting information in a language called HTML (HyperText Markup Language). As you will see in Chapter X2, all you need to create a Web page is a simple text editor and familiarity with the HTML language. To view a Web page in which HTML formatting is properly applied, however, you need a computer program known as a Web browser. The job of a Web browser is to access a Web page, interpret the HTML formatting information, and display the formatted page accordingly (see FIGURE 7).
FIGURE 7. A Web browser interprets HTML instructions and formats text in a page.
The four most popular browsers on the market today are Google Chrome, Apple Safari (available only on Apple devices), Mozilla Firefox, and Microsoft Edge (the default browser on Windows). Most modern computers are sold with one or more of these browsers already installed (FIGURE 8).
Web Browser | Market Share | Web Hosting Provider | Market Share |
---|---|---|---|
Google Chrome | 65.2% | Amazon Web Services (AWS) | 33.6% |
Apple Safari | 18.6% | Google Cloud | 9.4% |
Microsoft Edge | 5.2% | IONOS | 5.0% |
Mozilla Firefox | 2.7% | GoDaddy | 4.2% |
Samsung Internet | 2.6% | EIG | 1.3% |
FIGURE 8. Web browser & Web hosting global market shares (Statista Market Insights, September 2024).
What makes the Web the "World Wide" Web is that, rather than being limited to a single computer, pages can be distributed on computers across the Internet. A Web server is an Internet-enabled computer that executes software for providing access to Web documents. When you request a Web page, either by typing its name in your browser's Address box or by clicking a link, the browser sends a request over the Internet to the appropriate server. The server locates the specified page and sends it back to your computer (FIGURE 9).
FIGURE 9. The roles of the Web browser and Web server.
Web pages require succinct and specific names so that users can identify them and browsers can locate them. For this purpose, each page is assigned a Uniform Resource Locator, or URL, which is known more informally as its Web address. For example, FIGURE 10 shows the URL of the home page for this book.
FIGURE 10. The components of a URL (or Web address).
A Web address begins with the protocol prefix http://, which specifies that the HyperText Transfer Protocol should be used in communications between the browser and server. The prefix https:// may also be used, which specifies a secure HTTP connection (using encryption to protect the data being sent). The rest of the address specifies the location of the desired page. Immediately following http:// is the server name, identifying the Web server on which the page is stored, followed by the file name. If the Web pages are organized into directories on the server, then the file name may be further broken down into components, based on the directory structure. For example, the Web address http://freecsc.com/X2/simple.html refers to the Web page named simple.html that is stored in the X2 directory on the Web server freecsc.com.
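Splitting a Web address into the parts just described can be tried with Python's standard urllib.parse module. The sketch below is an illustration of the idea using the example address from the text; the function name describe_url and the dictionary keys are hypothetical choices.

```python
from urllib.parse import urlparse


def describe_url(url: str) -> dict[str, str]:
    """Split a Web address into protocol, server name, directory, and file name."""
    parts = urlparse(url)
    # Everything after the last "/" in the path is the file name.
    directory, _, file_name = parts.path.rpartition("/")
    return {
        "protocol": parts.scheme,            # e.g. "http" or "https"
        "server": parts.netloc,              # server name (plus a port, if one is given)
        "directory": directory.lstrip("/"),  # directory path on the server
        "file": file_name,                   # the page's file name
    }


if __name__ == "__main__":
    print(describe_url("http://freecsc.com/X2/simple.html"))
```

For the example address this reports the protocol http, the server freecsc.com, the directory X2, and the file simple.html.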
✔ QUICK-CHECK 2.8: In the Web address http://dave-reed.com/index.html, which part identifies the Web server where the page is stored?
The World Wide Web relies on protocols to ensure that Web pages are accessible to any computer, regardless of the machine's hardware, operating system, browser, or location. In diplomatic circles, a protocol defines the rules that govern affairs of state or diplomatic relations. Web protocols similarly define the rules for how Web pages are to be interpreted and how communication is to take place between a Web browser and server. The protocols that make the Web work are the HyperText Markup Language (HTML) and the HyperText Transfer Protocol (HTTP).
Web developers define the content of Web pages using HTML, the HyperText Markup Language. HTML defines a collection of character sequences, known as tags, that have special meaning to the browser. For example, the tags <b> and </b> specify to the browser that the enclosed text is to be displayed in a bold font. Likewise, the tags <h1> and </h1> specify a heading that appears in a large, bold font, while <hr> specifies a horizontal rule (or line) that divides sections in the page. In addition, other elements can be embedded within a page, such as images (using the <img> tag) and hyperlinks (using the <a> tag). Part of a Web browser's job is to read HTML tags, interpret their meaning using the rules of the HTML protocol, and display the page content accordingly. You will learn more about HTML in the Explorations chapters (X1-X10).
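The "read the tags, then apply their meaning" step can be sketched with Python's built-in html.parser module. This toy interpreter is an illustration under simplifying assumptions, not how real browsers render pages: it only tracks which of the formatting tags mentioned above (b and h1) enclose each piece of text, marks where an hr appears, and records img and a elements. The sample page and the class name TagReporter are made up for the example.

```python
from html.parser import HTMLParser

# A small, made-up page using the tags described above.
PAGE = """<h1>My Page</h1>
<p>Some <b>bold</b> text.</p>
<hr>
<img src="logo.png"> <a href="http://freecsc.com">Home</a>"""


class TagReporter(HTMLParser):
    """A toy 'browser' that notes which formatting tags enclose each piece
    of text and which images and links the page embeds."""

    FORMATTING = {"b", "h1"}

    def __init__(self) -> None:
        super().__init__()
        self.open_tags: list[str] = []  # formatting tags currently in effect
        self.text_runs: list[tuple[str, tuple[str, ...]]] = []
        self.images: list[str] = []     # src of each <img>
        self.links: list[str] = []      # href of each <a>

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in self.FORMATTING:
            self.open_tags.append(tag)
        elif tag == "hr":
            self.text_runs.append(("----------", ()))  # stand-in for a horizontal rule
        elif tag == "img":
            self.images.append(attributes.get("src", ""))
        elif tag == "a":
            self.links.append(attributes.get("href", ""))

    def handle_endtag(self, tag):
        # Close the most recently opened matching formatting tag, if any.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i] == tag:
                del self.open_tags[i]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.text_runs.append((text, tuple(self.open_tags)))


if __name__ == "__main__":
    reporter = TagReporter()
    reporter.feed(PAGE)
    reporter.close()
    for text, tags in reporter.text_runs:
        print(f"{text!r} displayed with {tags or 'no special formatting'}")
    print("images:", reporter.images, "links:", reporter.links)
```

A real browser goes much further (layout, fonts, fetching the image and following the link), but the core move is the same: the tags are not displayed themselves; they change how the enclosed text is shown.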
HTML is an evolving standard, which means that new features are added to the language in response to changes in technology and user needs. The current standard for HTML, as defined by the World Wide Web Consortium, is HTML5. There may be subtle differences among browsers, but all Web browsers understand the same basic set of tags and display text and formatting similarly. Thus, when an author places an HTML document on a Web server, all users who view the page should see the same formatting, regardless of their computers or browser software.
To a person "surfing" the Web, the process of locating, accessing, and displaying Web pages is transparent. When the person requests a particular page, either by entering its location into the browser's Address box or by clicking a link, the new page is displayed in the browser window as if by magic. In reality, complex communications are taking place between the computer running the browser and the Web server that stores the desired page. When the person requests the page, the browser first extracts the Web server name and page name from the Web address (as in FIGURE 10). Once the server name has been extracted, the browser sends a message to that server over the Internet and requests the page. The Web server receives the request, locates the page within its directories, and sends the page's contents back in a message. When the message is received, the browser interprets the HTML formatting information embedded in the page and displays the page accordingly in the browser window.
The protocol that determines how messages exchanged between browsers and servers are formatted is known as the HyperText Transfer Protocol (HTTP). FIGURE 11, which is a more detailed version of the browser/server interaction from FIGURE 9, shows the actual HTTP messages that are exchanged as the result of a clicked link. Note that the HTTP message that requests a page begins with the command GET and includes information that can be used by the server to locate the page and retrieve it in the appropriate format (e.g., the page's address, the type of document, and the browser version). The HTTP message that is subsequently sent back to the browser includes the HTML content of that page, along with information (e.g., the size of the document and the date and time it was last modified) that might prove useful when displaying the page.
FIGURE 11. Web browser and server communications (with sample HTTP messages).
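The exchange described above can be imitated without a network. In the Python sketch below, which simplifies a great deal, the "browser" extracts the server name and path from a Web address and builds a GET request message, and a made-up in-memory "server" (the SERVERS dictionary and its page are hypothetical) reads the request line, looks up the page, and returns a response message containing the HTML. The header names used (Host, User-Agent, Accept, Content-Type, Content-Length) are standard HTTP/1.1 headers; the browser name ToyBrowser/0.1 is invented.

```python
from urllib.parse import urlparse

# Hypothetical servers: server name -> {path: HTML content}.
SERVERS = {
    "freecsc.com": {
        "/X2/simple.html": "<html><body><b>Hello</b></body></html>",
    },
}


def build_request(url: str) -> tuple[str, str]:
    """Browser side: extract the server name and build a GET request message."""
    parts = urlparse(url)
    path = parts.path or "/"
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {parts.netloc}\r\n"
        "User-Agent: ToyBrowser/0.1\r\n"  # made-up browser identification
        "Accept: text/html\r\n"
        "\r\n"                             # a blank line ends the headers
    )
    return parts.netloc, request


def handle_request(server: str, request: str) -> str:
    """Server side: read the request line, locate the page, build a response."""
    request_line = request.split("\r\n", 1)[0]
    method, path, _version = request_line.split(" ")
    pages = SERVERS.get(server, {})
    if method == "GET" and path in pages:
        status, body = "200 OK", pages[path]
    else:
        status, body = "404 Not Found", "<html><body>Not Found</body></html>"
    return (
        f"HTTP/1.1 {status}\r\n"
        "Content-Type: text/html\r\n"
        f"Content-Length: {len(body.encode('utf-8'))}\r\n"  # size in bytes
        "\r\n"
        f"{body}"
    )


if __name__ == "__main__":
    server, request = build_request("http://freecsc.com/X2/simple.html")
    print(request)
    print(handle_request(server, request))
```

Real browsers send more headers and real servers read pages from their directories and add details such as a Last-Modified date, but the shape of the two messages matches the description: a GET line plus information about the request going out, and a status line, information about the document, and the HTML coming back.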
✔ QUICK-CHECK 2.9: True or False? In the Web address http://freecsc.com/index.html, the prefix http:// specifies that the page is stored on a Web server and should be retrieved using the HTTP protocol.
It is interesting to note that accessing a single page might involve several rounds of communication between the browser and server. If the Web page contains embedded elements, such as images or sound clips, the browser will not recognize this until it begins displaying the page. When the browser encounters an HTML tag that specifies an embedded element, the browser must then send a separate HTTP message to the server requesting the item. Thus, loading a page that contains 10 images will require 11 interactions between the browser and server—one for the page itself and one for each of the 10 embedded images.
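The request count in that example can be estimated from the HTML itself. The Python sketch below scans a page for img tags with the standard html.parser module and counts one request for the page plus one per embedded image. It assumes nothing is cached and ignores other embedded resources (style sheets, scripts, audio, video) that real pages may also fetch; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class ImageCounter(HTMLParser):
    """Collect the src attribute of every <img> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.image_sources: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.image_sources.append(dict(attrs).get("src", ""))


def requests_needed(html: str) -> int:
    """One request for the page itself plus one for each embedded image."""
    counter = ImageCounter()
    counter.feed(html)
    counter.close()
    return 1 + len(counter.image_sources)


if __name__ == "__main__":
    # A made-up page with 10 embedded images.
    page = "<p>Gallery</p>" + "".join(f'<img src="photo{i}.jpg">' for i in range(10))
    print(requests_needed(page))
```

For the 10-image page this prints 11, matching the count in the text.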
To avoid redundant and excessive downloading, browsers use a technique called caching. When a page or image is first downloaded, it is stored in a temporary directory on the user's computer. The next time the page or image is requested, the browser first checks to see if it has a copy stored locally in the cache, and, if so, whether the copy is up to date. This is accomplished by adding a timestamp to the HTTP message. The timestamp is just the date and time that the cached page was stored. When the server receives the request, it compares the timestamp of the cached page with the timestamp of the page stored on the server. If the cached page is more recent, then the server simply responds with an HTTP message instructing the browser to display its cached copy. If the server version is newer, however, then it is sent back to the browser to ensure that the up-to-date version of the page is displayed.
Note that caching does not reduce the number of messages that must be sent between the browser and server. Each page request still requires an HTTP message to the server and a subsequent message back to the browser. However, caching can save time when the cached copy is up-to-date, since only a confirmation message is returned as opposed to downloading the page all over again. When the requested item is a large image or video clip, this savings can be significant.
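The timestamp check described above corresponds to HTTP's conditional GET: the browser sends an If-Modified-Since header carrying a date, and the server answers either 304 Not Modified (display the cached copy) or 200 OK with the current content. The Python sketch below simulates that decision in memory. Following the text, it sends the time the copy was stored; real browsers usually echo back the Last-Modified date the server supplied instead. The page, its modification date, and the function names are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional

# Hypothetical page on the server and the time it was last changed.
PAGE_BODY = "<html><body>Latest news</body></html>"
PAGE_MODIFIED = datetime(2024, 9, 1, 12, 0, 0, tzinfo=timezone.utc)


def server_respond(if_modified_since: Optional[str]) -> tuple[int, str]:
    """Server side: return (status code, body).

    304 Not Modified has an empty body and tells the browser to use its
    cached copy; 200 OK sends the current page.
    """
    if if_modified_since is not None:
        cached_time = parsedate_to_datetime(if_modified_since)
        if cached_time >= PAGE_MODIFIED:  # cached copy is not older than the page
            return 304, ""
    return 200, PAGE_BODY


def browser_fetch(cache: dict) -> str:
    """Browser side: always ask the server, but reuse the cached copy on a 304."""
    status, body = server_respond(cache.get("stored_at"))
    if status == 304:
        return cache["body"]
    # Keep the fresh copy and the time it was stored, in HTTP date format.
    cache["body"] = body
    cache["stored_at"] = format_datetime(datetime.now(timezone.utc), usegmt=True)
    return body


if __name__ == "__main__":
    cache: dict = {}
    browser_fetch(cache)  # first request: full download (status 200)
    print(server_respond(cache["stored_at"])[0])  # a later check of the same copy
```

Every call to browser_fetch still makes a request to the server, which is the point made above: caching saves the download, not the message.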
Caching relies on the browser being granted access to a directory for storing the cached pages. This limited access to the user's file system is safe and is built into all modern browsers. Beyond that, however, browsers are not allowed access to any other local files. One of the design goals of the Web was to make the physical location of the documents irrelevant; the user need not even pay attention to where documents are stored, since the logical relationships between documents are what define Web connectivity. As a result, surfing the Web often involves jumping from one page to another, accessing documents on Web sites whose owners may not even be known to you. If accessing a Web page from an unknown site had the potential to make your personal files visible to others and vulnerable to destruction, you would think twice before ever using the Web. For this reason, Web browsers are built so that the files on the local computer (other than the cache directory) are not accessible.
There is a small loophole in Web browser file access that is both useful and potentially annoying to users. Cookies are small, special-purpose files that can be stored by the browser on your computer when you visit a Web site. For example, when you visit a commercial site, the Web server at that site can request that the browser store a small file on your computer with information about you (the date and time you visited the site, the items you searched for, etc.). The next time you visit that site, the browser will include this cookie file with each page request. This is how Web sites know who you are when you return to a site or remember what was in your shopping cart when you last left the site. Cookies are safe, since the browser only shares the cookie file with the Web server that originally stored the cookie. Cookies can add to the convenience of Web surfing by customizing your Web experience and saving you from repeatedly entering data. Cookies can also be intrusive, as you may not want the site to remember sensitive information or browsing history. Most browsers allow you to limit cookies, although many Web sites do not function as smoothly if cookies are disabled.
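The rule that a cookie goes back only to the server that stored it can be sketched with Python's standard http.cookies module, which parses Set-Cookie header values. The cookie jar here (a plain dictionary keyed by server name), the site names, and the cookie names are assumptions for illustration; real browsers also apply domain, path, expiration, and security attributes when deciding which cookies to send.

```python
from http.cookies import SimpleCookie

# A simplified, hypothetical cookie jar: server name -> {cookie name: value}.
cookie_jar: dict[str, dict[str, str]] = {}


def store_cookies(server: str, set_cookie_value: str) -> None:
    """Save the cookies a server asked for in a Set-Cookie header value."""
    parsed = SimpleCookie()
    parsed.load(set_cookie_value)
    site_cookies = cookie_jar.setdefault(server, {})
    for name, morsel in parsed.items():
        site_cookies[name] = morsel.value


def cookie_header_for(server: str) -> str:
    """Build the Cookie header value for a request, using only the cookies
    that this same server stored earlier."""
    site_cookies = cookie_jar.get(server, {})
    return "; ".join(f"{name}={value}" for name, value in site_cookies.items())


if __name__ == "__main__":
    store_cookies("shop.example", "cart=shoes; Path=/")
    store_cookies("shop.example", "visits=3")
    print(cookie_header_for("shop.example"))
    print(repr(cookie_header_for("news.example")))
```

The request to shop.example carries cart=shoes; visits=3, while the request to news.example carries nothing, because that server never stored a cookie.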
Beginners often confuse caching and cookies. It is true that both store information on the user's computer in hidden directories. However, their purposes are quite distinct. Caching stores copies of entire pages and documents, which are used by the browser to avoid redundant downloading. Without caching, Web sites would look the same but might be slower to load. Cookies, on the other hand, are small data files that are stored by the browser at the direction of the server. They provide additional functionality to pages, since a Web site can store information about your past visits and customize your next visit based on that stored information.
✔ QUICK-CHECK 2.10: True or False? In most Web browsers, cookies are used to save local copies of downloaded pages and files in order to save time when they are accessed again.