X10: Software Models

For most problems found in mathematics textbooks, mathematical reasoning is quite useful. But how often do people find textbook problems in real life? At work or in daily life, factors other than strict reasoning are often more important. Sometimes intuition and instinct provide better guides; sometimes computer simulations are more convenient or more reliable; sometimes rules of thumb or back-of-the-envelope estimates are all that is needed. — Lynn Steen

A theory has only the alternative of being right or wrong. A model has a third possibility: it may be right, but irrelevant. — Manfred Eigen

When studying complex systems such as the weather, climate change, financial markets, and even sports team performance, software models (i.e., programs that simulate the behavior of the system) provide a fast and cost-effective tool for gaining insights. For example, an atmospheric scientist could create a model of storm systems and run repeated simulations to better understand how tornadoes form and move. Similarly, a stock trader could develop models of markets and use simulations and machine learning to identify trends. Software models are often cheaper and much faster than the real-world systems they are meant to simulate, and they also enable their users to experiment with different features (e.g., changing the temperature to see how that affects tornado formation, or changing trading volume to see how that affect certain stocks).

This chapter studies three software models of real-world systems in the form of interactive Web pages. While these examples are simpler than most scientific or commercial modeling applications, they provide a glimpse into the power of software for solving complex and meaningful problems.

Modeling the Spread of a Disease

When there is an outbreak of an infectious disease, such as the Ebola spread of 2014 or the COVID-19 flu pandemic of 2020, organizations such as the Center for Disease Control (CDC) and the World Health Organization (WHO) develop software models to predict the potential impact of the disease. By considering factors such as the disease transmission method, patient incubation period, and population density, a software model of the disease's spread can be constructed. This model can then be used to study the spread of the disease under various conditions, which can both inform the public of the threat and also demonstrate the effectiveness of intervention techniques (such as social distancing).

One measure that is commonly used in disease spread models is the basic reproduction number. The basic reproduction number, often denoted as R₀, is the average number of people that a single infectious patient will infect over the course of their infection. If R₀ < 1, then each person infects fewer than one person on average, and so the disease will eventually die out. If R₀ = 1, then each person infects one other person on average, and so the disease will steadily spread. This is referred to as an endemic. If R₀ > 1, then each person infects more than one person on average and exponential growth will ensue. This is referred to as an epidemic, or if it spreads on a global scale, a pandemic.

A simple model of disease spread starts with an initial population of infected patients and estimates successive waves of infections using R₀, where a wave is the period in which each infected person spreads the disease. For example, suppose you have 100 initial patients and an R₀ = 2. In the first wave, each of the 100 patients will infect 2 people (on average) resulting in 200 new infections. In the second wave, each of those 200 patients will infect 2 people (on average) resulting in 400 new infections. Subsequent waves will infect 800, 1,600, 3,200, 6,400, and so on. By the 10th wave, a total of 204,700 people will have been infected. This type of exponential growth is terrifying and, in the case of diseases that require medical care, can quickly overwhelm the health care system.

R₀ is clearly not the only factor in determining how impactful a disease outbreak will be. The seasonal flu, for example, is estimated to have an R₀ between 0.9 and 2.1. Fortunately, the flu is short-lived and not life-threatening for most patients. Typically, flu symptoms are easily observable, allowing for the isolation of infected individuals. Flu vaccines have also provided immunity to segments of the population, which helps to limit spread. Vaccines have been developed for many of the most potent infectious diseases, such as Polio (R₀ ≈ 12.5), Measles (R₀ ≈ 14.5) and Chickenpox (R₀ ≈ 15). [University of Michigan School of Public Health]

It is also worth noting that the R₀ for a disease is not static. Within a given population, changes in behavior and in medical treatments can greatly affect R₀. For example, when COVID-19 first emerged in the United States in early 2020, its severity and transmission methods were not fully understood. In March 2020, the R₀ for COVID-19 across the United States was estimated to be around 4, with certain hotspots (e.g., New York City) exceeding 6. As more knowledge was gained, most people altered their behavior, adopting social distancing and mask wearing. By the second half of 2020, the R₀ for COVID-19 across the U.S. dropped to around 2. The development and distribution of vaccines in early 2021 helped bring the national R₀ below 1. [covid19-projections.com]

Web Page Implementation

The HTML document in EXAMPLE 10.1 implements an R₀-based model of disease spread. The page contains number boxes where the user enters the initial size of the infected population, the value of R₀, and the number of waves to simulate. When the button is clicked to run the model simulation, the numbers from the boxes are accessed and stored in variables. The variable totalCases, which keeps track of the total number of cases, is also initialized to the initial number of cases. Each time through the while loop, the number of new infections is calculated using currentCases and the basic reproduction number R₀. Note that the Math.floor function is applied to ensure that the number of cases is an integer (since you can't have a fraction of a person infected!). The new cases are added to the total number of cases and both numbers are added to the paragraph. The counter variable currentWave controls the while loop, ensuring that the loop repeats the specified number of waves.

<!doctype html>
<html>
 <head>
   <meta name="author" content="Dave Reed">
   <title>Disease Spread Page</title>
   <script>
     function Spread() {
       currentCases = casesBox.valueAsNumber;
       R0 = r0Box.valueAsNumber;
       numWaves = waveBox.valueAsNumber;
       
       totalCases = currentCases;       
       casesP.innerHTML = 'Initially, ' + currentCases + ' cases.';
       
       currentWave = 1;
       while (currentWave <= numWaves) {
         currentCases = Math.floor(currentCases * R0);
         totalCases = totalCases + currentCases;
    
         casesP.innerHTML += '<br>Wave ' + currentWave + ': ' + 
                              currentCases + ' new cases → ' + 
                              totalCases + ' total cases.';
         
         currentWave = currentWave + 1;
       }
     }
   </script>
 </head>

<body style="text-align:center">
   <h1>Disease Spread Model</h1>
     
   <table  style="margin:auto; text-align:left">
     <tr><td><label for="casesBox">Number of initial cases:</label></td>
         <td><input type="number" id="casesBox" 
                    min=0 max=9999999 value=100></td></tr>
     <tr><td><label for="r0Box">Reproduction number R<sub>0</sub>:</label></td>
         <td><input type="number" id="r0Box" 
                    min=0 max=999.9 value=1></td></tr>
     <tr><td><label for="waveBox">Number of waves to model:</label></td>
         <td><input type="number" id="waveBox" 
                    min=0 max=99999 value=10></td></tr>
   </table>
   <p>
     <button onclick="Spread();">Run Simulation</button>
   </p>
   
   <hr>
   <p id="casesP"></p>
 </body>
</html>

EXERCISE 10.1: Confirm that the Web page from EXAMPLE 10.1 behaves as described by clicking the 'See in Browser' button and entering numbers in the boxes. Then, conduct a series of simulations with the following values, recording the total cases for each:

initial cases = 100, R₀ = 0.6, waves = 10
initial cases = 100, R₀ = 1, waves = 10
initial cases = 100, R₀ = 2, waves = 10
initial cases = 100, R₀ = 2.5, waves = 10

Your simulations from EXERCISE 10.1 demonstrate that even small changes to the infection rate can greatly impact the spread of a disease.

R₀ = 0.6: The number of new infections decreases in each subsequent wave, from 60 new cases in wave 1, to 36 new cases in wave 2, and so on. No new infections occur after wave 8, so the disease dies out, and the potential health crisis is averted. Total cases after 10 waves = 243.
R₀ = 1: The disease spreads at a steady rate (100 new infections each wave). From a public health perspective, any disease that continues to spread over time can lead to a crisis. However, the steady spread of this disease is at least manageable. If the number of doctors and hospital beds is sufficient to service the patients within a particular wave, it will continue to be sufficient as those patients recover and are replaced by the next wave. Given time, new treatments or vaccines may be developed that can reduce R₀ and lead to the end of the disease. Total cases after 10 waves = 1,100.
R₀ = 2: The number of new infections doubles every wave, from 200 new cases in wave 1, to 400 new cases in wave 2, and so on. By the 10th wave, there are more than 100 thousand new cases. This is clearly an epidemic that would likely overwhelm the health care system if unchecked. Total cases after 10 waves = 204,700.
R₀ = 2.5: The growth acceleration caused by the increase from 2 to 2.5 is dramatic. The number of new infections in wave 1 is 250 (vs. 200), wave 2 is 650 (vs. 400), wave 3 is 1,562 (vs. 800), and so on. By the 10th wave, there are more than 950 thousand new cases (vs. 100 thousand). Total cases after 10 waves = 1,588,788.

For a public health perspective, it is clear that reducing R₀ as much and as early as possible is key to avoiding major health crises.

EXERCISE 10.2: In the early stages of COVID-19 in the United States, the average R₀ across the country was estimated to be around 4. Assuming a constant R₀ = 4 and starting from a population of 100 infected people, how many waves would it require to obtain a total of 10,000 infections (i.e., the population of a small town)? 100,000 infections (i.e., the population of a medium-sized city)? 1,000,000 infections (i.e., the population of a large city)? 340,000,000 infections (i.e., the population of the U.S.)?

EXERCISE 10.3: One benefit of disease spread models is that they educate the population about potential dangers and can lead to behavior changes that slow the spread of the disease. Perform an experiment using the software model from EXAMPLE 10.1 to observe the impact dropping the R₀ during an outbreak.

First, consider the case where no change in R₀ occurs. Starting with 100 patients and an R₀ of 2, simulate 20 waves. How many new cases are created in the 20th wave?
Next, assume that behavioral changes and medical interventions reduce R₀ midway through. Starting with 100 patients and an R₀ of 2, simulate 10 waves. Then, take the number of new infections in the 10th wave and start a new simulation with that initial population and an R₀ of 1. How many new cases are created in the 10th wave?

Models vs. the Real World

Software models are powerful and versatile tools that are used extensively to study complex real-world systems. The disease spread model from the previous section, while simpler than most models used by researchers, clearly demonstrates some of the advantages and disadvantages of software models. Advantages include:

CONVENIENCE:: Software models can be executed anywhere sufficient computer power is available. A scientist studying the impact of commercial fishing off the coast of Alaska need not travel there when a software model is available on their laptop. In the case of disease spread, the model allows researchers to study potential spread without going out into the field to observe actual cases.
SAFETY:: Software models, since they are run on computers, pose no risk to patients or researchers. Models are increasingly being used in the pharmaceutical industry as an alternative to animal or human testing. If a model can be used to test for potential side effects of a proposed drug, this avoids the ethical and moral considerations of endangering living subjects. In the case of the disease spread model, no patients are put at risk, and the researchers are likewise removed from potential exposure.
SPEED:: The speed at which software models execute is only limited by the complexity of the model and the processing power of the underlying computer. A climate change software model can look years or even centuries into the future in mere seconds, allowing scientists to predict the outcome of environmental changes. In the case of the disease spread model, the simulation can run much faster than real-time studies, allowing researchers to predict spread into the future and allocate medical resources accordingly.
CUSTOMIZABILITY:: Typically, software models are built with parameters that can be adjusted. This allows for customizing the model so that it can test different conditions. A meteorologist can adjust the speed and path of a hurricane in a weather model to better prepare for possible impacts. With the disease spread model, researchers can adjust the value of R0 to simulate different scenarios and predict the impact of various interventions.
REPRODUCIBILTY:: A software model can be executed repeatedly, to test for variability. If repeated executions produce similar outcomes, this increases confidence in that outcome. Alternatively, if outcomes vary considerably, this suggests a system that is sensitive to randomness or small changes in conditions.

Software models also have disadvantages when compared with real-world experimentation.

OVERSIMPLIFICATION:: By their very definition, computer models simplify the systems they are attempting to simulate, ignoring factors that might in fact be very important. For example, the disease spread model described here assumes that the infection rate is constant, ignoring factors such as population density, methods of transmission, and the effect of interventions such as social distancing. In the real world, these factors could greatly impact disease spread.
INCORRECTNESS:: As with any software system developed by human programmers, software models can contain errors that impact the results. Even if the model is an accurate representation of the real-world system, a bug in the software could lead to incorrect or misleading results.
OVERCONFIDENCE:: Due to flaws in the model design or implementation, the results produced by a software model may look reasonable but in fact be incorrect or misleading. People tend to believe the results obtained from complex programs and so can be misled if they are not careful.

While computer models can be extremely useful in studying real-world systems, they must be carefully examined to ensure that the simplifying assumptions made by the model do not invalidate the conclusions drawn from simulations using that model. They must also be tested thoroughly to ensure that software bugs do not invalidate the results.

Nondeterministic Models

The disease spread model from the previous section is classified as a deterministic model, since its output is entirely determined by its inputs. In other words, it will always produce the same results given the same inputs. Many real-world systems are too complex to be modeled deterministically. In such cases, factors that are difficult to quantify can be simulated probabilistically. For example, a stock market model might not be able to definitively say that a stock will rise in value if certain conditions hold, but it can say that there is a high probability based on past market behavior. Models that utilize probability are classified as nondeterministic since they may produce different results each time they are run.

The Web page in FIGURE 1 implements a nondeterministic model of disease spread. Unlike the deterministic model using R₀, this model takes into account patient proximity, the likelihood of exposure leading to infection, and the length of a patient's contagious period. The population is modeled by a two-dimensional grid, with each cell in the grid representing a person. By default, the grid is 50x50, but this can be adjusted by the user with the slide bar that appears below the grid. A grid square can be displayed in 3 colors: white represents a healthy person, red represents an infected and contagious person, and gray represents a person who was infected but is no longer contagious. At the start of each simulation, a single patient at the center of the grid is assumed to be infected.

In addition to the grid size, the user can adjust two factors that control the behavior of the model. The infection probability defines how likely it is that a neighbor of an infected person will become infected. The default is 20%, meaning that each of the eight grid cells that are adjacent to an infectious (red) cell will have a 20% chance of becoming infected (red). The contagious period is the number of days that an infected person remains contagious after onset. By default, the contagious period is set at four days, meaning an infected person can infect those with whom they come in contact for four days, but after that they are no longer a risk to spread the disease. Both the infection probability and the contagious period can be adjusted using slide bars.

Grid size: 50

Disease Spread Simulator

This simulation models the spread of a disease, starting with one infected patient on Day 0. Note: red cells represent infected and contagious patients, while gray cells represent recovered patients.

Infection probability: 20%
Contagious period: 4 days

Day: 0 Percentage infected: 0%

with second delay

FIGURE 1. Nondeterministic Disease Spread Model.

EXERCISE 10.4: Experiment with the nondeterministic model in FIGURE 1. If you run several simulations using the same settings, are the resulting infection patterns different from each other? If so, how similar are they in terms of shape and overall percentage infected?

EXERCISE 10.5: Determine how sensitive this nondeterministic model is to changes in the infection rate and contagious period. For example, if you reduce the infection rate to 15%, does that significantly change the shape and the overall infection percentage of the resulting simulations? How about if you reduce the infection rate to 10%? How low would you need to reduce the infection rate for the spread to completely die out within the population?

Similarly, how does decreasing or increasing the contagious period affect the simulation results? How low would you need to reduce the contagious period for the spread to completely die out within the population?

Modeling Volleyball Matches

Prior to 1998, collegiate and professional volleyball matches played throughout the world used the sideout scoring system, in which only the serving team could earn a point. If the receiving team won a rally, then a sideout occurred and that team was awarded the serve but not a point. The result was that individual sets in a match (played to 15 points) could be arbitrarily long, with stretches of alternating sideouts in which neither team scored any points. In 1998, the international governing body for volleyball, FIVB, changed the rules to utilize a new scoring system: rally scoring. Under rally scoring, the winner of every rally earns a point regardless of which team was serving. Many credit this rule change as making the matches more exciting and broadcast-friendly, leading to volleyball's increased popularity worldwide.

When making such a drastic revision to the rules of the sport, FIVB needed to be sure that the change would produce the results they wanted (more exciting matches with more consistent lengths) without negatively affecting competitive balance. While it might be possible to take several volleyball teams, have them play a series of sets under the new rules, and analyze the results, this approach has numerous drawbacks. To be statistically significant, an extremely large number of sets would have to be played to balance out lucky or unlucky streaks. Plus, factors such as team health, scheduling, and home vs. away records would have to be carefully considered in order to compare set results. Fortunately, computer simulations can address these challenges. Using computer programs, millions of sets can be simulated in seconds, and variables such as relative team strengths can be quantified.

It is interesting to note that, along with changing from sideout scoring to rally scoring, FIVB also increased the number of points needed to win a set from 15 points to 25 points. It makes sense that number of points would need to increase, since every rally earns a point regardless of the server. But why not play to 20 points or 30 points? By analyzing large numbers of simulations, FIVB determined that a set length of 25 maintained roughly the same competitive balance as sideout scoring to 15. For example, if a team had a 60% likelihood of winning a set against a particular opponent under 15-point sideout scoring, the simulations showed they would have very close to a 60% likelihood of winning using 25-point rally scoring. So, to maintain competitive balance, the 25-point set was adopted.

To model a volleyball set between two teams, we could consider many factors such as statistics on serve rate and kill percentage, home vs. road records, and experience levels of the players. Similar to how R₀ reduced all the factors that contribute to the spread of a disease spread into a single number, we can likewise reduce all the competitive factors of a volleyball team into a single number. That number, the team's strength, will range from 1 to 100, with a higher number denoting a better team. The strength number measures how likely a team is to win any given point. So, if team1 has a strength of 80 and team2 has a strength of 40, then team1 will be twice as likely to win any given point. Likewise, if team1 has a strength of 60 and team2 has a strength of 50, then team1 will be 20% more likely to win a point.

Given strength numbers for the two teams, we can define a nondeterministic model that simulates a set point-by-point. The winner of each point is determined probabilistically, so that a team whose strength number is 20% higher should win the point 20% more often. To determine a point winner, the model will use the following algorithm:

Let strength1 and strength2 denote the strength numbers of the two teams.
Generate a random integer from the range 1..(strength1+strength2).
If the random number is ≤ strength1,
we will say that team1 won the point;
Otherwise,
we will say that team2 won the point.

To see why this works, consider the example where team1's strength is 80 and team2's strength is 40. Team1's strength is double team2's so the likelihood of team1 winning the point should be double that of team2. Since there are twice as many numbers in the range 1..80 as there are in 81..120, the chances of selecting a random number from 1..80 (and declaring team1 the winner) is twice as likely. Similarly, if the strengths are 60 and 50, then the likelihood of team1 winning a point is 20% higher than for team2 (since the range 1..60 is 20% larger than 61..110).

Web Page Implementation

The HTML document in EXAMPLE 10.2 implements this nondeterministic volleyball set model. The team strengths and the set length (number of points needed to win) are entered by the user into number boxes so that simulations can easily be run under different conditions. When the button is clicked, the PlaySet function accesses and stores these numbers in variables. Two additional variables, score1 and score2, are initialized to keep track of the scores of the two teams. A while loop repeatedly simulates playing a rally (by calling PlayRally) and updating the score. The set continues as long as neither team has reached the number of points needed for a win.

The PlayRally function determines the winner of a rally using the algorithm described above. It generates a random integer from the range 1..(strength1+strength2) using the RandomInt function from the random.js library. If that number is less than or equal to strength1, then team1 is declared the winner and the function returns "team1". Otherwise, team2 is the winner and the function returns "team2". These return values are used by the PlaySet function to update the appropriate team's score.

<!doctype html>
<html>
 <head>
   <meta name="author" content="Dave Reed">
   <title> Volleyball Set Simulator</title>
   <script src="https://freecsc.com/random.js"></script>
   <script>
     function PlayRally(team1, team2) {
       if (RandomInt(1, team1+team2) <= team1) {
         return "team1";
       }
       else {
         return "team2";
       }         
     }
       
     function PlaySet() {
       team1 = team1Box.valueAsNumber;
       team2 = team2Box.valueAsNumber;
       needed = pointsBox.valueAsNumber;
       
       score1 = 0;
       score2 = 0;
       
       setsP.innerHTML = 'Playing a set to ' + needed + '...<br>';
         
       while (score1 < needed && score2 < needed) {
         winner = PlayRally(team1, team2);
         if (winner == "team1") {
           score1 = score1 + 1;
         }
         else {
           score2 = score2 + 1;
         }
           
         setsP.innerHTML += score1 + '-' + score2 + '<br>';
       }
     }
   </script>
 </head>

<body style="text-align:center"> 
   <h1 >Volleyball Set Simulator</h1>
     
   <table style="margin:auto; text-align:left">
     <tr><td><label for="team1Box">Team 1 Strength (1-100):</label> </td>
         <td><input type="number" id="team1Box" 
                  min=0 max=100 value=50></td></tr>
     <tr><td><label for="team2Box">Team 2 Strength (1-100):</label></td> 
         <td><input type="number" id="team2Box" 
                    min=0 max=100 value=50></td></tr>   
     <tr><td><label for="pointsBox">Points needed to win:</label></td> 
         <td><input type="number" id="pointsBox" 
                    min=1 max=999 value=25></td></tr> 
   </table>
   <button onclick="PlaySet();">Simulate Set</button>
   
   <hr>
   <p id="setsP"> &nbsp; </p>
 </body>
</html>

EXERCISE 10.6: Confirm that the Web page from EXAMPLE 10.2 behaves as described by clicking the 'See in Browser' button and then clicking on the 'Simulate Set' button in the resulting page. Is it possible for a team to win by a single point? Run multiple simulations to see if this ever occurs.

In volleyball, as in many sports, a team must win by at least two points. If a team reaches 25 points but their opponent is at 24, the set continues. As a result, the condition for winning a set is more complex than just "first to reach 25." The winning team must reach 25 points AND be ahead by at least two points.

Since the test in a while loop defines the condition on which the loop continues, the while loop in our model must reverse the stopping condition. In other words, it must define the condition for which the game does NOT end. Clearly, if neither team has reached 25 points, then the game continues. However, the game also continues if the teams are within 1 point of each other. We can write this complex condition in JavaScript using the OR operator (||):

    (score1 < 25 && score2 < 25) || Math.abs(score1-score2) < 2

The HTML document in EXAMPLE 10.3 updates the while loop test to use a condition of this form.

<!doctype html>
<html>
 <head>
   <meta name="author" content="Dave Reed">
   <title> Volleyball Set Simulator</title>
   <script src="https://freecsc.com/random.js"></script>
   <script>
     function PlayRally(team1, team2) {
         if (RandomInt(1, team1+team2) <= team1) {
           return "team1";
         }
         else {
           return "team2";
         }         
     }
       
     function PlaySet() {
       team1 = team1Box.valueAsNumber;
       team2 = team2Box.valueAsNumber;
       needed = pointsBox.valueAsNumber;
       
       score1 = 0;
       score2 = 0;
       
       setsP.innerHTML = 'Playing a set to ' + needed + '...<br>';
         
       while ((score1 < needed && score2 < needed) || 
              Math.abs(score1-score2) < 2) {
         winner = PlayRally(team1, team2);
         if (winner == "team1") {
           score1 = score1 + 1;
         }
         else {
           score2 = score2 + 1;
         }
           
         setsP.innerHTML += score1 + '-' + score2 + '<br>';
       }
     }
   </script>
 </head>

<body style="text-align:center"> 
   <h1 >Volleyball Set Simulator</h1>
     
   <table style="margin:auto; text-align:left">
     <tr><td><label for="team1Box">Team 1 Strength (1-100):</label></td>
         <td><input type="number" id="team1Box" 
                  min=0 max=100 value=50></td></tr>
     <tr><td><label for="team2Box">Team 2 Strength (1-100):</label></td> 
         <td><input type="number" id="team2Box" 
                    min=0 max=100 value=50></td></tr>   
     <tr><td><label for="pointsBox">Points needed to win:</label></td> 
         <td><input type="number" id="pointsBox" 
                    min=1 max=999 value=25></td></tr> 
   </table>
   <button onclick="PlaySet();">Simulate Set</button>
   
   <hr>
   <p id="setsP"> &nbsp; </p>
 </body>
</html>

EXERCISE 10.7: Confirm that the Web page from EXAMPLE 10.3 behaves as described by clicking the 'See in Browser' button and then clicking on the 'Simulate Set' button in the resulting page.

If the two teams have equal strengths, it should follow that each team would have the same likelihood of winning a set. Set the teams strengths to the same number (it doesn't matter what number as long as they are equal) and simulate 10 sets to 25. Do the sets tend to be close? Does each team win roughly half of the sets played?

EXERCISE 10.8: Consider a situation where one team is slightly better than the other, say 10% stronger. Simulate 10 sets to 25 in which the team strengths are 66 and 60. List the final scores of the sets. Are the individual sets more one-sided? Does the better team win more sets?

EXERCISE 10.9: Next, consider a set in which one team is significantly better, say twice as strong. Simulate 10 sets to 25 between teams with strengths of 60 and 30. List the final scores of the sets. Are the sets heavily skewed in favor of the dominant team? What is the win total for each team?

Designer Secrets

In English, descriptions of repetitive tasks usually focus on the stopping condition for the loop — an activity repeats until something happens. Conversely, the loop test in a while loop specifies the condition under which the loop continues — an activity repeats as long as something continues to happen. Thus, when you define a test for conditional repetition, the condition you specify should be the exact opposite of the condition as you would state it in English. For example, we would normally describe the rules of a volleyball set in terms of the stopping condition: the set is over when one of the teams earns the needed number of points AND is ahead by at least two points. However, the test for a while loop requires the condition under which the set must continue — as long as neither team has earned the needed number of points OR (barring that) the two teams are within one point of each other.

Simulating Many Sets

With any events that involve randomness, it is essential to perform a large number of experiments to ensure statistically meaningful results. For example, if you flip a coin 10 times, it wouldn't shock you if you obtained 7 heads (70%). A few lucky head flips out of 10 can greatly skew the results. However, it should shock you if you flipped 70 heads out of 100 flips, or 700 out of 1,000. As the number of flips increases, streaks of heads and streaks of tails should balance out, converging on the expected result (50% of each).

This same pattern occurs when simulating events using a nondeterministic model. Exercise 10.7 asked you to simulate 10 volleyball sets between evenly matched teams using the page from EXAMPLE 10.3. With only 10 sets simulated, you might easily have obtained unbalanced results such as team1 winning 7 of the 10 sets. For a complex model such as this, you might need to perform hundreds or even thousands of simulations for the results to be consistent and accurate. Clearly, simulating 1,000 individual sets using this page would be tedious. A better alternative is to automate the simulation of repeated sets.

The HTML document in EXAMPLE 10.4 modifies the document from EXAMPLE 10.3 by adding a number box where the user can enter the number of sets to simulate. The PlaySet function is modified and a new function, PlayManySets, is added to perform the repeated simulations:

Within the PlaySet function, the statements that displayed the score after each point are removed. When simulating a large number of sets, displaying the point-by-point score for all the sets would be overwhelming and would slow down the simulation considerably.
An if statement has been added to the end of the PlaySet function. Since the function no longer displays the score, there needs to be some way for it to specify the winner of a set. The new if statement compares the scores of the two teams after the set is completed and returns a string identifying the winner (either 'team1' or 'team2').
The PlayManySets function has been added to simulate and maintain statistics on multiple sets. The function utilizes a while loop controlled by a counter (setsPlayed) to simulate the specified number of sets. Within the loop, the revised PlaySet function is called to simulate a set, and a counter (wins1) is incremented each time the function call returns 'team1'.
Finally, the button is modified so that it calls the PlayManySets function when clicked.

<!doctype html>
<html>
 <head>
   <meta name="author" content="Dave Reed">
   <title> Volleyball Set Simulator</title>
   <script src="https://freecsc.com/random.js"></script>
   <script>
     function PlayRally(team1, team2) {
         if (RandomInt(1, team1+team2) <= team1) {
           return "team1";
         }
         else {
           return "team2";
         }         
     }
       
     function PlaySet() {
       team1 = team1Box.valueAsNumber;
       team2 = team2Box.valueAsNumber;
       needed = pointsBox.valueAsNumber;
       
       score1 = 0;
       score2 = 0;
       
       while ((score1 < needed && score2 < needed) || 
              Math.abs(score1-score2) < 2) {
         winner = PlayRally(team1, team2);
         if (winner == "team1") {
           score1 = score1 + 1;
         }
         else {
           score2 = score2 + 1;
         }
       }
       
       return winner;
     }
     
     function PlayManySets() {
       numSets = setsBox.valueAsNumber;
       
       wins1 = 0;
       setsPlayed = 0;
       
       while (setsPlayed < numSets) {
         if (PlaySet() == 'team1') {
           wins1 = wins1 + 1;
         }
         setsPlayed = setsPlayed + 1;
       }
       
       percent = 100*wins1/numSets;
       setsP.innerHTML = 'Win % for team 1 = ' + percent.toFixed(1) + '%';
     }
   </script>
 </head>

<body style="text-align:center"> 
   <h1 >Volleyball Set Simulator</h1>
     
   <table style="margin:auto; text-align:left">
     <tr><td><label for="team1Box">Team 1 Strength (1-100):</label></td>
         <td><input type="number" id="team1Box" 
                  min=0 max=100 value=50></td></tr>
     <tr><td><label for="team2Box">Team 2 Strength (1-100):</label></td> 
         <td><input type="number" id="team2Box" 
                    min=0 max=100 value=50></td></tr>   
     <tr><td><label for="pointsBox">Points needed to win:</label></td> 
         <td><input type="number" id="pointsBox" 
                    min=1 max=999 value=25></td></tr> 
     <tr><td><label for="setsBox">Sets to simulate:</label></td> 
         <td><input type="number" id="setsBox" 
                    min=1 max=99999999 value=1000></td></tr> 
   </table>
   <button onclick="PlayManySets();">Simulate Sets</button>
   
   <hr>
   <p id="setsP"> &nbsp; </p>
 </body>
</html>

EXERCISE 10.10: Confirm that the Web page from EXAMPLE 10.4 behaves as described by clicking the 'See in Browser' button and then clicking on the 'Simulate Sets' button in the resulting page.

Set the team scores to be the same (e.g., 50 and 50) and the number of sets to be 100. Conduct multiple simulations. Is the winning percentage for team 1 close to 50% each time?
Increase the number of sets to 1,000 and conduct multiple simulations. Are the winning percentages more consistently close to 50%?
Finally, increase the number of sets to 10,000 and conduct multiple simulations. Are the winning percentages more consistently close to 50%?

EXERCISE 10.11: Set the team strengths to 66 and 60, meaning that team1 is 10% better than team2.

Conduct 10 simulations and list the results. How consistent are the results (i.e., how much variability is there for team1's winning percentage)?
Increase the number of sets to 10,000 and conduct another 10 simulations. List the results. Are the results more consistent than the results from 1,000 sets?

If team1 has a strength rating that is 10% higher than team2, then it follows that team1 has a 10% higher chance of winning any given point. You might be tempted to think that a 10% advantage on a point would translate to a 10% advantage on winning a game. However, the truth is that even a small advantage compounds over the course of the game. In essence, the underdog has to beat the odds over and over in order to pull off a win. For a small difference in team strengths, this disadvantage may not be too difficult to overcome. When there is a large gap in team strengths, however, the odds can be insurmountable.

EXERCISE 10.12: Conduct simulations to determine the likelihood of team1 winning given the following team strengths (which range from 10% to 100% differences). Assume a set length of 25 and 10,000 sets per experiment.

    team1       team2 	    winning %
    strength    strength    for team1
    ----------------------------------
    44	        40
    48	        40
    52	        40
    56	        40
    60	        40
    80	        40

Describe the overall pattern shown by your results. As the difference between the team strengths increases, how does this change the likelihood of the better team winning? What does this suggest to you about the "fairness" of volleyball's scoring rules?

EXERCISE 10.13: In college volleyball in the U.S., the initial sets in a match are played to 25. If the teams are tied at two sets each, however, the deciding fifth set is played to 15. An interesting question is whether the shorter set favors either of the teams. Repeat the simulations from EXERCISE 10.12, only now with a set length of 15. Assume the number of sets for each experiment is still 10,000.

    team1       team2 	    winning %
    strength    strength    for team1
    ----------------------------------
    44	        40
    48	        40
    52	        40
    56	        40
    60	        40
    80	        40

Compare your results with those obtained in EXERCISE 10.12. Does the shorter set tend to favor either team? Is the effect consistent regardless of how closely matched the teams are, or is it heightened by differences in strength? Identify any pattern you see in the results and provide a possible rationale for why that pattern holds.

Chapter Summary

A software model is a program that simulates the behavior of a real-world system, such as the spread of a disease or a volleyball match.
Advantages of software models include convenience (i.e., they can be executed anywhere sufficient computer power is available), safety (i.e., they pose no risk to patients or researchers), speed (i.e., they can simulate the real-world system much faster), customizability (i.e., they have parameters that can be altered to test various conditions) and reproducibility (i.e., they can be run multiple times to confirm results and to compare the outcomes under different conditions).
Disadvantages of software models include oversimplification (i.e., omitting important factors that impact the model's behavior), incorrectness (i.e., flaws in the design or implementation of the software model), and overconfidence (i.e., a tendency to believe the results without examining the model carefully).
Computer models must be carefully examined to ensure that the simplifying assumptions made by the model do not invalidate the conclusions drawn from simulations using that model. They must also be tested thoroughly to ensure that software bugs do not invalidate the results.
A deterministic model will produce the same results given the same inputs.
A nondeterministic model uses probability to simulate factors that are otherwise difficult to quantify and so can produce different results from the same inputs.
With any events that involve randomness (including the simulation of nondeterministic models), it is essential to perform a large number of experiments to ensure statistically meaningful results. As the number of experiments increases, lucky and unlucky streaks balance out, and the results should converge on the expected value.