| ← C10: Inside Computers | ↑ Table of Contents ↑ | C11: Impacts of Computing → |
For most problems found in mathematics textbooks, mathematical reasoning is quite useful. But how often do people find textbook problems in real life? At work or in daily life, factors other than strict reasoning are often more important. Sometimes intuition and instinct provide better guides; sometimes computer simulations are more convenient or more reliable; sometimes rules of thumb or back-of-the-envelope estimates are all that is needed. — Lynn Steen
A theory has only the alternative of being right or wrong. A model has a third possibility: it may be right, but irrelevant. — Manfred Eigen
When studying complex systems such as the weather, climate change, financial markets, and even sports team performance, software models (i.e., programs that simulate the behavior of the system) provide a fast and cost-effective tool for gaining insights. For example, an atmospheric scientist could create a model of storm systems and run repeated simulations to better understand how tornadoes form and move. Similarly, a stock trader could develop models of markets and use simulations and machine learning to identify trends. Software models are often cheaper and much faster than the real-world systems they are meant to simulate, and they also enable their users to experiment with different features (e.g., changing the temperature to see how that affects tornado formation, or changing trading volume to see how that affect certain stocks).
This chapter studies three software models of real-world systems in the form of interactive Web pages. While these examples are simpler than most scientific or commercial modeling applications, they provide a glimpse into the power of software for solving complex and meaningful problems.
When there is an outbreak of an infectious disease, such as the Ebola spread of 2014 or the COVID-19 flu pandemic of 2020, organizations such as the Center for Disease Control (CDC) and the World Health Organization (WHO) develop software models to predict the potential impact of the disease. By considering factors such as the disease transmission method, patient incubation period, and population density, a software model of the disease's spread can be constructed. This model can then be used to study the spread of the disease under various conditions, which can both inform the public of the threat and also demonstrate the effectiveness of intervention techniques (such as social distancing).
One measure that is commonly used in disease spread models is the basic reproduction number. The basic reproduction number, often denoted as R0, is the average number of people that a single infectious patient will infect over the course of their infection. If R0 < 1, then each person infects fewer than one person on average, and so the disease will eventually die out. If R0 = 1, then each person infects one other person on average, and so the disease will steadily spread. This is referred to as an endemic. If R0 > 1, then each person infects more than one person on average and exponential growth will ensue. This is referred to as an epidemic, or if it spreads on a global scale, a pandemic.
A simple model of disease spread starts with an initial population of infected patients and estimates successive waves of infections using R0, where a wave is the period in which each infected person spreads the disease. For example, suppose you have 100 initial patients and an R0 = 2. In the first wave, each of the 100 patients will infect 2 people (on average) resulting in 200 new infections. In the second wave, each of those 200 patients will infect 2 people (on average) resulting in 400 new infections. Subsequent waves will infect 800, 1,600, 3,200, 6,400, and so on. By the 10th wave, a total of 204,700 people will have been infected. This type of exponential growth is terrifying and, in the case of diseases that require medical care, can quickly overwhelm the health care system.
R0 is clearly not the only factor in determining how impactful a disease outbreak will be. The seasonal flu, for example, is estimated to have an R0 between 0.9 and 2.1. Fortunately, the flu is short-lived and not life-threatening for most patients. Typically, flu symptoms are easily observable, allowing for the isolation of infected individuals. Flu vaccines have also provided immunity to segments of the population, which helps to limit spread. Vaccines have been developed for many of the most potent infectious diseases, such as Polio (R0 ≈ 12.5), Measles (R0 ≈ 14.5) and Chickenpox (R0 ≈ 15). [University of Michigan School of Public Health]
It is also worth noting that the R0 for a disease is not static. Within a given population, changes in behavior and in medical treatments can greatly affect R0. For example, when COVID-19 first emerged in the United States in early 2020, its severity and transmission methods were not fully understood. In March 2020, the R0 for COVID-19 across the United States was estimated to be around 4, with certain hotspots (e.g., New York City) exceeding 6. As more knowledge was gained, most people altered their behavior, adopting social distancing and mask wearing. By the second half of 2020, the R0 for COVID-19 across the U.S. dropped to around 2. The development and distribution of vaccines in early 2021 helped bring the national R0 below 1. [covid19-projections.com]
The HTML document in EXAMPLE 10.1 implements an R0-based model of disease spread. The page contains number boxes where the user enters the initial size of the infected population, the value of R0, and the number of waves to simulate. When the button is clicked to run the model simulation, the numbers from the boxes are accessed and stored in variables. The variable totalCases, which keeps track of the total number of cases, is also initialized to the initial number of cases. Each time through the while loop, the number of new infections is calculated using currentCases and the basic reproduction number R0. Note that the Math.floor function is applied to ensure that the number of cases is an integer (since you can't have a fraction of a person infected!). The new cases are added to the total number of cases and both numbers are added to the paragraph. The counter variable currentWave controls the while loop, ensuring that the loop repeats the specified number of waves.
EXERCISE 10.1: Confirm that the Web page from EXAMPLE 10.1 behaves as described by clicking the 'See in Browser' button and entering numbers in the boxes. Then, conduct a series of simulations with the following values, recording the total cases for each:
Your simulations from EXERCISE 10.1 demonstrate that even small changes to the infection rate can greatly impact the spread of a disease.
For a public health perspective, it is clear that reducing R0 as much and as early as possible is key to avoiding major health crises.
EXERCISE 10.2: In the early stages of COVID-19 in the United States, the average R0 across the country was estimated to be around 4. Assuming a constant R0 = 4 and starting from a population of 100 infected people, how many waves would it require to obtain a total of 10,000 infections (i.e., the population of a small town)? 100,000 infections (i.e., the population of a medium-sized city)? 1,000,000 infections (i.e., the population of a large city)? 340,000,000 infections (i.e., the population of the U.S.)?
EXERCISE 10.3: One benefit of disease spread models is that they educate the population about potential dangers and can lead to behavior changes that slow the spread of the disease. Perform an experiment using the software model from EXAMPLE 10.1 to observe the impact dropping the R0 during an outbreak.
Software models are powerful and versatile tools that are used extensively to study complex real-world systems. The disease spread model from the previous section, while simpler than most models used by researchers, clearly demonstrates some of the advantages and disadvantages of software models. Advantages include:
Software models also have disadvantages when compared with real-world experimentation.
While computer models can be extremely useful in studying real-world systems, they must be carefully examined to ensure that the simplifying assumptions made by the model do not invalidate the conclusions drawn from simulations using that model. They must also be tested thoroughly to ensure that software bugs do not invalidate the results.
The disease spread model from the previous section is classified as a deterministic model, since its output is entirely determined by its inputs. In other words, it will always produce the same results given the same inputs. Many real-world systems are too complex to be modeled deterministically. In such cases, factors that are difficult to quantify can be simulated probabilistically. For example, a stock market model might not be able to definitively say that a stock will rise in value if certain conditions hold, but it can say that there is a high probability based on past market behavior. Models that utilize probability are classified as nondeterministic since they may produce different results each time they are run.
The Web page in FIGURE 1 implements a nondeterministic model of disease spread. Unlike the deterministic model using R0, this model takes into account patient proximity, the likelihood of exposure leading to infection, and the length of a patient's contagious period. The population is modeled by a two-dimensional grid, with each cell in the grid representing a person. By default, the grid is 50x50, but this can be adjusted by the user with the slide bar that appears below the grid. A grid square can be displayed in 3 colors: white represents a healthy person, red represents an infected and contagious person, and gray represents a person who was infected but is no longer contagious. At the start of each simulation, a single patient at the center of the grid is assumed to be infected.
In addition to the grid size, the user can adjust two factors that control the behavior of the model. The infection probability defines how likely it is that a neighbor of an infected person will become infected. The default is 20%, meaning that each of the eight grid cells that are adjacent to an infectious (red) cell will have a 20% chance of becoming infected (red). The contagious period is the number of days that an infected person remains contagious after onset. By default, the contagious period is set at four days, meaning an infected person can infect those with whom they come in contact for four days, but after that they are no longer a risk to spread the disease. Both the infection probability and the contagious period can be adjusted using slide bars.
|
Grid size: 50 |
Disease Spread Simulator This simulation models the spread of a disease, starting with one infected patient on Day 0. Note: red cells represent infected and contagious patients, while gray cells represent recovered patients. Infection probability: 20% |
FIGURE 1. Nondeterministic Disease Spread Model.
EXERCISE 10.4: Experiment with the nondeterministic model in FIGURE 1. If you run several simulations using the same settings, are the resulting infection patterns different from each other? If so, how similar are they in terms of shape and overall percentage infected?
EXERCISE 10.5: Determine how sensitive this nondeterministic model is to changes in the infection rate and contagious period. For example, if you reduce the infection rate to 15%, does that significantly change the shape and the overall infection percentage of the resulting simulations? How about if you reduce the infection rate to 10%? How low would you need to reduce the infection rate for the spread to completely die out within the population?
Similarly, how does decreasing or increasing the contagious period affect the simulation results? How low would you need to reduce the contagious period for the spread to completely die out within the population?
Prior to 1998, collegiate and professional volleyball matches played throughout the world used the sideout scoring system, in which only the serving team could earn a point. If the receiving team won a rally, then a sideout occurred and that team was awarded the serve but not a point. The result was that individual sets in a match (played to 15 points) could be arbitrarily long, with stretches of alternating sideouts in which neither team scored any points. In 1998, the international governing body for volleyball, FIVB, changed the rules to utilize a new scoring system: rally scoring. Under rally scoring, the winner of every rally earns a point regardless of which team was serving. Many credit this rule change as making the matches more exciting and broadcast-friendly, leading to volleyball's increased popularity worldwide.
When making such a drastic revision to the rules of the sport, FIVB needed to be sure that the change would produce the results they wanted (more exciting matches with more consistent lengths) without negatively affecting competitive balance. While it might be possible to take several volleyball teams, have them play a series of sets under the new rules, and analyze the results, this approach has numerous drawbacks. To be statistically significant, an extremely large number of sets would have to be played to balance out lucky or unlucky streaks. Plus, factors such as team health, scheduling, and home vs. away records would have to be carefully considered in order to compare set results. Fortunately, computer simulations can address these challenges. Using computer programs, millions of sets can be simulated in seconds, and variables such as relative team strengths can be quantified.
It is interesting to note that, along with changing from sideout scoring to rally scoring, FIVB also increased the number of points needed to win a set from 15 points to 25 points. It makes sense that number of points would need to increase, since every rally earns a point regardless of the server. But why not play to 20 points or 30 points? By analyzing large numbers of simulations, FIVB determined that a set length of 25 maintained roughly the same competitive balance as sideout scoring to 15. For example, if a team had a 60% likelihood of winning a set against a particular opponent under 15-point sideout scoring, the simulations showed they would have very close to a 60% likelihood of winning using 25-point rally scoring. So, to maintain competitive balance, the 25-point set was adopted.
To model a volleyball set between two teams, we could consider many factors such as statistics on serve rate and kill percentage, home vs. road records, and experience levels of the players. Similar to how R0 reduced all the factors that contribute to the spread of a disease spread into a single number, we can likewise reduce all the competitive factors of a volleyball team into a single number. That number, the team's strength, will range from 1 to 100, with a higher number denoting a better team. The strength number measures how likely a team is to win any given point. So, if team1 has a strength of 80 and team2 has a strength of 40, then team1 will be twice as likely to win any given point. Likewise, if team1 has a strength of 60 and team2 has a strength of 50, then team1 will be 20% more likely to win a point.
Given strength numbers for the two teams, we can define a nondeterministic model that simulates a set point-by-point. The winner of each point is determined probabilistically, so that a team whose strength number is 20% higher should win the point 20% more often. To determine a point winner, the model will use the following algorithm:
strength1 and strength2 denote the strength numbers of the two teams.1..(strength1+strength2).To see why this works, consider the example where team1's strength is 80 and team2's strength is 40. Team1's strength is double team2's so the likelihood of team1 winning the point should be double that of team2. Since there are twice as many numbers in the range 1..80 as there are in 81..120, the chances of selecting a random number from 1..80 (and declaring team1 the winner) is twice as likely. Similarly, if the strengths are 60 and 50, then the likelihood of team1 winning a point is 20% higher than for team2 (since the range 1..60 is 20% larger than 61..110).
The HTML document in EXAMPLE 10.2 implements this nondeterministic volleyball set model. The team strengths and the set length (number of points needed to win) are entered by the user into number boxes so that simulations can easily be run under different conditions. When the button is clicked, the PlaySet function accesses and stores these numbers in variables. Two additional variables, score1 and score2, are initialized to keep track of the scores of the two teams. A while loop repeatedly simulates playing a rally (by calling PlayRally) and updating the score. The set continues as long as neither team has reached the number of points needed for a win.
The PlayRally function determines the winner of a rally using the algorithm described above. It generates a random integer from the range 1..(strength1+strength2) using the RandomInt function from the random.js library. If that number is less than or equal to strength1, then team1 is declared the winner and the function returns "team1". Otherwise, team2 is the winner and the function returns "team2". These return values are used by the PlaySet function to update the appropriate team's score.
EXERCISE 10.6: Confirm that the Web page from EXAMPLE 10.2 behaves as described by clicking the 'See in Browser' button and then clicking on the 'Simulate Set' button in the resulting page. Is it possible for a team to win by a single point? Run multiple simulations to see if this ever occurs.
In volleyball, as in many sports, a team must win by at least two points. If a team reaches 25 points but their opponent is at 24, the set continues. As a result, the condition for winning a set is more complex than just "first to reach 25." The winning team must reach 25 points AND be ahead by at least two points.
Since the test in a while loop defines the condition on which the loop continues, the while loop in our model must reverse the stopping condition. In other words, it must define the condition for which the game does NOT end. Clearly, if neither team has reached 25 points, then the game continues. However, the game also continues if the teams are within 1 point of each other. We can write this complex condition in JavaScript using the OR operator (||):
(score1 < 25 && score2 < 25) || Math.abs(score1-score2) < 2
The HTML document in EXAMPLE 10.3 updates the while loop test to use a condition of this form.
EXERCISE 10.7: Confirm that the Web page from EXAMPLE 10.3 behaves as described by clicking the 'See in Browser' button and then clicking on the 'Simulate Set' button in the resulting page.
If the two teams have equal strengths, it should follow that each team would have the same likelihood of winning a set. Set the teams strengths to the same number (it doesn't matter what number as long as they are equal) and simulate 10 sets to 25. Do the sets tend to be close? Does each team win roughly half of the sets played?
EXERCISE 10.8: Consider a situation where one team is slightly better than the other, say 10% stronger. Simulate 10 sets to 25 in which the team strengths are 66 and 60. List the final scores of the sets. Are the individual sets more one-sided? Does the better team win more sets?
EXERCISE 10.9: Next, consider a set in which one team is significantly better, say twice as strong. Simulate 10 sets to 25 between teams with strengths of 60 and 30. List the final scores of the sets. Are the sets heavily skewed in favor of the dominant team? What is the win total for each team?
Designer Secrets
In English, descriptions of repetitive tasks usually focus on the stopping condition for the loop — an activity repeats until something happens. Conversely, the loop test in a while loop specifies the condition under which the loop continues — an activity repeats as long as something continues to happen. Thus, when you define a test for conditional repetition, the condition you specify should be the exact opposite of the condition as you would state it in English. For example, we would normally describe the rules of a volleyball set in terms of the stopping condition: the set is over when one of the teams earns the needed number of points AND is ahead by at least two points. However, the test for a while loop requires the condition under which the set must continue — as long as neither team has earned the needed number of points OR (barring that) the two teams are within one point of each other.
With any events that involve randomness, it is essential to perform a large number of experiments to ensure statistically meaningful results. For example, if you flip a coin 10 times, it wouldn't shock you if you obtained 7 heads (70%). A few lucky head flips out of 10 can greatly skew the results. However, it should shock you if you flipped 70 heads out of 100 flips, or 700 out of 1,000. As the number of flips increases, streaks of heads and streaks of tails should balance out, converging on the expected result (50% of each).
This same pattern occurs when simulating events using a nondeterministic model. Exercise 10.7 asked you to simulate 10 volleyball sets between evenly matched teams using the page from EXAMPLE 10.3. With only 10 sets simulated, you might easily have obtained unbalanced results such as team1 winning 7 of the 10 sets. For a complex model such as this, you might need to perform hundreds or even thousands of simulations for the results to be consistent and accurate. Clearly, simulating 1,000 individual sets using this page would be tedious. A better alternative is to automate the simulation of repeated sets.
The HTML document in EXAMPLE 10.4 modifies the document from EXAMPLE 10.3 by adding a number box where the user can enter the number of sets to simulate. The PlaySet function is modified and a new function, PlayManySets, is added to perform the repeated simulations:
PlaySet function, the statements that displayed the score after each point are removed. When simulating a large number of sets, displaying the point-by-point score for all the sets would be overwhelming and would slow down the simulation considerably. PlaySet function. Since the function no longer displays the score, there needs to be some way for it to specify the winner of a set. The new if statement compares the scores of the two teams after the set is completed and returns a string identifying the winner (either 'team1' or 'team2').PlayManySets function has been added to simulate and maintain statistics on multiple sets. The function utilizes a while loop controlled by a counter (setsPlayed) to simulate the specified number of sets. Within the loop, the revised PlaySet function is called to simulate a set, and a counter (wins1) is incremented each time the function call returns 'team1'.PlayManySets function when clicked.
EXERCISE 10.10: Confirm that the Web page from EXAMPLE 10.4 behaves as described by clicking the 'See in Browser' button and then clicking on the 'Simulate Sets' button in the resulting page.
EXERCISE 10.11: Set the team strengths to 66 and 60, meaning that team1 is 10% better than team2.
If team1 has a strength rating that is 10% higher than team2, then it follows that team1 has a 10% higher chance of winning any given point. You might be tempted to think that a 10% advantage on a point would translate to a 10% advantage on winning a game. However, the truth is that even a small advantage compounds over the course of the game. In essence, the underdog has to beat the odds over and over in order to pull off a win. For a small difference in team strengths, this disadvantage may not be too difficult to overcome. When there is a large gap in team strengths, however, the odds can be insurmountable.
EXERCISE 10.12: Conduct simulations to determine the likelihood of team1 winning given the following team strengths (which range from 10% to 100% differences). Assume a set length of 25 and 10,000 sets per experiment.
team1 team2 winning %
strength strength for team1
----------------------------------
44 40
48 40
52 40
56 40
60 40
80 40
Describe the overall pattern shown by your results. As the difference between the team strengths increases, how does this change the likelihood of the better team winning? What does this suggest to you about the "fairness" of volleyball's scoring rules?
EXERCISE 10.13: In college volleyball in the U.S., the initial sets in a match are played to 25. If the teams are tied at two sets each, however, the deciding fifth set is played to 15. An interesting question is whether the shorter set favors either of the teams. Repeat the simulations from EXERCISE 10.12, only now with a set length of 15. Assume the number of sets for each experiment is still 10,000.
team1 team2 winning %
strength strength for team1
----------------------------------
44 40
48 40
52 40
56 40
60 40
80 40
Compare your results with those obtained in EXERCISE 10.12. Does the shorter set tend to favor either team? Is the effect consistent regardless of how closely matched the teams are, or is it heightened by differences in strength? Identify any pattern you see in the results and provide a possible rationale for why that pattern holds.
| ← C10: Inside Computers | ↑ Table of Contents ↑ | C11: Impacts of Computing → |