Playing with models.

Money does buy Olympic medals … squared!

Posted in Statistics by Alexander Lobkovsky Meitiv on September 20, 2012

As I was watching the Jamaican sprinters sweep the 200m dash, I started wondering how such a relatively small country, not wealthy by any stretch of the imagination, could achieve such dominance.  Is there a correlation between the population of the country and its haul of medals?  Almost certainly.  Perhaps a more incendiary and more interesting question is “does money buy medals?”  Not in a literal sense, of course, but in a statistical sense.  Is there a correlation between the per-capita medal count and per-capita income?  There should be.  Money buys equipment, coaching and medical staff, transportation, etc.

As I embarked on this project, I expected to find a significant positive correlation.  But what I found was even more shocking.  Medal count per person grows as the square of per-capita income.  The graph below shows the medal count (obtained from http://www.london2012.com and a Wikipedia article) divided by the population of each country (obtained from Wikipedia) vs. the purchasing parity GDP per capita (obtained from the CIA column of this Wikipedia article).  Only populous (>50,000,000) countries are included since statistical trends are clearer in large samples and the fluctuations that obscure these trends are smaller.  The straight line on the log-log axes is the quadratic fit.

Why does the medal haul grow faster than linearly with the resources?  I have an explanation for this striking phenomenon which assumes that each sport has an entrance threshold s and that the distribution of these entrance thresholds is roughly uniform. If a country has a GDP per capita that is greater than the entrance threshold for a particular sport, it enters the competition. It follows that the number of competitors is inversely proportional to the entrance threshold of a sport. I further assume that all competitors are equally likely to get a medal once they enter the competition. Therefore the number of medals each competitor wins is inversely proportional to the number of competitors and consequently proportional to the entrance threshold of the sport. The final logical step is to notice that a country with per-capita income s_0 competes in all sports whose entrance thresholds are \le s_0. Thus the total number of medals is proportional to

\displaystyle \int_0^{s_0} s\, ds \sim s_0^2.

Thus the quadratic dependence of the medal haul on the GDP per capita comes from the fact that richer countries enter more sports, and it is easier to win medals in more expensive sports since not as many countries can enter.
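To see the scaling pop out of this argument, here is a minimal numerical sketch (my own illustration, not part of the original analysis): it draws uniformly distributed entrance thresholds, assumes a country collects medals in proportion to the threshold of every sport it can afford to enter, and fits the log-log slope. All names and values below are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sports = 2000
thresholds = rng.uniform(0.0, 1.0, n_sports)   # entrance threshold of each sport

def expected_medals(s0):
    """Medal haul for a country with per-capita income s0: it enters every sport
    with threshold <= s0, and each entered sport yields medals proportional to
    its threshold (expensive sports have fewer competitors to share the medals)."""
    return thresholds[thresholds <= s0].sum()

incomes = np.linspace(0.05, 1.0, 25)
medals = np.array([expected_medals(s) for s in incomes])
slope = np.polyfit(np.log(incomes), np.log(medals), 1)[0]
print(f"log-log slope ≈ {slope:.2f}")   # comes out close to 2
```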

Correlation between the medal haul and income per capita.

Money does buy Olympic medals, squared! (The green line has slope 2.)
Only countries with population greater than 50 million are included in this plot. Ethiopia is a particularly striking outlier, winning more than two orders of magnitude more medals than predicted by the green line.

Immunization in the age of intercontinental travel

Posted in Game theory, Statistics by Alexander Lobkovsky Meitiv on January 18, 2011

People in face masks in China during the flu epidemic

Face masks curtail the spread of the virus during the flu pandemic.

If you have kids in school, you are familiar with how fervently the school administrators enforce the 100% immunization policy. The schools are complying with the local laws, which grant exceptions grudgingly. Childhood immunization is a powerful tool against a variety of crippling viruses, some of which are extinct (outside the controlled lab environment) as a result of widespread immunization. Controversy over the possibly harmful side effects (mercury, other preservatives) notwithstanding, is the zeal for reaching a 100% immunization rate justified?

The efficacy of a vaccination program is quantified by the fraction of the non-immunized population that gets sick in an epidemic. This fraction can also be thought of as the probability that a particular individual will get sick in an epidemic.

A number of factors determine the probability of infection in an epidemic:

How long is a sick person contagious? What is the probability of infection given contact with a sick person? What is the average rate of inter-personal contacts? How far does a person travel while sick? The answers to these questions depend on the type of virus and on properties of the population such as its density and patterns of movement.

The situation seems too complex for predictive modeling. Could a simplified model offer meaningful insight? Yes, if we pick a narrow aspect of the problem to look at. How about this? You have probably heard the doomsday scenarios of a deadly virus spreading around the world aboard airplanes. Is this kind of talk just fear-mongering or a realistic prediction?

Let us construct a model to study whether the doomsday scenario is plausible. Let’s start with a 2D square lattice, or a board, whose sites (spaces) can be empty or occupied by “people” — let’s call them “entities.” The entities can be in one of three states: immune, vulnerable, and sick. The sick entities can infect the vulnerable but not the immune ones. We need to decide what to do with the sick entities. For example, some fraction of them could “die” — be removed from the board. The simplest thing is to just let them become immune after the disease has run its course. This is what is done in our model.

The entities can move around the board. The movement models the short range everyday movement of the population: commute, shopping, going to and from school, etc. I will use a turn-based set of movement rules (turn-based like Conway’s Game of Life) that are often used in simulating fluid-vapor interfaces. The result is a collection of dense clusters of varying sizes that float in a sparsely inhabited sea. There is little exchange of entities between the clusters. Since the infection is acquired on contact, global epidemics are impeded by the limited inter-cluster movement. One could think of these semi-isolated clusters as communities, cities, or even continents depending on your perspective.
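For concreteness, here is a stripped-down sketch of this kind of lattice model. It is my own illustration, not the code behind the movies below: it keeps the three states, on-contact infection, and recovery to immunity, but replaces the fluid-vapor movement rules with crude random hops plus optional long-range jumps, so it does not reproduce the clustering described above. All parameter names and values are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
L = 80                 # the board is L x L
occupancy = 0.4        # fraction of sites occupied by entities
immune_rate = 0.30     # fraction immunized before the epidemic starts
sick_turns = 8         # turns an entity stays sick before turning immune
travel_frac = 0.0      # fraction of entities making a long-range jump each turn

EMPTY, VULN, SICK, IMMUNE = 0, 1, 2, 3
grid = np.full((L, L), EMPTY)
mask = rng.random((L, L)) < occupancy
grid[mask] = np.where(rng.random((L, L))[mask] < immune_rate, IMMUNE, VULN)
clock = np.zeros((L, L), dtype=int)

# seed the epidemic with one sick entity
vx, vy = np.where(grid == VULN)
k = rng.integers(len(vx))
grid[vx[k], vy[k]], clock[vx[k], vy[k]] = SICK, sick_turns

def turn():
    # 1) movement: every entity tries a random neighboring site; a fraction jumps anywhere
    xs, ys = np.where(grid != EMPTY)
    for x, y in zip(xs, ys):
        if rng.random() < travel_frac:
            nx, ny = rng.integers(L), rng.integers(L)        # long-range jump
        else:
            nx = (x + rng.integers(-1, 2)) % L               # local hop
            ny = (y + rng.integers(-1, 2)) % L
        if grid[nx, ny] == EMPTY:
            grid[nx, ny], grid[x, y] = grid[x, y], EMPTY
            clock[nx, ny], clock[x, y] = clock[x, y], 0
    # 2) infection: sick entities infect vulnerable nearest neighbors
    sx, sy = np.where(grid == SICK)
    for x, y in zip(sx, sy):
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = (x + dx) % L, (y + dy) % L
            if grid[nx, ny] == VULN:
                grid[nx, ny], clock[nx, ny] = SICK, sick_turns
    # 3) recovery: the disease runs its course and the entity becomes immune
    clock[grid == SICK] -= 1
    grid[(grid == SICK) & (clock <= 0)] = IMMUNE

vulnerable0 = (grid == VULN).sum() + 1          # count the seeded entity too
while (grid == SICK).any():
    turn()
print("epidemic size:", 1 - (grid == VULN).sum() / vulnerable0)
```

Setting travel_frac above zero is the crude stand-in for air travel discussed further down.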

Below is the movie of the model simulation in which the sick entities (red) infect the vulnerable entities (blue) and after a while become immune (green). A fraction of the population is already immune at the onset of the epidemic. Observe how the disease propagates quickly across the clusters and makes infrequent jumps between the clusters. In this particular simulation, 37% of the vulnerable population got sick before the epidemic fizzled out.

You probably noticed that the immunization rate in the above example is rather low, 30% to be exact. Since most entities are vulnerable, the epidemic has no trouble spreading. When the immunization rate is more than doubled to 70%, most epidemics fizzle out early. As you can see in the plot below of the PDF (probability density function) of the total epidemic size (defined as the fraction of the vulnerable population that got sick), all epidemics involve fewer than 4% of the populace. There is simply not enough population movement for the disease to spread.

PDF of the epidemic size in a model without large scale movement

When the movement is local and the immunization rate is high, most epidemics fizzle out without affecting many people.

Time to include airplanes and examine the plausibility of the doomsday scenario!

In addition to the short range movement, let’s allow at each turn a certain small fraction of the population to move anywhere on the board. The second graph below is the PDF of the epidemic size for the same parameters as the one above, but with an additional 5% of the population executing large scale movement each turn. Notice the radical change in the scale of the x axis. When a small fraction of the population travels long distances each turn, most epidemics grow to encompass the majority of the population. The bimodal nature of the epidemic size distribution suggests that there is a threshold size. If the epidemic hits a cluster that happens to be larger than the threshold, the disease can escape and infect almost all other clusters.

PDF of the epidemic size when moderate large scale movement is allowed

When only 5% of the population execute large scale movement every turn, most epidemics grow to affect a significant fraction of the population.

Let us now quantitatively examine the effect of the large scale movements on the probability of significant epidemics. In the graph below I plot the probability of occurrence of an epidemic that involves > 10% of the vulnerable populace as a function of the immunization rate for two different magnitudes of the large scale movement. Significant epidemics become rare as the immunization rate increases. However, perhaps not surprisingly, a greater immunization rate is required to avoid epidemics when the large scale population movement is greater.

Probability of a significant epidemic as a function of immunization rate

Greater large scale movement requires a higher immunization rate to avoid a significant epidemic

Predicting how epidemics spread in the real world is a tricky business. However, the general conclusion of this simple model, I think, will stand. While a 100% immunization rate is not strictly required to stem epidemics, as the extent of long distance travel increases, we will need a higher immunization rate. It would be unwise to be lax about immunization requirements only to discover one day that not enough of the population is immunized.

The real issue, I think, is that the small fraction of people who refuse to be immunized are shielded from infection by those who took the (albeit small) risk of immunization. But that is a can of worms I don’t really want to open…

Should you switch lanes in traffic?

Posted in Statistics, Transportation by Alexander Lobkovsky Meitiv on June 24, 2010

Car switching lanes in traffic

Switching lanes in heavy traffic can indeed increase your average speed if done right.


If you drive like me, you have no patience for bumper to bumper traffic. There’s gotta be a way to beat it somehow, right? Do you sneak into an opening in a neighboring lane if it is moving faster? Do you set goals like “when I get in front of that van, I’ll switch back”? It doesn’t always seem to work. A lane that was zooming by you comes to a dead stop when you switch into it. If the motion of each lane is random, is there a way to switch lanes and move faster than a car that stays in lane?

It turns out there is a way to beat the traffic. To show this we will use a simple model of traffic flow introduced by Nagel and Schreckenberg (see the previous post). The model consists of a circular track with consecutive slots which can be empty or occupied by cars. Cars have an integer velocity between 0 and vmax. As we saw in the previous post, simple rules for updating the positions and velocities of the cars can reproduce the traffic jam phenomenon, whereby a dense region forms in which the cars are at a standstill for a few turns and then, as the jam clears in front of them, the cars accelerate and zoom around the track only to be stuck in the jam again. The jam itself moves in the direction opposite to that of the cars.

Now imagine that we put two of the circular tracks (or lanes) side by side. For starters, let’s require all cars except one to stay in their respective lanes. One rogue car can switch lanes. Can the rogue with the right lane switching strategy move faster than the rest of the cars on average? The answer is most certainly yes, although finding the best lane switching strategy is a difficult computational problem. What we are going to do here is compare two lane switching strategies that at first sight seem equally good. What we will discover is that the lane changing strategy matters. As you might have suspected, if you don’t do it right, you might actually move slower than the rest of the traffic!

Here are the two simple strategies we will compare (I suggest you read the previous post for the description of the model):

1) “Stop-switch:” if the slot directly ahead is occupied, switch if the space in the other lane directly across is not occupied.
2) “Faster-switch:” if the car directly ahead in the neighboring lane is moving faster, switch if there is space available.
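Here is a minimal sketch of how these two rules could be encoded, assuming a two-lane version of the model where each lane is a list whose entries are a car’s velocity or None for an empty slot. The function names, the assumption that each lane contains at least one other car, and the exact reading of “moving faster” (compared with the car ahead in the current lane) are my own, not from the post.

```python
def next_car_speed(lane, i):
    """Velocity of the nearest car ahead of position i on a circular lane."""
    n, j = len(lane), 1
    while lane[(i + j) % n] is None:
        j += 1
    return lane[(i + j) % n]

def stop_switch(lanes, k, i):
    """Rule 1: the slot directly ahead is occupied and the slot across is free."""
    here, other = lanes[k], lanes[1 - k]
    n = len(here)
    return here[(i + 1) % n] is not None and other[i] is None

def faster_switch(lanes, k, i):
    """Rule 2: the nearest car ahead in the other lane is moving faster than the
    nearest car ahead in this lane, and the slot across is free."""
    here, other = lanes[k], lanes[1 - k]
    if other[i] is not None:
        return False
    return next_car_speed(other, i) > next_car_speed(here, i)
```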

Graph of the percent improvement of the average speed of the lane changing car as a function of the car density

The graph above compares the two strategies. It shows the percent improvement of the rogue’s average speed compared to the average speed of the rest of the cars as a function of the car density. When density is low and traffic jams are rare, switching lanes has almost no effect on your average speed for either strategy. When the density is high and traffic jams abound, switching can make you go slower than the rest of the traffic. The reason is that when a space in the neighboring lane opens up, it is likely to be at the tail end of a jam, whereas the jam in the lane you just switched out of might already be partially cleared. The final remark is that the “Stop-switch” strategy is significantly better, improving the speed by as much as 35%, whereas the best “Faster-switch” can do is a 15% improvement.

Finally let me mention that if all cars switch lanes and use the same strategy, nobody wins. All cars move with the same speed on average. That average speed could be smaller or larger (depending on the car density and the switching strategy) than in the case when everybody stays in lane. The graph below explains why everyone is so keen on the advice “Stay in lane!” It turns out that if everyone uses the “Faster-switch” strategy, the average speed is drastically lower for everyone than if everyone stays in lane! The reason for this dramatic result is that when you change lanes, the car behind is likely to slam on the brakes, which slows everyone down.

Graphs of the average speed vs car density for two cases: everyone switches lanes using the "Faster-switch" strategy and everyone stays in lane.

People are the real cause of the traffic jam

Posted in Statistics, Transportation by Alexander Lobkovsky Meitiv on May 12, 2010

5 lanes of highway bumper to bumper

Sometimes traffic slows to a crawl for no apparent reason

When the traffic on the beltway is moving at a snail’s pace without an obvious reason (like construction or an accident), I frequently wonder: “why can’t everyone just go faster?” If all cars were driven by computers that could talk to each other, a clever synchronization algorithm could allow all cars to travel in unison and thus prevent congestion that is not the result of a lane closure. Alas, this technotopia is still decades away and cars will be driven by humans for the foreseeable future. In the meantime we can but wonder: “what is it about the way people drive that causes the traffic to jam when the density of cars becomes too great?”

Traffic flow is frequently studied because it is an example of a system far from equilibrium. The practical applications are important as well. Many models, from crude to sophisticated, have been advanced. Massive amounts of data exist and are frequently used to estimate model parameters and make predictions. I am not going to attempt to review the vast field here. My goal is simply to elucidate the physiological limitation of the human mind that causes the driving patterns leading to congestion.

Although great progress has been made in modeling traffic as a compressible fluid, models that fall into the category of Cellular Automata are more intuitive and instructive.

Cellular Automata, promoted by Stephen Wolfram of Mathematica fame as the solution to all problems, are indeed quite nifty. It turns out that autonomous agents, walking on a lattice and interacting according to a simple set of rules, can reproduce a surprising variety of observed macroscopic phenomena. If you want to learn more, the Wikipedia article is a good start.

A pioneering work of Nagel and Schreckenberg published in Journal de Physique in 1992 introduced a simple lattice model of traffic which reproduced the traffic jam phenomenon and came to the surprising conclusion that the essential ingredient was infrequent random slowdowns.

You have probably done it yourself: you change the radio station, adjust the rear view mirror, or speak to the child in the seat behind you. As you do so, your foot eases off the accelerator ever so slightly, irritating the person behind you who has to disengage the cruise control. You and people like you are responsible for the traffic jams when the volume is heavy but there are no obvious obstructions to traffic.

Allow me to reproduce the authors’ description of the model since it is concise and elegant:

“Our computational model is defined on a one-dimensional array of L sites and with open or periodic boundary conditions. Each site may either be occupied by one vehicle, or it may be empty. Each vehicle has an integer velocity with values between zero and vmax. For an arbitrary configuration, one update of the system consists of the following four consecutive steps, which are performed in parallel for all vehicles:

  1. Acceleration: if the velocity v of a vehicle is lower than vmax and if the distance to the next car ahead is larger than v + 1, the speed is increased by one.
  2. Slowing down (due to other cars): if a vehicle at site i sees the next vehicle at site i + j (with j < v), it reduces its speed to j.
  3. Randomization: with probability p, the velocity of each vehicle (if greater than zero) is decreased by one.
  4. Car motion: each vehicle is advanced v sites.”

Without the randomizing step 3) the motion is deterministic: “every initial configuration of vehicles and corresponding velocities reaches very quickly a stationary pattern which is shifted backwards (i.e. opposite the vehicle motion) one site per time step.”

The model exhibits the congestion phenomenon when the mean spacing between the cars is smaller than vmax.

Below are the links to the simulations of the model for a circular track with 100 lattice sites; the cars are colored circles which move along the track. It helps to follow a particular color car with your eyes to see what’s happening.
The two simulations are done with 15 cars (density lower than critical) and with 23 cars (above the critical density — exhibits congestion). As you probably guessed, vmax=5 in these simulations, hence 20 cars correspond to the critical density. The probability of random slowing down is 10% per turn.

Free flowing traffic in a simulation of the Nagel-Schreckenberg model below the critical density threshold.

The second simulation (above the critical density) shows the development of a jam of 5 cars. Cars zoom around the track and then spend 5 turns not moving at all, before the traffic clears ahead of them and they can accelerate to full velocity again.
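For readers who want to play with the model, here is a compact sketch using the parameters above (100 sites, 23 cars, vmax = 5, 10% random slowdown). It uses the common “accelerate, then cap by the gap” formulation of the four quoted steps; the variable names are mine, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sites, n_cars, vmax, p_slow = 100, 23, 5, 0.1

pos = np.sort(rng.choice(n_sites, n_cars, replace=False))   # car positions on the ring
vel = np.zeros(n_cars, dtype=int)

def step(pos, vel):
    order = np.argsort(pos)                       # keep cars in circular order
    pos, vel = pos[order], vel[order]
    gap = (np.roll(pos, -1) - pos) % n_sites      # sites to the car ahead
    vel = np.minimum(vel + 1, vmax)               # 1. acceleration
    vel = np.minimum(vel, gap - 1)                # 2. slow down to avoid the car ahead
    brake = (rng.random(n_cars) < p_slow) & (vel > 0)
    vel[brake] -= 1                               # 3. random slowdown
    pos = (pos + vel) % n_sites                   # 4. car motion
    return pos, vel

mean_speed = []
for t in range(2000):
    pos, vel = step(pos, vel)
    if t >= 500:                                  # discard the transient
        mean_speed.append(vel.mean())
print("mean speed with 23 cars:", round(float(np.mean(mean_speed)), 2))
```

Dropping n_cars to 15 should show the free flowing regime, with the mean speed close to vmax.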

The moral of the story? People like you and me can be the cause of traffic congestion!

It’s true: Cubs choke while Yankees surge.

Posted in Sports, Statistics by Alexander Lobkovsky Meitiv on March 24, 2010

1969 Chicago Cubs Photo with autographs

The infamous chokers: 1969 Chicago Cubs

The fans of Chicago Cubs know this scenario all too well. The Cubs, a great team, have a decent season only to slump and choke at the end. You only have to google the words “Cubs choke” to come up with dozens of websites lamenting the numerous heartbreaks. Cubs fans have developed a kind of fatalistic gloom as a coping strategy.

Then there are the Yankees everyone loves to hate. They seem to elevate their game at the end of the season and transform from a good team to hall of fame greatness. It’s as if they weren’t giving it all they’ve got during the regular season. It seems that they play just well enough to get into the postseason only to turn on the afterburners and blow everyone away.

Are these notions fiction perpetrated by fans or fact based on evidence?

We are in a position to test these hypotheses using a scientifically sound ELO rating system. Using publicly available match data I compiled ELO ratings for all baseball teams going back to 1874. To refresh your memory, an ELO rating is a number which measures the true strength of a team based on all previous games. It is supposed to track the current strength accurately and in an unbiased manner.

The ratings of the 1977 Yankees and the 1969 Cubs

Variation of the ELO rating of the Cubs and Yankees during the 1969 and 1977 seasons.


The graph of the infamous 1969 Cubs choke and the Yankees’ 1977 season, in which they won the World Series after being 51-44 in July and ranked #3, seems to support the “Cubs choke, Yankees surge” hypothesis.

Is this true in general or is it just a lucky or unlucky break?

Well, here is where data analysis can fully demonstrate its magic. The numbers don’t lie. If whoever does the numbers doesn’t, that is.

What I did was compute the difference between the rating of each team at the end of the season and its rating on September 1st of the same season. I then averaged this late season rating change over the last 47 years (since the 1961 expansion of the leagues from 16 to 20 teams). I then tested the result against the hypothesis that the rating change is purely random. This test weeded out the teams whose late season rating change could have resulted from purely random rating fluctuations. The remaining teams’ late season change is statistically significant and therefore not a fluke.
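The post does not spell out which statistical test was used, so purely as a hedged illustration, here is a sketch of one way such a screen could be run: a per-team one-sample test of the late season changes against a zero-mean null. The data structure, test choice, and threshold are placeholders, not the actual procedure behind the figure below.

```python
import numpy as np
from scipy import stats

def significant_late_season_teams(changes, alpha=0.05):
    """changes maps a team code to its per-season (end-of-season minus Sept 1)
    ELO rating differences; keep the teams whose mean change is unlikely under
    the null hypothesis that the change is purely random with zero mean."""
    keep = {}
    for team, diffs in changes.items():
        diffs = np.asarray(diffs, dtype=float)
        t_stat, p_value = stats.ttest_1samp(diffs, 0.0)
        if p_value < alpha:
            keep[team] = diffs.mean()
    return keep

# e.g. significant_late_season_teams({"CHC": [...], "NYY": [...], ...})
```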

The result clearly supports the “Cubs choke, Yankees surge” hypothesis.

Bargraph of the average late season rating change.

The average late season rating change for 9 teams whose rating change is statistically significant.

Legend:
ANA: Angels
CHW: White Sox
OAK: Athletics
STL: Cardinals
ATL: Braves
TEX: Rangers
CHC: Cubs
NYY: Yankees
DET: Tigers

Basketball Time Machine

Posted in Sports, Statistics by Alexander Lobkovsky Meitiv on March 10, 2010
1986 Celtics

1986 Celtics

1997 Bulls

1997 Bulls

What would you do with a time machine? I bet some people would be chomping at the bit to pit two dominant teams from different eras against each other and have a grand old spectacle!
But alas, it is safe to say that a time machine will remain for the foreseeable future in the realm of magic.

Can we get a glimpse at what the outcome of such a magical game might be? Is there a scientifically sound way to rate sports teams that judges their true strength? Most importantly, we need a method that yields ratings whose scale does not change with time, so that a team that gets a rating of 2000 thirty years ago is as strong (in some sense) as a team that gets a rating of 2000 today.

We are indeed in luck! Such a system exists. It was proposed in the 1950’s by Arpad Elo, a Hungarian-born physics professor (read about him on Wikipedia), and bears his name. His system is based on sound mathematical theory, and ever since then dozens upon dozens of mathematical papers have examined how reliable and reasonable the system is. Although Elo originally proposed his system to rate chess players, it has been adopted by a number of sports bodies, including FIDE, FIFA, MLB, the EGF, and others.

At the core of the ELO system is the rating updating scheme which adjusts the ratings of the two teams (or players) after each match depending on the result. Given the ratings before the game, one can compute the probability of each outcome, assuming that the actual performance has a certain probability distribution. If the stronger team wins, its rating increases by a smaller amount than if the weaker team wins. There are many different specific incarnations of the system. While some are more accurate than others, even in its simplest form the system is quite useful. In fact, using publicly available match data we can resolve the question:

If 1997 Chicago Bulls played a best of 7 series against the 1986 Boston Celtics, what are the chances of each team winning?
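Before getting to the answer, here is a minimal sketch of the per-game update described above. It assumes the common logistic form with a 400-point scale and update factor K; the post does not say which exact variant it uses, so treat the numbers as illustrative.

```python
def expected_score(r_a, r_b, scale=400.0):
    """Model probability that team A beats team B, given their current ratings."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / scale))

def elo_update(r_a, r_b, a_won, k=20.0):
    """Adjust both ratings after one game; a_won is 1 if team A won, 0 otherwise."""
    shift = k * (a_won - expected_score(r_a, r_b))
    return r_a + shift, r_b - shift

# The favorite gains little by winning and loses a lot by losing:
print(elo_update(2100, 1900, a_won=1))   # ~ (2105, 1895)
print(elo_update(2100, 1900, a_won=0))   # ~ (2085, 1915)
```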

After downloading the match data (56,467 games over 64 years that involved a total of 53 franchises, some of which changed names and cities a number of times) and computing the rating history, I came up with the top ten highest rated franchises:

Rank Team Year achieved Rating
1 Chicago Bulls 1997 2233.7
2 Boston Celtics 1986 2184.9
3 Los Angeles Lakers 1988 2163.3
4 Philadelphia 76ers 1983 2149.2
5 Detroit Pistons 1990 2137.4
6 Utah Jazz 1999 2129.9
7 Dallas Mavericks 2007 2126.5
8 San Antonio Spurs 2007 2089.4
9 Milwaukee Bucks 1971 2081.6
10 Seattle Supersonics 1996 2076.5

It is a telling sign of the NBA’s competitive health that the all-time top 10 highest rated teams are pretty close to each other in rating. Also, it seems, at least superficially, that there is no historical bias, meaning that the objective meaning of a rating does not change with time.

So, what would happen if the 1997 Bulls played a best of 7 series against the 1986 Celtics?
Home court advantage aside (the rating I am using does not take that into account), the rating difference gives the probability of the Bulls winning any particular game, and from that per-game probability we can compute the probability of winning a best of 7 series.

The Bulls would have a 57.5% chance of winning the series: an exciting spectacle indeed!
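The per-game probability behind that number is not quoted in the post, but given any per-game win probability p, the series probability follows from a short binomial sum; here is a sketch with a placeholder value of p.

```python
from math import comb

def best_of_seven(p, wins_needed=4):
    """Probability of winning a best-of-7 with per-game win probability p:
    win the deciding 4th game after j = 0..3 prior losses."""
    return sum(comb(wins_needed - 1 + j, j) * p**wins_needed * (1 - p)**j
               for j in range(wins_needed))

print(best_of_seven(0.55))   # a 55% per-game edge becomes roughly 61% for the series
```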

Finally I leave you with a graph of the historical ratings of six teams from large metropolitan areas from 1980 to the present day. It seems that it is extremely difficult to maintain a dominant team for more than a few seasons (although the Lakers managed to do so in the 1980’s).

NBA ELO ratings graph

Historical season ending ELO ratings for six NBA teams from large metropolitan areas

Unavoidable Attraction

Posted in Statistics, Transportation by Alexander Lobkovsky Meitiv on March 2, 2010

Gridlock

When traffic is heavy, buses tend to form hard-to-break-up bunches.

Everyone who rides buses in a city is familiar with the dreaded “bus bunching” phenomenon. Especially during rush hour, buses tend to arrive in bunches of two, three or even more. Why is that?

To begin understanding this phenomenon we must first assimilate the notion of fluctuations. The bus’s progress along the route is ideally on schedule, but in practice it is not. At each stop there is a difference between the actual and the scheduled arrival time. The nature of the fluctuations is such that this difference tends to grow along the route of the bus. In technical terms, the bus’s trajectory is called a directed random walk. There are several sources of the fluctuations: stop lights, variation in the number of passengers to be picked up and discharged and, of course, traffic.

When fluctuations are strong and/or the buses are frequent, it is unavoidable that consecutive buses find themselves at the same bus stop. What happens afterwards is less clear cut. It seems that it is virtually impossible for the buses to separate again. From that point on the two (or more) buses travel in a bunch. The average speed of a bus bunch is frequently greater than that of an isolated bus and therefore bunches tend to overtake and absorb buses that are ahead.

Let’s try to come up with a plausible explanation for the two observed phenomena: Why do bus bunches not break up naturally? Why is the average speed of the bunch different from the average speed of an isolated bus?

Let’s tackle the questions one at a time. Why don’t bunches break up? There could be several reasons. Without real field data, I am afraid, we won’t be able to say for certain which factor is the most important.

Possible reason #1: Excluded volume interactions. Analogy with colloids.

Colloids are suspensions of small solid particles in a fluid. It is a well known phenomenon, readily reproducible in a lab, that when you combine colloidal particles of two substantially different sizes, they tend to separate even if the particles themselves are not attracted to each other. It may be counterintuitive, but the system can increase its entropy by separating particles by size. Once a small particle escapes from the aggregate of large particles, it is extremely unlikely to make it back there.

The same size separation might happen in traffic, although likely for different reasons. How much do you like being sandwiched between a bus and a dump truck? You try to get the hell out of there at the first opportunity.

So spaces between traveling buses may be unlikely to be filled up with cars. In a sense, there is an effective attraction between buses caused by the cars’ avoidance of the space between them.

One would certainly need data to support or reject the excluded volume hypothesis of bus attraction.

Possible reason #2: Correlation between the number of waiting passengers and the distance to the nearest bus ahead.

Now this idea is something we could sink our teeth into. Suppose that the gap between two buses shrinks due to a random fluctuation of unspecified nature. Then the mean number of passengers waiting for the second bus, which is proportional to the wait time (if the passengers arrive at the bus stop at a fixed rate), also decreases. Therefore the second bus will spend less time picking up passengers, its mean velocity will increase, and it will catch up with the bus ahead. We can therefore say that the state with evenly spaced buses along the route is unstable to collapse.

This idea can be formalized in the following simple toy model.

Suppose there is a circular route with equidistant stops (a linear route is really circular if the buses turn around at the end of the route and go back immediately). Initially a number of buses are uniformly distributed along the route. Passengers arrive at all bus stops at a fixed rate. The time a bus spends at a stop is proportional to the number of passengers waiting there.

Passenger discharge can be included in the model. However it does not qualitatively affect the results.

There are two important parameters in this model: 1) the product of the travel time between stops and the rate of passenger arrival. This parameter determines whether the bus spends most of its time traveling or picking up passengers. 2) The ratio of the number of stops to the number of buses.

It turns out that if the first parameter is large (most time is spent traveling) or the second parameter is small (there are lots of buses), bunching does not occur.

However, as illustrated in the figure below, there is a realistic parameter range in which bunching does occur and bunches have no chance to break up. In the figure below (which presents the output of the simple model above), the three buses were initially well spaced. Eventually, buses 1 and 2 form a bunch which catches up to bus 3.
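Here is a minimal sketch of this toy model, my own rough implementation rather than the code behind the figure: a circular route, a fixed passenger arrival rate at every stop, dwell time proportional to the queue, and a small random jitter in the driving time standing in for the unspecified fluctuations that seed the instability. All parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_stops, n_buses = 30, 3
travel_time = 2.0      # mean driving time between adjacent stops
jitter = 0.2           # small random variation in driving time (seeds the instability)
arrival_rate = 0.2     # passengers arriving per stop per unit time
board_time = 0.3       # boarding time per waiting passenger

stop_index = np.array([0, 10, 20])       # buses start out evenly spaced
bus_time = np.zeros(n_buses)             # each bus's own clock
last_pickup = np.zeros(n_stops)          # when each stop was last served

for _ in range(3000):
    b = int(np.argmin(bus_time))                 # process bus arrivals in time order
    s = stop_index[b] % n_stops
    queue = arrival_rate * max(bus_time[b] - last_pickup[s], 0.0)
    last_pickup[s] = bus_time[b]
    dwell = queue * board_time                   # dwell grows with the gap to the bus ahead
    bus_time[b] += dwell + travel_time + rng.normal(0.0, jitter)
    stop_index[b] += 1

print("bus positions around the loop:", np.sort(stop_index % n_stops))
```

With these numbers the dwell time is comparable to the driving time, and the final positions typically show two buses sitting on top of each other.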

Once the bunch of two buses is formed, the buses leapfrog each other and pick up passengers at alternating stops. Here, therefore, is the answer to our second question of why bunches travel faster: each bus has to accelerate and decelerate less frequently since it only stops at every other stop. Hence the average speed is greater.

It would be fun to go out there and time some bus arrivals to see if they can be well described by the model. Any takers?

Graph illustrating the bunch formation

Correlation between the space between buses and the number of waiting passengers results in the bunching behavior.

Decisions, decisions, decisions…

Posted in Statistics, Transportation by Alexander Lobkovsky Meitiv on February 25, 2010

This entry is about how the amount of information available at the time of a decision can increase the efficacy of the outcome.

The specific case I will talk about is public transport.

Have you ever been on a bus that sat at a red light only to stop again at a bus stop right after passing the intersection?
Did you wonder if it would be better to have the bus stop located before the light?

Wonder no more! If you read on, we will answer this question and a few others using simple statistics and a few carefully chosen assumptions.

Let us first compute the average waiting time at a red light. Let’s say the light has only two states, red and green, which alternate. The durations of the red and green lights are fixed and equal to t_r and t_g. Suppose that the bus arrives at the light at a random time. Then its average waiting time at the red light is

\displaystyle t_\ell=\frac{1}{t_r+t_g}\int_0^{t_r}t\,dt=\frac{t_r^2}{2(t_r+t_g)}.

This is because we assume that the bus arrives at the light at a random time. Without any prior information, the distribution of arrival times is uniform. The behavior of the light is periodic with period t_r+t_g and thus the probability of arriving in any time interval dt is dt/(t_r+t_g).

For example, if the red and the green lights are equally long, i.e. t_g=t_r, the average wait at a stop light will be a quarter of the red light duration t_\ell = t_r/4. (To derive that substitute t_g=t_r into the equation above).
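As a quick numerical check of this formula, here is a tiny Monte Carlo sketch with arbitrary durations, assuming uniformly random arrivals over one red+green cycle (red first, then green).

```python
import numpy as np

rng = np.random.default_rng(0)
t_r, t_g = 30.0, 60.0
phase = rng.uniform(0.0, t_r + t_g, 1_000_000)    # cycle phase at arrival
wait = np.where(phase < t_r, t_r - phase, 0.0)    # wait only if the light is red
print(wait.mean(), t_r**2 / (2 * (t_r + t_g)))    # both come out near 5.0
```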

Now let’s add the bus stop to the equation. We will assume that the bus stops for a fixed time t_s. Fluctuations in the stopping time can be added to the model. However, the calculations become a bit more involved and the result does not change qualitatively.

The questions are: 1) What is the total stoppage time t_w: red light + bus stop? 2) Does it depend on whether the bus stop is before or after the red light?

If we know anything about information theory, our answer to the second question is NO without doing any algebra. Why? Because the bus arrival time is random and uncorrelated with the timing of the stop light. There is no information that can distinguish stopping before and after the intersection. If the stop is after the light, the bus has to wait at the red light for a time t_{\ell} just computed above. If the bus stops before the light, the “arrival” time is the time at the end of the stop and it is just as random as the arrival at the stop. Therefore, the average total stoppage time is just t_\ell + t_s regardless of whether the stop is before or after the light.

How can the total stoppage time be reduced?

After all, this post is about the efficiency of mass transit. The answer, again from the point of view of information theory, is the following. To improve efficiency, we must use available information to make decisions which make the arrival (or departure) time of the bus correlated with the timing of the light.

In Switzerland, public support for mass transit is so strong that people accept that the trolleys actually change the timing of the stop lights to speed up passage at the expense of cars. Here in America this approach may not fly. However, even if the timing of the stop light cannot be changed by the bus/trolley driver, the driver still has the power to make decisions that change the total stoppage time.

In the example above, the bus stop was always before or after the intersection. Suppose the driver could decide, based on some information about the phase of the stop light, whether to stop before or after the intersection.

Let’s call the scenario in which the bus driver does not make a decision where to stop the “null model” or the “no-decision” model. As a better alternative consider the “red-before” scenario in which the driver stops before the intersection if the bus arrives on the red light and after the intersection if the bus arrives on the green light. What is the average stopping time t_w = t_s + \Delta?

I am not going to bore you with the tedious derivation. The result itself is a bit complicated as we have to consider 4 separate cases. I am going to give a formula for the extra waiting time \Delta on top of the regular stop duration t_s.

Let’s first define:

\displaystyle I_1=\frac{(t_r-t_s)^2}{2(t_r+t_g)},

and

\displaystyle I_2=\frac{(t_s-t_g)(t_r-\frac{1}{2}(t_s-t_g))}{t_r+t_g}.

Then the extra waiting time is

\Delta=0 for t_r \le t_s \le t_g
\Delta=I_1 for 0\le t_s \le \min(t_r,t_g)
\Delta=I_2 for \max(t_r,t_g)\le t_s \le t_r+t_g
\Delta=I_1 + I_2 for t_g \le t_s \le t_r

If t_s \ge t_r + t_g, just replace t_s everywhere with its remainder when divided by t_r + t_g.
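These cases can also be checked numerically. Below is a small Monte Carlo sketch of the “red-before” scenario (my own check, with arbitrary durations and variable names); its output can be compared with the formulas above, here the I_1 case.

```python
import numpy as np

rng = np.random.default_rng(0)

def extra_wait_red_before(t_r, t_g, t_s, n=200_000):
    """Average extra waiting time when the driver stops before the intersection
    on red and after the intersection on green; phase 0..t_r is red."""
    cycle = t_r + t_g
    phase = rng.uniform(0.0, cycle, n)            # phase of the light at arrival
    extra = np.zeros(n)                           # arrivals on green incur no extra wait
    on_red = phase < t_r
    done = (phase[on_red] + t_s) % cycle          # phase when boarding finishes
    extra[on_red] = np.where(done < t_r, t_r - done, 0.0)   # wait if red again
    return extra.mean()

# t_s = 20 falls in the 0 <= t_s <= min(t_r, t_g) case, so compare with I_1:
t_r, t_g, t_s = 30.0, 60.0, 20.0
print(extra_wait_red_before(t_r, t_g, t_s), (t_r - t_s)**2 / (2 * (t_r + t_g)))
```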

To illustrate these formulas, here are the graphs comparing the extra stoppage time \Delta for the “no-decision” and the “red-before” scenarios as a function of the stop duration t_s for two different ratios of the red to green light durations.

Extra waiting time for t_r = 2 t_g

Comparison of the extra waiting time as a function of the bus stop duration for two different decision scenarios and the red light twice as long as the green light.

Extra waiting time for t_g = 2 t_r

Comparison of the extra waiting time for the green light twice as long as the red light. Note that for a certain range of bus stop durations, the extra waiting time vanishes completely!

The “red-before” scenario which uses only the information about the current state of the stop light does quite well compared to the “no-decision” scenario. When the green light is longer than the red light, the extra waiting time vanishes altogether if the stop duration is chosen properly.

Can we do better?

Yes! The more information is available to the driver, the better the strategy for deciding where to stop can be. We can imagine, for example, that when the bus arrives at a red light, the driver knows when it will turn green again. Or the driver can have complete information and also know the duration of the following green light.

Let us compute the extra waiting time for the best stopping strategy with complete information. How much better does it do than the “red-before” strategy, which uses only the information about the current state of the stop light? The best stopping strategy which uses all available information is the following. Suppose the bus arrives on a red light. The time till the light change is the extra waiting time if the driver decides to stop after the intersection. This time needs to be compared to the extra waiting time which might result if the bus stops before the intersection. That wait might occur if the total stop duration is longer than the remainder of the red light plus the following green light, so that the light is red again after the bus stop is completed. The best decision will depend on when the bus arrives, the durations of the red and green lights, and the duration of the bus stop.

I am going to leave you with a comparison of the extra waiting time for the “red-before” strategy with the best stopping strategy with perfect information about the phase of the stop light (length of red, green, time till change).

The moral of the story: “Information is power!”

Extra waiting time for the “red-before” strategy and the best strategy with perfect information

Perfect information helps reduce the extra waiting time when the red light is longer than the green light and when the bus stop duration is longer than that of the green light.