This post is a digression. It’s not about constructing mathematical models of real world phenomena. It’s about sequences of numbers like the famous Fibonacci sequence and the Android pattern unlock screen shown here.
The rationale for a graphical pattern instead of a numeric PIN is clear. Humans are much better at remembering patterns than strings of digits. But is the pattern as secure? Other issues aside, such as traceability of fingerprint smudges that allows an attacker to recover the pattern, are there as many patterns as there are numeric PINs of the same length?
Well that’s where understanding recursion relations is useful. A recursion relation defines a sequence of numbers by a relationship between and all preceding sequence members. For example the Fibonacci sequence is defined via
Depending on the type of recursion relation (whether it is linear, for example), there are a variety of methods for solving them. Let’s use the power of recursion to find the number of possible Android unlock patterns that connect dots.
To make the problem seemingly more complicated (but really simpler) let’s separately compute the number of patterns that start at the center, side and corner of the 3×3 grid of dots. Let’s call those numbers and The total number of unlock patterns is
Because there are 4 corner and 4 side dots in the 3×3 grid.
Let’s now derive a recursion relation for The first step is to note that when the pattern consists of only one dot (that would not be a very secure pattern) we trivially get and
Now let’s imagine that we managed to compute and for some We can then compute noticing that the very first link that is made from the center dot can only go to a side dot (4 possible ways) or to a corner dot (also in 4 possible directions). Therefore
When starting from the side dot, there are 4 ways to get to a corner dot, one way to get to the center dot and 2 ways to get to other side dots, therefore
And finally when starting from a corner dot, there are 4 ways to get to a side dot and one way to get to the center dot, thus
The above three relationships can be summarized as
using matrix notation by defining a vector and a matrix
Hang on tight, we are almost there. The simple recursion relation (1) means that we can obtain the numbers of patterns connecting dots from those for dots by just applying the matrix Applying this reasoning recursively we arrive at
I won’t bore you with the rest of the calculation which is quite routine. As my eccentric math tutor used to say: “After this point it’s just algebra!”
Here is the result of the calculation:
|N||Number of patterns connecting N dots||Number of numeric PINs of N digits|
Clearly, the number of patterns grows slower than the number of numeric PINs. This may not matter, however, if patterns are easier to remember and therefore one can comfortably have a longer pattern.
The only sure way to determine whether it is indeed easier to remember patterns, is through careful experiments with human subjects. Until such data are available, all statements about memorability and security of patterns vs PIN’s are pure speculation…
The restriction that a dot cannot be used more than once in a pattern *dramatically* reduces the number of allowed patterns. Please see comments for the true number of unlock patterns.
What would you do with a time machine? I bet some people would be chomping at the bit to pit two dominant teams from different eras against each other and have a grand old spectacle!
But alas, it is safe to say that a time machine will remain for the foreseeable future in the realm of magic.
Can we get a glimpse at what the outcome of such a magical game might be? Is there a scientifically sound way to rate sports teams in a way that judges their true strength. Most importantly, we need a method that yields ratings whose scale does not change with time so that a team that gets a rating of 2000 thirty years ago is as strong (in some sense) as a team that gets a rating of 2000 today.
We are indeed in luck! Such a system exists. It was proposed in the 1950’s by a Hungarian mathematician Arpad Elo (read about him on Wikipedia) and bears his name. His system is based on sound mathematical theory and ever since then dozens upon dozens of mathematical papers have been proving how reliable and reasonable the system is. Although Elo originally proposed his system to rate chess players, it has been adopted by a number of other sports bodies including FIDE, FIFA, MLB, EGF and others.
At the core of the ELO system is the ranking updating scheme which adjusts the ranking of the two teams (or players) after each match depending on the result. Given the rankings before the game, one can compute the probability of each outcome given that the actual performance has a certain probability distribution. If the stronger team wins its rating increases by a smaller amount than if the weaker team wins. There are many different specific incarnations of the system. While some are more accurate than others, even in its simplest form, the system is quite useful. In fact using publicly available match data we can resolve the question:
If 1997 Chicago Bulls played a best of 7 series against the 1986 Boston Celtics, what are the chances of each team winning?
After downloading the match data (56,467 games over 64 years that involved a total of 53 franchises some of which changed names and cities a number of time) and computing the rating history I came up with the top ten highest rated franchises:
|3||Los Angeles Lakers||1988||2163.3|
|8||San Antonio Spurs||2007||2089.4|
It is a telling sign that the NBA is a competitively healthy organization that the top 10 all time high ranking teams of all time pretty close to each other in rating. Also, it seems at least superficially, that there is no historical bias meaning the objective meaning of a rating does not change with time.
So, what would happen if the 1997 Bulls played a best of 7 series against the 1986 Celtics?
Home field advantage aside (the ranking I am using does not take that into account), the probability of the Bulls winning any particular game is . The probability of winning a best of 7 series (below )
The Bulls would have a 57.5% chance of winning the series: an exiting spectacle indeed!
Finally I leave you with a graph of the historical ratings of six teams from large metropolitan areas from 1980 to present day. It seems that it is extremely difficult to maintain a dominant team for more than a few seasons (although the Lakers managed to do so in the 1980’s).
To begin understanding this phenomenon we must first assimilate the notion of fluctuations. The bus’s progress along the route, though ideally on schedule, in practice is not. At each stop there is a difference between the actual and the scheduled arrival time. The nature of the fluctuations is such that this difference tends to grow along the route of the bus. In technical terms, the bus’s trajectory is called a directed random walk. There are several sources of the fluctuations: stop lights, variation in the number of passengers to be picked up and discharged and, of course, traffic.
When fluctuations are strong, and/or, the buses are frequent, it is unavoidable that consecutive buses find themselves at the same bus stop. What happens afterwards is less clear cut. It seems that it is virtually impossible for the buses to separate again. From that point on the two (or more) buses travel in a bunch. The average speed of a bus bunch is frequently greater than that of an isolated bus and therefore bunches tend to overtake and absorb buses that are ahead.
Let’s try to come up with a plausible explanation for the two observed phenomena: Why do bus bunches not break up naturally? Why is the average speed of the bunch different from the average speed of an isolated bus?
Let’s tackle the questions one at a time. When don’t bunches break up? There could be several reasons. Without real field data, I am afraid, we won’t be able to say for certain which factor is the most important.
Possible reason #1: Excluded volume interactions. Analogy with colloids.
Colloids are suspensions of small solid particles in a fluid. It is a well know phenomenon, readily reproducible in a lab, that when you combine colloidal particles of two substantially different sizes, they tend to separate even if the particles themselves are not attracted to each other. It may be counterintuitive, but the system can increase its entropy by separating particles by size. Once a small particle escapes from the aggregate of large particles, it is extremely unlikely to make it back there.
The same size separation might happen in traffic, although likely for different reasons. How much do you like being sandwiched between a bus and a dump truck? You try to get the hell out of there at the first opportunity.
So spaces between traveling buses may be unlikely to be filled up with cars. In a sense, there is an effective attraction between buses cased by the car’s avoidance of the space between them.
One would certainly need data to support or reject the excluded volume hypothesis of bus attraction.
Possible reason #2: Correlation between the number of waiting passengers and the distance to the nearest bus ahead.
Now this idea is something we could sink our teeth into. Suppose that the gap between two buses shrinks due to a random fluctuation of unspecified nature. Then, the mean number of passengers waiting for the second bus, which is proportional to the wait time (if the passengers arrive at the bus stop at a fixed rate), also decreases. Therefore the second bus will spend less time picking up passengers, it’s mean velocity will therefore increase and it will catch up with the bus ahead. We can therefore say that the state with evenly spaced buses along the route is unstable to collapse.
This idea can be formalized in the following simple toy model.
Suppose there is a circular route with equidistant stops (a linear route is really circular if the buses turn around at the end of the route and go back immediately). Initially a number of buses are uniformly distributed along the route. Passengers arrive at all bus stops at a fixed rate. The time a bus spends at a stop is proportional to the number of passengers waiting there.
Passenger discharge can be included in the model. However it does not qualitatively affect the results.
There are two important parameters in this model: 1) the product of the travel time between stops and the rate of passenger arrival. This parameter determines whether the bus spends most of its time traveling or picking up passengers. 2) The ratio of the number of stops to the number of buses.
It turns out that if the first parameter is large (most time is spent traveling) or the second parameter is small (there are lots of buses), bunching does not occur.
However, as illustrated in the figure below, there is a realistic parameter range in which bunching does occur and bunches have no chance to break up. In the figure below (which presents the output of the simple model above), the three buses were initially well spaced. Eventually, buses 1 and 2 form a bunch which catches up to bus 3.
Once the bunch of two buses is formed, the buses leapfrog each other and pick up passengers at alternating stops. Here is therefore the answer to our second question why bunches travel faster: each bus only has to accelerate/decelerate less frequently since they only stop at every other stop. Hence the average speed is greater.
It would be fun to go out there and time some bus arrivals to see if they can be well described by the model. Any takers?
This entry is about how the amount of information at the time of a decision can increase the efficacy of the outcome.
The specific case I will talk about is public transport.
Have you ever been on a bus that sat at a red light only to stop again at a bus stop right after passing the intersection?
Did you wonder if it would be better to have the bus stop located before the light?
Wonder no more! If you read on, we will answer this question and a few others using simple statistics and a few carefully chosen assumptions.
Let us first compute the average waiting time at a red light. Let’s say the light has only two states: red and green which alternate. The durations of the red and green lights are fixed and are and Suppose that the bus arrives at a light at a random time. Then its average waiting time at the red light is
This is because we assume that the bus arrives at the light at a random time. Without any prior information, the distribution of arrival times is uniform. The behavior of the light is periodic with period and thus the probability of arriving in any time interval is
For example, if the red and the green lights are equally long, i.e. the average wait at a stop light will be a quarter of the red light duration (To derive that substitute into the equation above).
Now lets add the bus stop to the equation. We will assume that the bus stops for a fixed time Fluctuations in the stopping time can be added to the model. However, calculations become a bit more involved and the result does not change qualitatively.
The questions are: 1) What is the total stoppage time : red light + bus stop? 2) Does it depend on whether the bus stop is before or after the red light?
If we know anything about information theory, our answer to the second question is NO without doing any algebra. Why? Because the bus arrival time is random and uncorrelated with the timing of the stop light. There is no information that can distinguish stopping before and after the intersection. If the stop is after the light, the bus has the wait at the red light for a time just computed above. If the bus stops before the light, the “arrival” time is the time at the end of the stop and it is just as random as the arrival to the stop. Therefore, the average total stoppage time is just regardless of whether the stop is before or after the light.
How can the total stoppage time be reduced?
After all this post is about efficiency of mass transit. The answer, again from the point of view of information theory, is the following. To improve efficiency, we must use available information to make decisions which make the arrival (or departure) time of the bus correlated with the timing of the light.
In Switzerland, public support for mass transit is so strong, that people accept that the trolleys actually change the timing of the stop lights to speed up passage at the expense of cars. Here in America this approach may not fly. However, even if the timing of the stop light cannot be changed by the bus/trolley driver, they still have the power to make decisions that would change the total stoppage time.
In the example above, the bus stop was always before or after the intersection. Suppose the driver could decide, based on some information about the phase of the stop light, whether to stop before or after the intersection?
Let’s call the scenario in which the bus driver does not make a decision where to stop the “null model” or the “no-decision” model. As a better alternative consider the “red-before” scenario in which the driver stops before the intersection if the bus arrives on the red light and after the intersection if the bus arrives on the green light. What is the average stopping time ?
I am not going to bore you with the tedious derivation. The result itself is a bit complicated as we have to consider 4 separate cases. I am going to give a formula for the extra waiting time on top of the regular stop duration
Let’s first define:
Then the extra waiting time is
If just replace everywhere with its remainder when divided by
To illustrate these formulas here is the graphs comparing the extra stoppage time for the “no-decision” and the “red-before” scenarios as a function of the stop duration for two different ratios of the red to green light durations.
The “red-before” scenario which uses only the information about the current state of the stop light does quite well compared to the “no-decision” scenario. When the green light is longer than the red light, the extra waiting time vanishes altogether if the stop duration is chosen properly.
Can we do better?
Yes! The more information is available to the driver, the better can the strategy be for making the decision where to stop. We can imagine, for example, that when the bus arrives at a red light, the driver knows when it will turn green again. Or, the driver can have complete information and also know the duration of the following green light.
Let us compute the extra waiting time for the best stopping strategy with complete information. How much better does it do than the “red-before” strategy which uses only the information about the current state of the stop light? The best stopping strategy which uses all available information is the following. Suppose the bus arrives on a red light. The time till the light change is the extra waiting time if the driver decides to stop after the intersection. This time needs to be compared to the extra waiting time which might result if the bus stops before the intersection. This might happen if the total stop duration is longer than the remainder of the red light plus the following green light so that the light is red again after the bus stop is completed. The best decision will depending on when the bus arrives, the duration of the red and green lights and the the bus stop.
I am going to leave you with a comparison of the extra waiting time for the”red-before” strategy with the best stopping strategy with perfect information about the phase of the stop light (length of red, green, time till change).