Playing with models.

Are Android unlock patterns as secure as numeric PINs?

Posted in Mathematics by Alexander Lobkovsky Meitiv on April 14, 2010

An example of the Android unlock pattern

Unlock pattern may be easier to remember than a string of numbers.

This post is a digression. It’s not about constructing mathematical models of real world phenomena. It’s about sequences of numbers like the famous Fibonacci sequence and the Android pattern unlock screen shown here.

The rationale for a graphical pattern instead of a numeric PIN is clear. Humans are much better at remembering patterns than strings of digits. But is the pattern as secure? Other issues aside, such as traceability of fingerprint smudges that allows an attacker to recover the pattern, are there as many patterns as there are numeric PINs of the same length?

Well that’s where understanding recursion relations is useful. A recursion relation defines a sequence of numbers a_n by a relationship between a_n and all preceding sequence members. For example the Fibonacci sequence is defined via

a_n=a_{n-1}+a_{n-2}, \quad a_1=1, \quad a_2=1.

Depending on the type of recursion relation (whether it is linear, for example), there are a variety of methods for solving them. Let’s use the power of recursion to find the number of possible Android unlock patterns that connect N dots.

To make the problem seemingly more complicated (but really simpler) let’s separately compute the number of patterns that start at the center, side and corner of the 3×3 grid of dots. Let’s call those numbers O_N, S_N, and C_N. The total number of unlock patterns is


Because there are 4 corner and 4 side dots in the 3×3 grid.

Let’s now derive a recursion relation for O_N,\ S_N,\ C_N. The first step is to note that when the pattern consists of only one dot (that would not be a very secure pattern) we trivially get O_1=1,\ S_1=1, and C_1=1.

Now let’s imagine that we managed to compute O_{N-1},\ S_{N-1}, and C_{N-1} for some N. We can then compute C_N noticing that the very first link that is made from the center dot can only go to a side dot (4 possible ways) or to a corner dot (also in 4 possible directions). Therefore


When starting from the side dot, there are 4 ways to get to a corner dot, one way to get to the center dot and 2 ways to get to other side dots, therefore


And finally when starting from a corner dot, there are 4 ways to get to a side dot and one way to get to the center dot, thus


The above three relationships can be summarized as

V_N=A\,V_{N-1}\quad\quad\quad \mathrm{(1)},

using matrix notation by defining a vector V_N=(O_N,S_N,C_N)^{T}, and a matrix


Hang on tight, we are almost there. The simple recursion relation (1) means that we can obtain the numbers of patterns connecting N dots from those for N - 1 dots by just applying the matrix A. Applying this reasoning recursively we arrive at


I won’t bore you with the rest of the calculation which is quite routine. As my eccentric math tutor used to say: “After this point it’s just algebra!”

Here is the result of the calculation:

N Number of patterns connecting N dots Number of numeric PINs of N digits
2 56 100
3 360 1,000
4 2,280 10,000
5 14,544 100,000
6 92,448 1,000,000
7 588,672 10,000,000
8 3,745,152 100,000,000

Clearly, the number of patterns grows slower than the number of numeric PINs. This may not matter, however, if patterns are easier to remember and therefore one can comfortably have a longer pattern.

The only sure way to determine whether it is indeed easier to remember patterns, is through careful experiments with human subjects. Until such data are available, all statements about memorability and security of patterns vs PIN’s are pure speculation…

The restriction that a dot cannot be used more than once in a pattern *dramatically* reduces the number of allowed patterns. Please see comments for the true number of unlock patterns.

It’s true: Cubs choke while Yankees surge.

Posted in Sports, Statistics by Alexander Lobkovsky Meitiv on March 24, 2010

1969 Chicago Cubs Photo with autographs

The infamous chokers: 1969 Chicago Cubs

The fans of Chicago Cubs know this scenario all too well. The Cubs, a great team, have a decent season only to slump and choke at the end. You only have to google the words “Cubs choke” to come up with dozens of websites lamenting the numerous heartbreaks. Cubs fans have developed a kind of fatalistic gloom as a coping strategy.

Then there are the Yankees everyone loves to hate. They seem to elevate their game at the end of the season and transform from a good team to hall of fame greatness. It’s as if all season long they weren’t giving it all they got during the regular season. It seems that they play just well enough to get into the postseason only to turn on the afterburners and blow everyone away.

Are these notions fiction perpetrated by fans or fact based on evidence?

We are in a position to test these hypothesis using a scientifically sound ELO ranking system. Using publicly available match data I compiled ELO ratings for all baseball teams going back to 1874. To refresh your memory, an ELO rating is a number which measures the true strength of a team based on all previous games. It is supposed to track the current strength accurately and in an unbiased manner.

The ratings of the 1977 Yankees and the 1969 Cubs

Variation of the ELO rating of the Cubs and Yankees during the 1969 and 1977 seasons.

The graph of the infamous 1969 Cubs choke and the Yankees 1977 season in which they won the World Series after being 51-44 in July and ranked #3 seems to support the “Cubs choke Yankees surge” hypothesis.

Is this true in general or is it just a lucky or unlucky break?

Well, here is where the data analysis can fully demonstrate its magic. The numbers don’t lie. If whoever does the numbers doesn’t that is.

What I did is to compute the difference between the rating of each team at the end of the season and its rating on September 1st of the same season. I then averaged this late season rating change over the last 47 years (since the 1961 expansion of the leagues from 16 to 20 teams). I then tested the result against the hypothesis that the rating change is purely random. This test weeded out the teams whose late season rating change could have resulted from purely random rating fluctuations. The remaining teams’ late season change is statistically significant and therefore not a fluke.

The result clearly supports the “Cubs choke, Yankees surge hypothesis.”

Bargraph of the average late season rating change.

The average late season rating change for 9 teams whose rating change is statistically significant.

ANA: Angels
CHW: White Sox
OAK: Athletics
STL: Cardinals
ATL: Braves
TEX: Rangers
CHC: Cubs
NYY: Yankees
DET: Tigers

Basketball Time Machine

Posted in Sports, Statistics by Alexander Lobkovsky Meitiv on March 10, 2010
1986 Celtics

1986 Celtics

1997 Bulls

1997 Bulls

What would you do with a time machine? I bet some people would be chomping at the bit to pit two dominant teams from different eras against each other and have a grand old spectacle!
But alas, it is safe to say that a time machine will remain for the foreseeable future in the realm of magic.

Can we get a glimpse at what the outcome of such a magical game might be? Is there a scientifically sound way to rate sports teams in a way that judges their true strength. Most importantly, we need a method that yields ratings whose scale does not change with time so that a team that gets a rating of 2000 thirty years ago is as strong (in some sense) as a team that gets a rating of 2000 today.

We are indeed in luck! Such a system exists. It was proposed in the 1950’s by a Hungarian mathematician Arpad Elo (read about him on Wikipedia) and bears his name. His system is based on sound mathematical theory and ever since then dozens upon dozens of mathematical papers have been proving how reliable and reasonable the system is. Although Elo originally proposed his system to rate chess players, it has been adopted by a number of other sports bodies including FIDE, FIFA, MLB, EGF and others.

At the core of the ELO system is the ranking updating scheme which adjusts the ranking of the two teams (or players) after each match depending on the result. Given the rankings before the game, one can compute the probability of each outcome given that the actual performance has a certain probability distribution. If the stronger team wins its rating increases by a smaller amount than if the weaker team wins. There are many different specific incarnations of the system. While some are more accurate than others, even in its simplest form, the system is quite useful. In fact using publicly available match data we can resolve the question:

If 1997 Chicago Bulls played a best of 7 series against the 1986 Boston Celtics, what are the chances of each team winning?

After downloading the match data (56,467 games over 64 years that involved a total of 53 franchises some of which changed names and cities a number of time) and computing the rating history I came up with the top ten highest rated franchises:

Rank Team Year achieved Rating
1 Chicago Bulls 1997 2233.7
2 Boston Celtics 1986 2184.9
3 Los Angeles Lakers 1988 2163.3
4 Philadelphia 76ers 1983 2149.2
5 Detroit Pistons 1990 2137.4
6 Utah Jazz 1999 2129.9
7 Dallas Mavericks 2007 2126.5
8 San Antonio Spurs 2007 2089.4
9 Milwaukee Bucks 1971 2081.6
10 Seattle Supersonics 1996 2076.5

It is a telling sign that the NBA is a competitively healthy organization that the top 10 all time high ranking teams of all time pretty close to each other in rating. Also, it seems at least superficially, that there is no historical bias meaning the objective meaning of a rating does not change with time.

So, what would happen if the 1997 Bulls played a best of 7 series against the 1986 Celtics?
Home field advantage aside (the ranking I am using does not take that into account), the probability of the Bulls winning any particular game is . The probability of winning a best of 7 series (below )

The Bulls would have a 57.5% chance of winning the series: an exiting spectacle indeed!

Finally I leave you with a graph of the historical ratings of six teams from large metropolitan areas from 1980 to present day. It seems that it is extremely difficult to maintain a dominant team for more than a few seasons (although the Lakers managed to do so in the 1980’s).

NBA ELO ratings graph

Historical season ending ELO ratings for six NBA teams from large metropolitan areas

Unavoidable Attraction

Posted in Statistics, Transportation by Alexander Lobkovsky Meitiv on March 2, 2010


When traffic is heavy, buses tend to form hard to breakup bunches.

Everyone who rides buses in a city is familiar with the dreaded “bus bunching” phenomenon. Especially during rush hour, buses tend to arrive in bunches of two, three or even more. Why is that?

To begin understanding this phenomenon we must first assimilate the notion of fluctuations. The bus’s progress along the route, though ideally on schedule, in practice is not. At each stop there is a difference between the actual and the scheduled arrival time. The nature of the fluctuations is such that this difference tends to grow along the route of the bus. In technical terms, the bus’s trajectory is called a directed random walk. There are several sources of the fluctuations: stop lights, variation in the number of passengers to be picked up and discharged and, of course, traffic.

When fluctuations are strong, and/or, the buses are frequent, it is unavoidable that consecutive buses find themselves at the same bus stop. What happens afterwards is less clear cut. It seems that it is virtually impossible for the buses to separate again. From that point on the two (or more) buses travel in a bunch. The average speed of a bus bunch is frequently greater than that of an isolated bus and therefore bunches tend to overtake and absorb buses that are ahead.

Let’s try to come up with a plausible explanation for the two observed phenomena: Why do bus bunches not break up naturally? Why is the average speed of the bunch different from the average speed of an isolated bus?

Let’s tackle the questions one at a time. When don’t bunches break up? There could be several reasons. Without real field data, I am afraid, we won’t be able to say for certain which factor is the most important.

Possible reason #1: Excluded volume interactions. Analogy with colloids.

Colloids are suspensions of small solid particles in a fluid. It is a well know phenomenon, readily reproducible in a lab, that when you combine colloidal particles of two substantially different sizes, they tend to separate even if the particles themselves are not attracted to each other. It may be counterintuitive, but the system can increase its entropy by separating particles by size. Once a small particle escapes from the aggregate of large particles, it is extremely unlikely to make it back there.

The same size separation might happen in traffic, although likely for different reasons. How much do you like being sandwiched between a bus and a dump truck? You try to get the hell out of there at the first opportunity.

So spaces between traveling buses may be unlikely to be filled up with cars. In a sense, there is an effective attraction between buses cased by the car’s avoidance of the space between them.

One would certainly need data to support or reject the excluded volume hypothesis of bus attraction.

Possible reason #2: Correlation between the number of waiting passengers and the distance to the nearest bus ahead.

Now this idea is something we could sink our teeth into. Suppose that the gap between two buses shrinks due to a random fluctuation of unspecified nature. Then, the mean number of passengers waiting for the second bus, which is proportional to the wait time (if the passengers arrive at the bus stop at a fixed rate), also decreases. Therefore the second bus will spend less time picking up passengers, it’s mean velocity will therefore increase and it will catch up with the bus ahead. We can therefore say that the state with evenly spaced buses along the route is unstable to collapse.

This idea can be formalized in the following simple toy model.

Suppose there is a circular route with equidistant stops (a linear route is really circular if the buses turn around at the end of the route and go back immediately). Initially a number of buses are uniformly distributed along the route. Passengers arrive at all bus stops at a fixed rate. The time a bus spends at a stop is proportional to the number of passengers waiting there.

Passenger discharge can be included in the model. However it does not qualitatively affect the results.

There are two important parameters in this model: 1) the product of the travel time between stops and the rate of passenger arrival. This parameter determines whether the bus spends most of its time traveling or picking up passengers. 2) The ratio of the number of stops to the number of buses.

It turns out that if the first parameter is large (most time is spent traveling) or the second parameter is small (there are lots of buses), bunching does not occur.

However, as illustrated in the figure below, there is a realistic parameter range in which bunching does occur and bunches have no chance to break up. In the figure below (which presents the output of the simple model above), the three buses were initially well spaced. Eventually, buses 1 and 2 form a bunch which catches up to bus 3.

Once the bunch of two buses is formed, the buses leapfrog each other and pick up passengers at alternating stops. Here is therefore the answer to our second question why bunches travel faster: each bus only has to accelerate/decelerate less frequently since they only stop at every other stop. Hence the average speed is greater.

It would be fun to go out there and time some bus arrivals to see if they can be well described by the model. Any takers?

Graph illustrating the bunch formation

Correlation between the space between buses and the number of waiting passengers results in the bunching behavior.

Decisions, decisions, decisions…

Posted in Statistics, Transportation by Alexander Lobkovsky Meitiv on February 25, 2010

This entry is about how the amount of information at the time of a decision can increase the efficacy of the outcome.

The specific case I will talk about is public transport.

Have you ever been on a bus that sat at a red light only to stop again at a bus stop right after passing the intersection?
Did you wonder if it would be better to have the bus stop located before the light?

Wonder no more! If you read on, we will answer this question and a few others using simple statistics and a few carefully chosen assumptions.

Let us first compute the average waiting time at a red light. Let’s say the light has only two states: red and green which alternate. The durations of the red and green lights are fixed and are t_r and t_g. Suppose that the bus arrives at a light at a random time. Then its average waiting time at the red light is

\displaystyle t_\ell=\frac{1}{t_r+t_g}\int_0^{t_r}t\,dt=\frac{t_r^2}{2(t_r+t_g)}.

This is because we assume that the bus arrives at the light at a random time. Without any prior information, the distribution of arrival times is uniform. The behavior of the light is periodic with period t_r+t_g and thus the probability of arriving in any time interval dt is dt/(t_r+t_g).

For example, if the red and the green lights are equally long, i.e. t_g=t_r, the average wait at a stop light will be a quarter of the red light duration t_\ell = t_r/4. (To derive that substitute t_g=t_r into the equation above).

Now lets add the bus stop to the equation. We will assume that the bus stops for a fixed time t_s. Fluctuations in the stopping time can be added to the model. However, calculations become a bit more involved and the result does not change qualitatively.

The questions are: 1) What is the total stoppage time t_w: red light + bus stop? 2) Does it depend on whether the bus stop is before or after the red light?

If we know anything about information theory, our answer to the second question is NO without doing any algebra. Why? Because the bus arrival time is random and uncorrelated with the timing of the stop light. There is no information that can distinguish stopping before and after the intersection. If the stop is after the light, the bus has the wait at the red light for a time t_{\ell} just computed above. If the bus stops before the light, the “arrival” time is the time at the end of the stop and it is just as random as the arrival to the stop. Therefore, the average total stoppage time is just t_\ell + t_s regardless of whether the stop is before or after the light.

How can the total stoppage time be reduced?

After all this post is about efficiency of mass transit. The answer, again from the point of view of information theory, is the following. To improve efficiency, we must use available information to make decisions which make the arrival (or departure) time of the bus correlated with the timing of the light.

In Switzerland, public support for mass transit is so strong, that people accept that the trolleys actually change the timing of the stop lights to speed up passage at the expense of cars. Here in America this approach may not fly. However, even if the timing of the stop light cannot be changed by the bus/trolley driver, they still have the power to make decisions that would change the total stoppage time.

In the example above, the bus stop was always before or after the intersection. Suppose the driver could decide, based on some information about the phase of the stop light, whether to stop before or after the intersection?

Let’s call the scenario in which the bus driver does not make a decision where to stop the “null model” or the “no-decision” model. As a better alternative consider the “red-before” scenario in which the driver stops before the intersection if the bus arrives on the red light and after the intersection if the bus arrives on the green light. What is the average stopping time t_w = t_s + \Delta?

I am not going to bore you with the tedious derivation. The result itself is a bit complicated as we have to consider 4 separate cases. I am going to give a formula for the extra waiting time \Delta on top of the regular stop duration t_s.

Let’s first define:

\displaystyle I_1=\frac{(t_r-t_s)^2}{2(t_r+t_g)},


\displaystyle I_2=\frac{(t_s-t_g)(t_r-\frac{1}{2}(t_s-t_g))}{t_r+t_g}.

Then the extra waiting time is

\Delta=0 for t_r \le t_s \le t_g
\Delta=I_1 for 0\le t_s \le \min(t_r,t_g)
\Delta=I_2 for \max(t_r,t_g)\le t_s \le t_r+t_g
\Delta=I_1 + I_2 for t_g \le t_s \le t_r

If t_s \ge t_r + t_g, just replace t_s everywhere with its remainder when divided by t_r + t_g.

To illustrate these formulas here is the graphs comparing the extra stoppage time \Delta for the “no-decision” and the “red-before” scenarios as a function of the stop duration t_s for two different ratios of the red to green light durations.

Extra waiting time for t_r = 2 t_g

Comparison of the extra waiting time as a function of the bus stop duration for two different decision scenarios and the red light twice as long as the green light.

Extra waiting time for t_g = 2 t_r

Comparison of the extra waiting time for the green light twice as long as the gred light. Note that for a certain range of bust stop durations, the extra waiting time vanishes completely!

The “red-before” scenario which uses only the information about the current state of the stop light does quite well compared to the “no-decision” scenario. When the green light is longer than the red light, the extra waiting time vanishes altogether if the stop duration is chosen properly.

Can we do better?

Yes! The more information is available to the driver, the better can the strategy be for making the decision where to stop. We can imagine, for example, that when the bus arrives at a red light, the driver knows when it will turn green again. Or, the driver can have complete information and also know the duration of the following green light.

Let us compute the extra waiting time for the best stopping strategy with complete information. How much better does it do than the “red-before” strategy which uses only the information about the current state of the stop light? The best stopping strategy which uses all available information is the following. Suppose the bus arrives on a red light. The time till the light change is the extra waiting time if the driver decides to stop after the intersection. This time needs to be compared to the extra waiting time which might result if the bus stops before the intersection. This might happen if the total stop duration is longer than the remainder of the red light plus the following green light so that the light is red again after the bus stop is completed. The best decision will depending on when the bus arrives, the duration of the red and green lights and the the bus stop.

I am going to leave you with a comparison of the extra waiting time for the”red-before” strategy with the best stopping strategy with perfect information about the phase of the stop light (length of red, green, time till change).

The moral of the story: “Information is power!”

Extra waiting time for the

The perfect information helps reduce the extra waiting time when the red light is longer than the green and when the bus stop duration is longer than the that of the green light.

Hello world!

Posted in General by Alexander Lobkovsky Meitiv on February 22, 2010

A Lobkovsky MeitivThis blog is a way to share my love of mathematical models.  If you have ever enjoyed puzzles, algebra, chess and other intellectual pursuits, you may find this blog entertaining and may learn something in the process.