John Faben's Homepage
  • Home
  • CV
  • Maths
    • Research >
      • Mouse Maze
      • My PhD Thesis
    • Teaching >
      • Counterparty Credit Risk
      • Outreach
  • Bridge
  • Blog

PIN Codes

10/9/2016

0 Comments

 
At work the other day I had to change my PIN for a certain application. It was a purely numeric PIN, and the change came with the following restrictions: 
  • PIN must not contain repeated digits (e.g. 11111)
  • PIN must not contain consecutive digits (e.g. 12345)
Now, I think it's pretty clear that the restriction is intended to apply to only those strings which consist entirely of repeated or consecutive digits (ie, strings that look like the examples given), so there are in fact only 10 + 7 = 17 out of 100,000 possible PIN codes being excluded by this rule, and it isn't doing too much harm. 

But what if you took it more literally? By a strict interpretation of the rules, 74583 should not be allowed, because it contains consecutive digits (45), and 38871 would not be allowed, because it contains repeated digits (88). There is some ambiguity over whether 24682 should be allowed (is the 2 in that string repeated?). If we did take the rule literally, how many combinations would we be left with? Obviously it depends on the length of the PIN - the particular PIN in question was 5 digits, but let's have a look at what happens for various different lengths, with the different rules:
Number of digits
No restrictions
No consecutive repeats
No repeats at all
1
10
10 (100%)
10 (100%)
2
100
81 (81%)
81 (81%)
3
1000
656 (66%)
584 (58%)
4
10,000
5313 (53%)
3689 (37%)
5
100,000
43030 (43%)
19998 (20%)
6
1,000,000
348501 (35%)
90445 (9%)
As you would expect, the percentage of passwords that aren't allowed goes up as the number of digits goes up (there are more possible pairs of digits which can be consecutive or be repeats. I created this table with a quick Python script, but is there a way we could have done it by hand?

Well, at least for the less restrictive version of the rule, yes there is (or at least of getting very close). Look at the number of 2 digit numbers which are not allowed - there are 19 of them (you can't have a 2 digit string of consecutive numbers that starts with a 9). Now for longer strings, you can essentially just consider each pair of digits as an independent experiment, which has an 81% chance of being allowable under the rule. So for 3 digit PINs, you'd expect to have 81%*81% = 65.61% of the numbers allowed - which gives exactly the right answer. For larger strings you don't quite get exactly the right answer (because the trials aren't actual independent - knowing that one set of digits is consecutive gives you some information about whether or not the others are), but it gets pretty close. 

How about for the less restrictive version? Well, it's a bit more complicated, but we I think we can still do it. The number of strings of length n which consist entirely of different digits is 10Cn * n! = 10!/(10-n)! = 10 * 9 * 8 ... * (10-n). ie, for n=2, there are 10*9 = 90 combinations, and we already know that 81 of these 90 are permissible. So, a quick calculation for how many PINs of length n should be 0.9^n*10!/(10-n)!. How close is that? We know it's exactly correct for n=2. For n=3, it gives us 0.81*(10*9*8) =  583.2, so it's pretty good. It gets slightly worse as n gets larger, for the same reason as before. 

Is there anything else we can say about these PINs? Well, let's have a look at what they look like (mostly because I wanted to play with TikZ). Here's a map of which PINs of length 4 are allowed under each rule (plotted on a 100x100 grid). This is also generated from the script above. 
Picture
No two consecutive digits the same
Picture
No two digits the same
There's probably supposed to be some less about password security and system design here, but I'm going to stick with pretty pictures and combinatorics instead.
0 Comments

Beating Roger Federer

2/7/2016

1 Comment

 
Picture
The Riddler at Five Thirty Eight is a regular puzzle column that sets a puzzle to be solved over the weekend. The puzzles have ranged from completely trivial to outright dastardly. This week's puzzle is one of the more trivial ones, but it raises an interesting question. 

Here's the puzzle: 
Your wish has been granted, and you get to play tennis against Roger Federer in his prime in the Wimbledon final. You have only a 1 percent chance to win each point, but Roger, sporting gentleman that he is, offers to let you name any score and begin the match at that point. (So, if you’ve entertained a fantasy of storming back after being down three match points in the fifth set, now’s the time to live it.) What score can you name that gives you the best chance to win, and what is your chance of winning the title?

I think it's almost immediately obvious that you should choose the score to be something like 6-0 6-0 6-6 (6-0). ie, you're 6 points up in a tie breaker in order to win your third set. This means you have six match points, each of which is a 1/100 chance of glory. The probability that you miss all 6 is just under 6%. Specifically, they are 1-0.99^6 = 5.85%. You can add an extra 0.01% chance of you winning the match from 6-6 in the tiebreak, for a total probability of winning of almost exactly 5.86% - the chances of winning in any other way than winning 2 out of the first 8 points that you play are significantly less than 0.01%, so can safely be ignored.

Incidentally, this last point is the reason why you choose to be 6-0 6-0 6-6 rather than 6-0 6-0 5-0. In the latter case, the best you can do is 3 match points, for roughly a 3% chance of winning the title, and after Federer pulls the game back to level, the way tennis scoring works means that you will essentially never get another chance. Think of it this way: in order to win the game after the score is level in the game, you need to win at least two points in a row. But the chance of you winning any two points in a row is 0.01%, so all the extra chances you have by starting at 5-0 up are just multiples of 0.01%, rather than multiples of 1%, and even to get the stage where you win 2 points in a row is really, really difficult.

That raises the obvious follow up question. How good would you have to be to choose 6-0 6-0 5-0 rather than the tie breaker? I suspect if I were feeling adventurous I could calculate this by hand, but I'm not, and since I have to write this on a computer anyway, I might as well write some code that simulates tennis matches. You can see the graph on the right. The x axis is your probability of winning each point, and the y axis is your probability of winning the match. 

Picture
It's clear that the line of red circles does eventually cross the line of green triangles, but it's not until somewhere around the 40% mark. So the answer to the question "how good would you have to be to choose to be 2-0,5-0,40-0 rather than 2-0, 6-6, 6-0 up?" is very, very good indeed, good enough that you'd win about 40% of points vs. Roger Federer! Zooming in a bit (graph on the left), it looks like it actually happens at about 42%. ​

Incidentally, while I've got this code running, I might as well generate a plot of how often you win each match given different probabilities of winning points (I'm not going to take into account serving, and I'm allowing the last set to be own on a tiebreaker, since presumably the difference that makes is negligible). I've started the plot at 50, since it's obviously symmetric around this. Based on this, if you have a 45% chance of winning each point, you have just  a 5% chance of winning the match. This is pretty much the same as the chance you'd have of winning the most points at those skill differences: matches last about 250 points and the odds of a binomial distribution with 251 trials and p=0.45 generating more than 126 successes are about 5.5%. However, things rapidly go downhill for the underdog. If you have a 42% chance of winning each point, you win about .3% of the matches, whereas you win the most points about .6% of the time.  Tennis scoring does amplify skill differences, but tennis matches consist of *a lot* of points, so the better player is almost guaranteed to win, however you structure the scoring.

Picture
Finally - Ollie's version doesn't allow this solution, but the last time I heard this problem, Federer was going to play the match out up until the point you were magically transported in to finish it off. With that version, I heard the particularly neat solution that you should pick the score to be (6-0) (7-6) (6-6), where you are leading (6-0) in the tie breaker in the third set, as before, but the final score in the second set tiebreaker was something like 1,000,002-1,000,000, so that Federer is exhausted from playing the two million point tie breaker in the last match. 

You can see the (not particularly elegant) Python code that I used to generate these figure here.

1 Comment
Forward>>

    John Faben

    This is a blog. It doesn't currently contain anything, but one day it might, in which case this note will be incorrect, unless it changes.

    Archives

    August 2020
    July 2020
    February 2019
    August 2018
    July 2018
    June 2017
    March 2017
    September 2016
    July 2016

    Categories

    All

    RSS Feed

Proudly powered by Weebly