3 Commits

Author SHA1 Message Date
loeding
90d8c6cd13 minor edit, and adding to the end 2025-07-23 21:03:37 +00:00
loeding
38d0a68371 going deeper on shuffles 2025-07-23 20:33:57 +00:00
loeding
ad2e53195c started analysis of shuffles 2025-07-21 22:51:07 +00:00
60 changed files with 519 additions and 4238 deletions

2
.gitignore vendored
View File

@@ -2,5 +2,3 @@
.DS_Store
output
__pycache__
.ipynb_checkpoints

View File

@@ -1,20 +1,19 @@
# nKode analysis
# nKode analysis
Luke Oeding (Auburn University)
## What is an nKode?
An nKode consists of a passcode $P$ selected utilizing the following setup:
- a $k$ digit keypad, where each key has:
- $p$ properties (position, central number, color, letter, emoji,...)
- $m$ options for each property ($\#$ positions, $\#$ central numbers, $\#$ colors, $\#$ letters, $\#$ emoji, ... )
* a $k$ digit keypad, where each key has:
* $p$ properties (position, central number, color, letter, emoji,...)
* $m$ options for each property (\# positions, \# central numbers, \# colors, \# letters, \# emoji, ... )
- In principle the number of options for each property doesn't have to be the same, but for the sake of our analysis, we make this uniform choice.
- Typically one wants to have all letters displayed exactly once on the keypad for any instance of a keypad, so we take $m = k$.
- $\ell$ = length of the passcode, which is a sequence of $\ell$ letters (options) selected from any of the options.
- a shuffling rule (the split-shuffle)
* $\ell$ = length of the passcode, which is a sequence of $\ell$ letters (options) selected from any of the options.
* a shuffling rule (the split-shuffle)
## Split shuffling
@@ -47,14 +46,14 @@ $$ E(m,p,\ell) = \log_2[(mp)^\ell] = \ell \frac{\log(mp)}{\log(2)}$$
So, for example, when $m=10$ and $p = 7$, the alphabet has 70 letters, and a passcode of length $\ell = 4$ would have entropy:
$$E(10,7,4) = 4\log(70)/\log(2) = 24.5\; \text{bits}$$
$$ E(10,7,4) = 4\log(70)/\log(2) = 24.5\; \text{bits}$$
$$E(10,7,6) = 6\log(70)/\log(2) = 36.8\; \text{bits}$$
$$ E(10,7,6) = 6\log(70)/\log(2) = 36.8\; \text{bits}$$
Relevant is the entropy per symbol, which is $E(m,p,1) = \log_2(m) +\log_2(p) = \frac{\log(m) + \log(p)}{\log(2)}$.
Relevant is the entropy per symbol, which is $ E(m,p,1) = \log_2(m) +\log_2(p) = \frac{\log(m) + \log(p)}{\log(2)}$.
For instance:
$E(10,7,1) = \log(70)/\log(2) = 6.13\; \text{bits}$
$ E(10,7,1) = \log(70)/\log(2) = 6.13\; \text{bits}$
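
As a quick numerical check of the entropy figures above, here is a minimal Python sketch. The function name `passcode_entropy_bits` is ours, chosen for illustration; it is not part of the repository.

```python
from math import log2


def passcode_entropy_bits(m: int, p: int, length: int) -> float:
    """Entropy in bits of a passcode of `length` letters drawn from m*p letters."""
    return length * log2(m * p)


print(round(passcode_entropy_bits(10, 7, 4), 1))  # 24.5
print(round(passcode_entropy_bits(10, 7, 6), 1))  # 36.8
print(round(passcode_entropy_bits(10, 7, 1), 2))  # 6.13
```

Because the entropy is linear in $\ell$, the per-symbol value $E(m,p,1)$ is all one needs to scale to any passcode length.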
### nKode Entropy
@@ -63,16 +62,16 @@ If one is interested in the likelihood of a single attempt randomly entered nKod
The total number of available passcodes with
$\ell$ letters selected from an alphabet with $k$ keys, with replacement is
$$k^\ell$$
$$ k^\ell $$
So the entropy of the one time keypad is
$$E(k,\ell) = \log_2[(k)^\ell] = \ell \frac{\log(k)}{\log(2)}$$
$$ E(k,\ell) = \log_2[(k)^\ell] = \ell \frac{\log(k)}{\log(2)}$$
So, for example, when $k=10$ a passcode of length $\ell = 4$ would have entropy:
$E(10,4) = 4\log(10)/\log(2) = 13.29\; \text{bits}$
$ E(10,4) = 4\log(10)/\log(2) = 13.29\; \text{bits}$
The entropy per symbol is $E(k,1) = \log_2(k) = \frac{\log(k)}{\log(2)}$.
The entropy per symbol is $ E(k,1) = \log_2(k) = \frac{\log(k)}{\log(2)}$.
In the case $k=10$ we have $E(10,1) = 3.32 \; \text{bits}$.
@@ -84,6 +83,7 @@ We are interested in understanding the number of times an eavesdropping intruder
[Brooks Brown says that the lower bound is $\min\{3, s \log_2+1\}$, where $s$ is the number of attribute sets, for an nKode of complexity $c=1$, regardless of nKode length $l$, or the number of tiles $t$.]
Regarding the eavesdropper attack we should also consider the case of a keystroke recorder that doesn't observe the labels on the keys of the nKode keypad.
Blind single attacks and repeated blind attempts that successfully log in, and learning the nKode.
@@ -95,11 +95,16 @@ Blind single attacks and repeated blind attempts that successfully log in, and l
The probability that a (blind) randomly entered key-string will yield a successful login:
$R = \frac{\#(\text{passcodes that would yield a correct login})}{\#(\text{possible passcodes})}.$
$
R = \frac{\text{Num}(\text{passcodes that would yield a correct login})}{\text{Num}(\text{possible passcodes})}.
$
(Num stands for 'number of'.)
This should be simply computed using the number of keys $k$ and the length $l$ of the passcode:
$R = (1/k)^l.$
$
R = (1/k)^l.
$
For example, a 6 digit pin would yield a one-in-a-million chance of blindly hitting the correct passcode.
@@ -109,7 +114,7 @@ For the moment we only consider the case of $p=2$, which could be the case if th
Note that if a user insists on only using the placement value of the keys in an nKode to select their password, then they effectively have reduced the complexity or entropy of their password to that of the normal keypad, and an adversary could use frequency analysis to increase the likelihood of guessing a correct password. However, if the user were to use only the number values on the keypad, they would still only have the entropy of a standard keypad generated password, however, because of the shuffling of the letters, using an nKode provides the user with some protection against frequency analysis attacks in the case of a blind intruder.
For example, it is known that users pick passwords like 1234 or 123456 much more frequently than any other password. If the intruder is able to observe the user typing one of these passcodes on an nKode keypad, then they would be able to guess the passcode with higher probability than random. However, if the intruder is only able to record keystrokes, but not see the display of the nKode keypad, then the intruder would only guess the correct passcode at the frequency of guessing a permutation of the correct passcode. In the case of $\ell$ consecutive digits, the keystroke intruder would only observe $\ell$ distinct keys being typed. The number of such passcodes is $\binom{k}{\ell}$. So in the case of a $k=10$ digit keypad, the number of $\ell=4$ digit passcodes with distinct entries is $\binom{10}{4} = 210$, and when $\ell = 6$ we also have $\binom{10}{6} = 210$. So, after one observation indicating that the passcode consists of $\ell$ distinct digits, the intruder would have probability $P = 1/\binom{k}{\ell}$ of guessing the passcode.
For example, it is known that users pick passwords like 1234 or 123456 much more frequently than any other password. If the intruder is able to observe the user typing one of these passcodes on an nKode keypad, then they would be able to guess the passcode with higher probability than random. However, if the intruder is only able to record keystrokes, but not see the display of the nKode keypad, then the intruder would only guess the correct passcode at the frequency of guessing a permutation of the correct passcode. In the case of $\ell $ consecutive digits, the keystroke intruder would only observe $\ell$ distinct keys being typed. The number of such passcodes is $\binom{k}{\ell}$. So in the case of a $k=10$ digit keypad, the number of $\ell=4$ digit passcodes with distinct entries is $\binom{10}{4} = 210$, and when $\ell = 6$ we also have $\binom{10}{6} = 210$. So, after one observation indicating that the passcode consists of $\ell$ distinct digits, the intruder would have probability $P = 1/\binom{k}{\ell}$ of guessing the passcode.
It is known that the expected number of trials until the first success is $1/P$. In this case it would take the intruder on average $\binom{k}{\ell}$ attempts to successfully log in (without guessing the passcode). Even in the case of a 4 digit PIN where the intruder guesses that the passcode consists of the first $4$ integers (or consecutive, or even just distinct, integers), the nKode increases security for the user by a factor of approximately 210: the key-recording intruder would guess the password 1234 on the first attempt on a standard keypad, but would need approximately 210 trials to successfully log in.
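
The counts quoted above are easy to reproduce. The following is a minimal sketch under the model stated in the text, namely $P = 1/\binom{k}{\ell}$ and an expected $1/P$ trials; the function name `expected_blind_trials` is hypothetical and not taken from the repository.

```python
from math import comb


def expected_blind_trials(k: int, length: int) -> int:
    """Expected attempts until first success when P = 1 / C(k, length)."""
    # For a geometric distribution with success probability P the mean is 1/P,
    # which here is exactly the binomial coefficient.
    return comb(k, length)


print(expected_blind_trials(10, 4))  # 210
print(expected_blind_trials(10, 6))  # 210
```

The two values coincide because $\binom{k}{\ell} = \binom{k}{k-\ell}$, so under this model a 6 digit passcode on a 10 key pad gains nothing over a 4 digit one.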
### Multiple blind attempts for a large userbase (Password spraying)
@@ -138,9 +143,13 @@ For example,
### Guessing the passcode:
The likelihood that a randomly chosen passcode will yield a successful login no matter what shuffle has been applied, i.e. so that the attacker can successfully log in as many times as they want:
$1 / \#(\text{possible passcodes}).$
$1 / \text{Num}(\text{possible passcodes}).$
For example, when $k=m=10, p=7$ for $\ell = 4$ this probability is $(70)^{-4} \sim 4.16*10^{-8}$, or about 4 chances in 100 million, and for $\ell = 6$ this probability is $(70)^{-6} \sim 8.5*10^{-12}$, or about 8.5 chances in 1 trillion.
For example, when $k=m=10, p=7$ for $\ell = 4$ this probability is
$(70)^{-4} \sim 4.16*10^{-8},$
or about 4 chances in 100 million, and for $\ell = 6$ this probability is $(70)^{-6} \sim 8.5*10^{-12}$, or about 8.5 chances in 1 trillion.
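
A short sketch that reproduces these two probabilities, assuming a uniformly random guess over the full alphabet of $mp$ letters; the helper name `passcode_guess_probability` is ours, for illustration only.

```python
def passcode_guess_probability(m: int, p: int, length: int) -> float:
    """Probability that one uniformly random passcode equals the true one."""
    return float(m * p) ** (-length)


print(f"{passcode_guess_probability(10, 7, 4):.2e}")  # 4.16e-08
print(f"{passcode_guess_probability(10, 7, 6):.2e}")  # 8.50e-12
```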
## Multiple Blind Attempts
@@ -155,7 +164,6 @@ For example, when $k=10$ for $\ell = 4$ and $s=2$ this probability is $((10)^{-4
For example, when $k=10$ for $\ell = 4$ and $s=3$ this probability is $((10)^{-4})^3 = 10^{-12}$, [same order of magnitude as a passcode of length 6], and for $\ell = 6$ this probability is $((10)^{-6})^3 = 10^{-18}$, or 1 chance in one billion billion.
## Incorrect nKodes that still work
We should also consider the number of nKodes that would yield a successful sequence of $s$ logins. [Add example]
## Longer passcodes might not always be more secure
@@ -185,3 +193,348 @@ If someone is able to observe the user typing the nKode, or record the keystroke
### Eye tracking
Eye tracker on phone: How good would this need to be in order to see what attribute the user is searching for?
# Higher Complexity
## Dispersion
Dispersion is an operation on a keypad that permutes properties in such a way that 2 observations of the nKode are sufficient to learn the passcode. It does this by applying a distinct rotation to each property. The authors note that this is possible when the number of keys is not larger than the number of properties per key, because this ensures that there are enough distinct rotations so that no repetitions occur. There are more general permutations that can also have this property, and it seems that this is already implemented in the Enrollment_Login_Renewal.
## Split shuffle
Split shuffle attempts to avoid the dispersion permutation of the keypad so as to increase the number of times an intruder would have to observe the nKode being entered.
The properties are divided into 2 sets, each set will be shuffled by the same shuffle applied to all properties in that set of properties.
Note: by observing both keypads (before and after a split-shuffle), one can learn both what the split was, and what the two shuffles were.
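
To illustrate the note above, here is a minimal sketch of how an observer could recover the split from two displays. It assumes a keypad is given as a list of rows (one per key) of labels, as in the tables used later in these notes; the function `recover_split` and this representation are assumptions for illustration, not the repository's `Keypad` class.

```python
from collections import defaultdict

Keypad = list[list[str]]  # keypad[key][prop] is the label shown on that key


def recover_split(before: Keypad, after: Keypad) -> dict[tuple[int, ...], list[int]]:
    """Group property columns by the key permutation that was applied to them."""
    k, p = len(before), len(before[0])
    groups: dict[tuple[int, ...], list[int]] = defaultdict(list)
    for prop in range(p):
        after_col = [after[key][prop] for key in range(k)]
        # perm[i] is the key that now shows the label previously on key i
        perm = tuple(after_col.index(before[key][prop]) for key in range(k))
        groups[perm].append(prop)
    return dict(groups)


before = [["a0", "b0", "c0", "d0"], ["a1", "b1", "c1", "d1"]]
after = [["a0", "b1", "c1", "d0"], ["a1", "b0", "c0", "d1"]]
print(recover_split(before, after))  # {(0, 1): [0, 3], (1, 0): [1, 2]}
```

Each dictionary key is one of the shuffles and its value is the set of properties that received it. If both halves happen to receive the same permutation, they collapse into a single group and the split itself cannot be recovered from that pair of displays.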
We're interested in studying how many observations an intruder must make in order to learn the passcode with the split shuffle in place. There are a few scenarios I can imagine.
* No split (analyzed above).
* The split is determined once, and then the shuffles only happen on one side of the split.
* The split changes every time randomly.
* The split changes every time by a set strategy.
Of course we can consider these questions each time the metaparameters change. Recall,
* a $k$ digit keypad,
* $p$ properties (position, central number, color, letter, emoji,...)
* $m$ options for each property (\# positions, \# central numbers, \# colors, \# letters, \# emoji, ... ). typically $m = k$.
* $\ell$ = length of the passcode, which is a sequence of $\ell$ letters (options) selected from any of the options.
Notice that when $k = 1$ the problem is nearly trivial. It doesn't matter what the properties are, the only thing the user is entering is $\ell$, and that can be observed in 1 try.
When $k = 2$. Here's the case $p = 2$ and a 4 letter passcode.
|key | p0 | p1 |
|:----:|:--:|:--:|
|key 0 | a0 | b0 |
|key 1 | a1 | b1 |
After a shuffle [attribute 1]
|key | p0 | p1 |
|:----:|:--:|:--:|
|key 0 | a0 | b1 |
|key 1 | a1 | b0 |
interaction:
|what | |||||
|--------|--|--|--|--|--|
|Passcode| a0|b1|a1|b0|
|Display 1| 0|1|1|0|
|Display 2| 0|0|1|1|
The possible passcodes after display 1:
0{a0,b0},1{a1,b1},1{a1,b1},0{a0,b0}
The possible passcodes after display 2:
0{a0,b1},0{a0,b1},1{a1,b0},1{a1,b0}
Intersect:
{a0},{b1},{a1},{b0}
Passcode learned in 2.
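
The intersection argument used in this example can be sketched in a few lines of Python. This is an illustrative stand-alone version, assuming keypads are lists of label rows and key presses are lists of key indices; `intersect_observations` is a hypothetical name, although the `Evilkode` class in this repository performs a similar per-position set intersection.

```python
def intersect_observations(
    displays: list[list[list[str]]],
    key_presses: list[list[int]],
) -> list[set[str]]:
    """Intersect, position by position, the labels on the keys that were pressed."""
    candidates: list[set[str]] | None = None
    for keypad, presses in zip(displays, key_presses):
        seen = [set(keypad[key]) for key in presses]
        if candidates is None:
            candidates = seen
        else:
            candidates = [c & s for c, s in zip(candidates, seen)]
    return candidates or []


display1 = [["a0", "b0"], ["a1", "b1"]]
display2 = [["a0", "b1"], ["a1", "b0"]]
result = intersect_observations([display1, display2], [[0, 1, 1, 0], [0, 0, 1, 1]])
print(result)  # [{'a0'}, {'b1'}, {'a1'}, {'b0'}]
```

The passcode is learned once every set in the result has exactly one element.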
When $k = 2$. Here's the case $p = 4$ and a 4 letter passcode.
|key | p0 | p1 | p2 | p3 |
|:----:|:--:|:--:|:--:|:--:|
|key 0 | a0 | b0 | c0 | d0 |
|key 1 | a1 | b1 | c1 | d1 |
After a shuffle [attribute 1,2]
|key | p0 | p1 | p2 | p3 |
|:----:|:--:|:--:|:--:|:--:|
|key 0 | a0 | b1 | c1 | d0 |
|key 1 | a1 | b0 | c0 | d1 |
interaction:
|what | |||||
|--------|--|--|--|--|--|
|Passcode| a0|c1|c1|d0|
|Display 1| 0|1|1|0|
|Display 2| 0|0|0|0|
The possible passcodes after display 1:
0{a0,b0,c0,d0},1{a1,b1,c1,d1},1{a1,b1,c1,d1},0{a0,b0,c0,d0}
The possible passcodes after display 2:
0{a0,b1,c1,d0},0{a0,b1,c1,d0},0{a0,b1,c1,d0},0{a0,b1,c1,d0}
Intersect:
{a0,d0},{b1,c1},{b1,c1},{a0,d0}.
Passcode is not learned yet.
After a 2nd shuffle [attribute 1,3]
|key | p0 | p1 | p2 | p3 |
|:----:|:--:|:--:|:--:|:--:|
|key 0 | a0 | b1 | c0 | d1 |
|key 1 | a1 | b0 | c1 | d0 |
|what | |||||
|--------|--|--|--|--|--|
|Passcode| a0|c1|c1|d0|
|Display 1| 0|1|1|0|
|Display 2| 0|0|0|0|
|Display 3| 0|1|1|1|
The possible passcodes after display 3:
0{a0,b1,c0,d1},1{a1,b0,c1,d0},1{a1,b0,c1,d0},1{a1,b0,c1,d0}
Intersect:
{a0},{c1},{c1},{d0}.
Passcode learned in 3.
Here's the case $p = 4$ and an 8 letter passcode.
|key | p0 | p1 | p2 | p3 |
|:----:|:--:|:--:|:--:|:--:|
|key 0 | a0 | b0 | c0 | d0 |
|key 1 | a1 | b1 | c1 | d1 |
Shuffle 1 [attribute 1,2]
|key | p0 | p1 | p2 | p3 |
|:----:|:--:|:--:|:--:|:--:|
|key 0 | a0 | b1 | c1 | d0 |
|key 1 | a1 | b0 | c0 | d1 |
Shuffle 2 [attribute 1,3]
|key | p0 | p1 | p2 | p3 |
|:----:|:--:|:--:|:--:|:--:|
|key 0 | a0 | b1 | c0 | d1 |
|key 1 | a1 | b0 | c1 | d0 |
interaction:
|what | ||||||||||
|--------|--|--|--|--|--|--|--|--|--|--|
|Passcode| a0|c1|c1|d0|b1|b0|a1|d0|
|Display 1| 0| 1| 1| 0| 1| 0| 1| 0|
|Display 2| 0| 0| 0| 0| 0| 1| 1| 0|
|Display 3| 0| 1| 1| 1| 0| 1| 1| 1|
The possible passcodes after display 1:
0{a0,b0,c0,d0},1{a1,b1,c1,d1},1{a1,b1,c1,d1},0{a0,b0,c0,d0},
1{a1,b1,c1,d1},0{a0,b0,c0,d0},1{a1,b1,c1,d1},0{a0,b0,c0,d0}
The possible passcodes after display 2:
0{a0,b1,c1,d0},0{a0,b1,c1,d0},0{a0,b1,c1,d0},0{a0,b1,c1,d0},
0{a0,b1,c1,d0},1{a1,b0,c0,d1},1{a1,b0,c0,d1},0{a0,b1,c1,d0}
Intersect:
{a0,d0},{b1,c1},{b1,c1},{a0,d0},{b1,c1},{b0,c0},{a1,d1},{a0,d0}
Passcode is not learned yet.
The possible passcodes after display 3:
0{a0,b1,c0,d1},1{a1,b0,c1,d0},1{a1,b0,c1,d0},1{a1,b0,c1,d0},
0{a0,b1,c0,d1},1{a1,b0,c1,d0},1{a1,b0,c1,d0},1{a1,b0,c1,d0}
Intersect:
{a0},{c1},{c1},{d0},{b1},{b0},{a1},{d0}
Passcode learned in 3.
Here's the case $p = 8$ and an 8 letter passcode.
|key | p0 | p1 | p2 | p3 | p4 | p5 | p6 | p7 |
|:----:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
|key 0 | a0 | b0 | c0 | d0 | e0 | f0 | g0 | h0 |
|key 1 | a1 | b1 | c1 | d1 | e1 | f1 | g1 | h1 |
Shuffle 1 [attribute 0,2,4,6] [a,c,e,g]
|key | p0 | p1 | p2 | p3 | p4 | p5 | p6 | p7 |
|:----:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
|key 0 | a1 | b0 | c1 | d0 | e1 | f0 | g1 | h0 |
|key 1 | a0 | b1 | c0 | d1 | e0 | f1 | g0 | h1 |
Shuffle 2 [attribute 2,3,4,5] [c,d,e,f]
|key | p0 | p1 | p2 | p3 | p4 | p5 | p6 | p7 |
|:----:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
|key 0 | a0 | b0 | c1 | d1 | e1 | f1 | g0 | h0 |
|key 1 | a1 | b1 | c0 | d0 | e0 | f0 | g1 | h1 |
Shuffle 3 [attribute 0,4,5,6] [a,e,f,g]
|key | p0 | p1 | p2 | p3 | p4 | p5 | p6 | p7 |
|:----:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
|key 0 | a1 | b0 | c0 | d0 | e1 | f1 | g1 | h0 |
|key 1 | a0 | b1 | c1 | d1 | e0 | f0 | g0 | h1 |
interaction:
|what | ||||||||||
|--------|--|--|--|--|--|--|--|--|--|--|
|Passcode| a1|c1|b0|h0|f0|e1|a1|d0|
|Display 1| 1| 1| 0| 0| 0| 1| 1| 0|
|Display 2| 0| 0| 0| 0| 0| 0| 0| 0|
|Display 3| 1| 0| 0| 0| 1| 0| 1| 1|
|Display 4| 0| 1| 0| 0| 1| 0| 0| 0|
The possible passcodes from display 1:
`1{a1,b1,c1,d1,e1,f1,g1,h1},`
`1{a1,b1,c1,d1,e1,f1,g1,h1},`
`0{a0,b0,c0,d0,e0,f0,g0,h0},`
`0{a0,b0,c0,d0,e0,f0,g0,h0},`
`0{a0,b0,c0,d0,e0,f0,g0,h0},`
`1{a1,b1,c1,d1,e1,f1,g1,h1},`
`1{a1,b1,c1,d1,e1,f1,g1,h1},`
`0{a0,b0,c0,d0,e0,f0,g0,h0}`
The possible passcodes from display 2:
`0{a1,b0,c1,d0,e1,f0,g1,h0},`
`0{a1,b0,c1,d0,e1,f0,g1,h0},`
`0{a1,b0,c1,d0,e1,f0,g1,h0},`
`0{a1,b0,c1,d0,e1,f0,g1,h0},`
`0{a1,b0,c1,d0,e1,f0,g1,h0},`
`0{a1,b0,c1,d0,e1,f0,g1,h0},`
`0{a1,b0,c1,d0,e1,f0,g1,h0},`
`0{a1,b0,c1,d0,e1,f0,g1,h0}`
Intersect:
`1{a1,c1,e1,g1},`
`1{a1,c1,e1,g1},`
`0{b0,d0,f0,h0},`
`0{b0,d0,f0,h0},`
`0{b0,d0,f0,h0},`
`1{a1,c1,e1,g1},`
`1{a1,c1,e1,g1},`
`0{b0,d0,f0,h0}`
Passcode is not learned yet.
Shuffle 2 [attribute 2,3,4,5] [c,d,e,f]
|key | p0 | p1 | p2 | p3 | p4 | p5 | p6 | p7 |
|:----:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
|key 0 | a0 | b0 | c1 | d1 | e1 | f1 | g0 | h0 |
|key 1 | a1 | b1 | c0 | d0 | e0 | f0 | g1 | h1 |
So before observing the input for the 3rd attempt:
Hits for each key from prior information:
[1001][1001]
(1/2 are 1's and 1/2 are 0's for every key.)
Now observe the sequence 10001011
The possible passcodes from display 3:
`1{a1,b1,c0,d0,e0,f0,g1,h1},`
`0{a0,b0,c1,d1,e1,f1,g0,h0},`
`0{a0,b0,c1,d1,e1,f1,g0,h0},`
`0{a0,b0,c1,d1,e1,f1,g0,h0},`
`1{a1,b1,c0,d0,e0,f0,g1,h1},`
`0{a0,b0,c1,d1,e1,f1,g0,h0},`
`1{a1,b1,c0,d0,e0,f0,g1,h1},`
`1{a1,b1,c0,d0,e0,f0,g1,h1}`
Intersect with prior information:
`1{a1,g1},`
`0{c1,e1},`
`0{b0,h0},`
`0{b0,h0},`
`1{d0,f0},`
`0{c1,e1},`
`1{a1,g1},`
`1{d0,f0}`
Passcode not learned yet, but after one more shuffle, we think we would learn the passcode.
Shuffle 3 [attribute 0,4,5,6] [a,e,f,g]
|key | p0 | p1 | p2 | p3 | p4 | p5 | p6 | p7 |
|:----:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
|key 0 | a1 | b0 | c0 | d0 | e1 | f1 | g1 | h0 |
|key 1 | a0 | b1 | c1 | d1 | e0 | f0 | g0 | h1 |
So before observing the input for the 4th attempt:
Hits for each key from prior information:
[0][10][0][0][01][10][0][01]
(So several of the correct keys are identified, only letters in positions 1,4,5,7 are unknown)
The possible passcodes from display 4:
`0{a1,b0,c0,d0,e1,f1,g1,h0},`
`1{a0,b1,c1,d1,e0,f0,g0,h1},`
`0{a1,b0,c0,d0,e1,f1,g1,h0},`
`0{a1,b0,c0,d0,e1,f1,g1,h0},`
`1{a0,b1,c1,d1,e0,f0,g0,h1},`
`0{a1,b0,c0,d0,e1,f1,g1,h0},`
`0{a1,b0,c0,d0,e1,f1,g1,h0},`
`0{a1,b0,c0,d0,e1,f1,g1,h0},`
Intersect:
`{a1,g1},`
`{c1},`
`{b0,h0},`
`{b0,h0},`
`{f0},`
`{e1},`
`{a1,g1},`
`{d0}`
Passcode only partially learned after 3 shuffles
- uncertain about positions 0,2,3,6 = {a1 or g1}, {b0 or h0}
- certain about positions 1,4,5,7 = c1,f0,e1,d0
Shuffles:
[],
[attribute 0,2,4,6] [a c e g ]
[attribute 2,3,4,5] [ cdef ]
[attribute 0,4,5,6] [a efg ]
1,7 [b,h] - not shuffled 4 times, so the key pressed was always the same for the 3rd and 4th character.
3 -- shuffled 1 time, not shuffled 3 times
4 -- shuffled 3 times, not shuffled 1 time
0,2,5,6 -- shuffled 2 times, not shuffled 2 times
Note that this passcode didn't use the letter g.
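
One way to see why positions holding a1/g1 and b0/h0 stay ambiguous: with $k = 2$, two attributes that are shuffled in exactly the same displays always land on the same key, so no observation can separate them. The sketch below groups attributes by that pattern. It assumes each shuffle is given as the set of attributes swapped relative to the starting keypad, as in the list above; the function name is ours.

```python
from collections import defaultdict


def indistinguishable_attributes(shuffles: list[set[int]], p: int) -> list[list[int]]:
    """Return groups of attributes that share the same shuffled/not-shuffled pattern."""
    groups: dict[tuple[bool, ...], list[int]] = defaultdict(list)
    for attr in range(p):
        signature = tuple(attr in s for s in shuffles)
        groups[signature].append(attr)
    return [attrs for attrs in groups.values() if len(attrs) > 1]


shuffles = [{0, 2, 4, 6}, {2, 3, 4, 5}, {0, 4, 5, 6}]
print(indistinguishable_attributes(shuffles, 8))  # [[0, 6], [1, 7]]
```

Attributes 0 and 6 (a, g) and attributes 1 and 7 (b, h) share a pattern, which matches the two unresolved pairs in the intersection above.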
This experiment makes us think of several learning tasks and questions.
1) The ability to learn any possible passcode character from observations of inputs
2) The likelihood of learning a given passcode of a fixed length.
3) Could you get the correct entry even with partial information?
4) Probabilistic attack for guessing the correct sequence?
First split shuffle evenly splits things, so you don't gain much. Subsequent shuffles will start to bias toward the correct passcode. The trade-off is that if you don't shuffle a position then the intruder can just use the same input as before and can enter the correct key without knowing the correct icon, but if you do shuffle, then the intruder learns more information.
One key point I'm understanding better now: there's a trade-off between (1) information being learned by the intruder and (2) the chances that an intruder could key in a correct key-sequence for each subsequent observation / login attempt.
If an attribute is not shuffled, then that increases the chance for (2), and if an attribute is shuffled that increases the chance for (1).
I think that when the selection of the sets is randomized like you're doing, you're getting a good trade-off between these two things. However, there's also probably a strategy that does better than randomly choosing a split each time: each shuffle chooses its set of half of the attributes by taking half from the attributes that were shuffled in the previous shuffle and half from the attributes that weren't. You could look back essentially $\log_2(p)$ steps (where $p$ is the number of attributes) and similarly subdivide. This seems to be a way to balance the trade-off for consecutive observations.
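
The balanced strategy described above can be sketched as follows. This is a simplified illustration that looks back only one step rather than $\log_2(p)$ steps, and the function `next_split` is hypothetical; it is not how `Keypad.split_shuffle` is implemented in the repository.

```python
import random


def next_split(previous: set[int], p: int, rng: random.Random) -> set[int]:
    """Pick the next shuffle set: half of the previously shuffled attributes
    and half of the previously unshuffled ones."""
    shuffled = sorted(previous)
    unshuffled = sorted(set(range(p)) - previous)
    chosen = rng.sample(shuffled, len(shuffled) // 2)
    chosen += rng.sample(unshuffled, len(unshuffled) // 2)
    return set(chosen)


rng = random.Random(0)  # seeded only to make the sketch repeatable
previous = {0, 2, 4, 6}
split = next_split(previous, 8, rng)
print(sorted(split))
```

With $p = 8$ and a previous set of size 4, every new split contains exactly two attributes from the previous set and two from its complement, so no attribute pair is guaranteed to keep moving together.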

File diff suppressed because it is too large Load Diff

Binary files not shown: 50 images ("Before" versions only), 26 to 30 KiB each.

File diff suppressed because one or more lines are too long

View File

@@ -1,17 +1,82 @@
from src.evilkode import Evilkode
from src.evilkode import Observation, Evilkode
from src.keypad import Keypad
import random
from dataclasses import dataclass
from statistics import mean, variance
from enum import Enum
from pathlib import Path
from src.utils import ShuffleTypes, observations, passcode_generator
@dataclass
class Benchmark:
    mean: int
    variance: int
    runs: list[int]


class ShuffleTypes(Enum):
    FULL_SHUFFLE = "FULL_SHUFFLE"
    SPLIT_SHUFFLE = "SPLIT_SHUFFLE"


def observations(number_of_keys, properties_per_key, passcode_len, complexity: int, disparity: int, shuffle_type: ShuffleTypes):
    k = number_of_keys
    p = properties_per_key
    n = passcode_len
    passcode = passcode_generator(k, p, n, complexity, disparity)
    keypad = Keypad.new_keypad(k, p)

    def obs_gen():
        for _ in range(100):  # finite number of yields
            yield Observation(
                keypad=keypad.keypad.copy(),
                key_selection=keypad.key_entry(target_passcode=passcode)
            )
            match shuffle_type:
                case ShuffleTypes.FULL_SHUFFLE:
                    keypad.full_shuffle()
                case ShuffleTypes.SPLIT_SHUFFLE:
                    keypad.split_shuffle()
                case _:
                    raise Exception(f"no shuffle type {shuffle_type}")

    return obs_gen()


def passcode_generator(k: int, p: int, n: int, c: int, d: int) -> list[int]:
    assert n >= c
    assert p*k >= c
    assert n >= d
    assert p >= d
    passcode_prop = []
    passcode_set = []
    valid_choices = {i for i in range(k*p)}
    repeat_set = n-d
    repeat_prop = n-c
    prop_added = set()
    set_added = set()
    for _ in range(n):
        prop = random.choice(list(valid_choices))
        prop_set = prop//p
        passcode_prop.append(prop)
        passcode_set.append(prop_set)
        if prop in prop_added:
            repeat_prop -= 1
        if prop_set in set_added:
            repeat_set -= 1
        prop_added.add(prop)
        set_added.add(prop_set)
        if repeat_prop <= 0:
            valid_choices -= prop_added
        if repeat_set <= 0:
            for el in valid_choices.copy():
                if el // p in set_added:
                    valid_choices.remove(el)
    return passcode_prop


def shuffle_benchmark(
    number_of_keys: int,
@@ -29,6 +94,7 @@ def shuffle_benchmark(
    full_path = Path(file_path) / file_name
    if not overwrite and full_path.exists():
        print(f"file exists {file_path}")
        with open(full_path, "r") as fp:
            runs = fp.readline()
            runs = runs.split(',')
@@ -40,14 +106,13 @@ def shuffle_benchmark(
)
    runs = []
    for _ in range(run_count):
        passcode = passcode_generator(number_of_keys, properties_per_key, passcode_len, complexity, disparity)
        evilkode = Evilkode(
            observations=observations(
                target_passcode=passcode,
                number_of_keys=number_of_keys,
                properties_per_key=properties_per_key,
                min_complexity=complexity,
                min_disparity=disparity,
                passcode_len=passcode_len,
                complexity=complexity,
                disparity=disparity,
                shuffle_type=shuffle_type,
            ),
            number_of_keys=number_of_keys,
@@ -80,14 +145,13 @@ def full_shuffle_benchmark(
) -> Benchmark:
    runs = []
    for _ in range(run_count):
        passcode = passcode_generator(number_of_keys, properties_per_key, passcode_len, complexity, disparity)
        evilkode = Evilkode(
            observations=observations(
                target_passcode=passcode,
                number_of_keys=number_of_keys,
                properties_per_key=properties_per_key,
                min_complexity=complexity,
                min_disparity=disparity,
                passcode_len=passcode_len,
                complexity=complexity,
                disparity=disparity,
                shuffle_type=ShuffleTypes.FULL_SHUFFLE,
            ),
            number_of_keys=number_of_keys,

View File

@@ -44,4 +44,5 @@ class Evilkode:
                return EvilOutput(possible_nkodes=[list(el) for el in self.possible_nkode], iterations=idx+1)
            for jdx, props in enumerate(obs.property_list):
                self.possible_nkode[jdx] = props.intersection(self.possible_nkode[jdx])
        raise Exception("error in Evilkode, observations stopped yielding")

View File

@@ -1,79 +1,7 @@
import random
from enum import Enum
from math import factorial, comb
from src.evilkode import Observation
from src.keypad import Keypad
def total_valid_nkode_states(k: int, p: int) -> int:
    return factorial(k) ** (p-1)


def total_shuffle_states(k: int, p: int) -> int:
    return comb((p-1), (p-1) // 2) * factorial(k)


class ShuffleTypes(Enum):
    FULL_SHUFFLE = "FULL_SHUFFLE"
    SPLIT_SHUFFLE = "SPLIT_SHUFFLE"


def observations(target_passcode: list[int], number_of_keys:int, properties_per_key: int, min_complexity: int, min_disparity: int, shuffle_type: ShuffleTypes, number_of_observations: int = 100):
    k = number_of_keys
    p = properties_per_key
    keypad = Keypad.new_keypad(k, p)

    def obs_gen():
        for _ in range(number_of_observations):
            yield Observation(
                keypad=keypad.keypad.copy(),
                key_selection=keypad.key_entry(target_passcode=target_passcode)
            )
            match shuffle_type:
                case ShuffleTypes.FULL_SHUFFLE:
                    keypad.full_shuffle()
                case ShuffleTypes.SPLIT_SHUFFLE:
                    keypad.split_shuffle()
                case _:
                    raise Exception(f"no shuffle type {shuffle_type}")

    return obs_gen()


def passcode_generator(k: int, p: int, n: int, c: int, d: int) -> list[int]:
    assert n >= c
    assert p*k >= c
    assert n >= d
    assert p >= d
    passcode_prop = []
    passcode_set = []
    valid_choices = {i for i in range(k*p)}
    repeat_set = n-d
    repeat_prop = n-c
    prop_added = set()
    set_added = set()
    for _ in range(n):
        prop = random.choice(list(valid_choices))
        prop_set = prop//p
        passcode_prop.append(prop)
        passcode_set.append(prop_set)
        if prop in prop_added:
            repeat_prop -= 1
        if prop_set in set_added:
            repeat_set -= 1
        prop_added.add(prop)
        set_added.add(prop_set)
        if repeat_prop <= 0:
            valid_choices -= prop_added
        if repeat_set <= 0:
            for el in valid_choices.copy():
                if el // p in set_added:
                    valid_choices.remove(el)
    return passcode_prop

View File

@@ -1,243 +0,0 @@
import json
from dataclasses import dataclass, asdict
from evilkode import Observation
from utils import observations, passcode_generator, ShuffleTypes
from pathlib import Path
from PIL import Image, ImageDraw, ImageFont
from typing import Iterable
# Project root = parent of *this* file's directory
PROJECT_ROOT = Path(__file__).resolve().parent.parent
OUTPUT_DIR = PROJECT_ROOT / "example" / "obs_json"
PNG_DIR = PROJECT_ROOT / "example" / "obs_png"
@dataclass
class ObservationSequence:
target_passcode: list[int]
observations: list[Observation]
def new_observation_sequence(
number_of_keys: int,
properties_per_key: int,
passcode_len: int,
complexity: int,
disparity: int,
numb_runs: int,
) -> ObservationSequence:
passcode = passcode_generator(number_of_keys, properties_per_key, passcode_len, complexity, disparity)
obs_seq = ObservationSequence(target_passcode=passcode, observations=[])
obs_gen = observations(
target_passcode=passcode,
number_of_keys=number_of_keys,
properties_per_key=properties_per_key,
min_complexity=complexity,
min_disparity=disparity,
shuffle_type=ShuffleTypes.SPLIT_SHUFFLE,
number_of_observations=numb_runs,
)
for obs in obs_gen:
obs.keypad = obs.keypad.tolist()
obs_seq.observations.append(obs)
return obs_seq
def _next_json_filename(base_dir: Path) -> Path:
"""Find the next available observation_X.json file in base_dir."""
counter = 1
while True:
candidate = base_dir / f"observation_{counter}.json"
if not candidate.exists():
return candidate
counter += 1
def save_observation_sequence_to_json(seq: ObservationSequence, filename: Path | None = None) -> None:
"""
Save ObservationSequence to JSON.
- If filename is None, put it under PROJECT_ROOT/output/obs_json/ as observation_{n}.json
- Creates directory if needed
"""
if filename is None:
base_dir = OUTPUT_DIR
base_dir.mkdir(parents=True, exist_ok=True)
filename = _next_json_filename(base_dir)
else:
filename.parent.mkdir(parents=True, exist_ok=True)
with filename.open("w", encoding="utf-8") as f:
json.dump(asdict(seq), f, indent=4)
# ---------- Helpers ----------
def _load_font(preferred: str, size: int) -> ImageFont.FreeTypeFont | ImageFont.ImageFont:
    """Try a preferred TTF, fall back to common monospace, then PIL default."""
    candidates = [
        preferred,
        "DejaVuSansMono.ttf",  # common on Linux
        "Consolas.ttf",  # Windows
        "Menlo.ttc", "Menlo.ttf",  # macOS
        "Courier New.ttf",
    ]
    for c in candidates:
        try:
            return ImageFont.truetype(c, size)
        except Exception:
            continue
    return ImageFont.load_default()
def _text_size(draw: ImageDraw.ImageDraw, text: str, font: ImageFont.ImageFont) -> tuple[int, int]:
    """Get (w, h) using font bbox for accurate layout."""
    left, top, right, bottom = draw.textbbox((0, 0), text, font=font)
    return right - left, bottom - top
def _join_nums(nums: Iterable[int]) -> str:
    return " ".join(str(n) for n in nums)
def _next_available_path(path: Path) -> Path:
    """If path exists, append _1, _2, ..."""
    if not path.exists():
        return path
    base, suffix = path.stem, path.suffix or ".png"
    i = 1
    while True:
        candidate = path.with_name(f"{base}_{i}{suffix}")
        if not candidate.exists():
            return candidate
        i += 1
# ---------- Core rendering ----------
def render_observation_to_png(
    target_passcode: list[int],
    obs: Observation,
    out_path: Path,
    *,
    header_font_name: str = "DejaVuSans.ttf",
    body_font_name: str = "DejaVuSans.ttf",
    header_size: int = 28,
    body_size: int = 24,
    margin: int = 32,
    row_padding_xy: tuple[int, int] = (16, 12),  # (x, y) padding inside row box
    row_spacing: int = 14,
    header_spacing: int = 10,
    section_spacing: int = 18,
    bg_color: str = "white",
    fg_color: str = "black",
    row_fill: str = "#f7f7f7",
    row_outline: str = "#222222",
):
    """
    Render a single observation:
    - Top lines:
        Target Passcode: {target}
        Selected Keys: {selected keys}
    - Then a stack of row boxes representing the keypad rows.
    """
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path = _next_available_path(out_path)
    # Fonts
    header_font = _load_font(header_font_name, header_size)
    body_font = _load_font(body_font_name, body_size)
    # Prepare strings
    header1 = f"Target Passcode: {_join_nums(target_passcode)}"
    header2 = f"Selected Keys: {_join_nums(obs.key_selection)}"
    row_texts = [_join_nums(row) for row in obs.keypad]
    # Measure to compute canvas size
    # Provisional image for measurement
    temp_img = Image.new("RGB", (1, 1), bg_color)
    d = ImageDraw.Draw(temp_img)
    h1_w, h1_h = _text_size(d, header1, header_font)
    h2_w, h2_h = _text_size(d, header2, header_font)
    row_text_sizes = [_text_size(d, t, body_font) for t in row_texts]
    row_box_widths = [tw + 2 * row_padding_xy[0] for (tw, th) in row_text_sizes]
    row_box_heights = [th + 2 * row_padding_xy[1] for (tw, th) in row_text_sizes]
    content_width = max([h1_w, h2_w] + (row_box_widths or [0]))
    total_rows_height = sum(row_box_heights) + row_spacing * max(0, len(row_box_heights) - 1)
    width = content_width + 2 * margin
    height = (
        margin
        + h1_h
        + header_spacing
        + h2_h
        + section_spacing
        + total_rows_height
        + margin
    )
    # Create final image
    img = Image.new("RGB", (max(width, 300), max(height, 200)), bg_color)
    draw = ImageDraw.Draw(img)
    # Draw headers
    x = margin
    y = margin
    draw.text((x, y), header1, font=header_font, fill=fg_color)
    y += h1_h + header_spacing
    draw.text((x, y), header2, font=header_font, fill=fg_color)
    y += h2_h + section_spacing
    # Draw row boxes with evenly spaced numbers
    max_box_width = max(row_box_widths) if row_box_widths else 0
    for row, box_h in zip(obs.keypad, row_box_heights):
        box_left = x
        box_top = y
        box_right = x + max_box_width
        box_bottom = y + box_h
        # draw row rectangle
        draw.rectangle(
            [box_left, box_top, box_right, box_bottom],
            fill=row_fill,
            outline=row_outline,
            width=2
        )
        # evenly spaced numbers
        n = len(row)
        if n > 0:
            available_width = max_box_width - 2 * row_padding_xy[0]
            spacing = available_width / (n + 1)
            for idx, num in enumerate(row, start=1):
                num_text = str(num)
                num_w, num_h = _text_size(draw, num_text, body_font)
                num_x = box_left + row_padding_xy[0] + spacing * idx - num_w / 2
                num_y = box_top + (box_h - num_h) // 2
                draw.text((num_x, num_y), num_text, font=body_font, fill=fg_color)
        y = box_bottom + row_spacing
    img.save(out_path, format="PNG")
def _next_run_dir(base_dir: Path) -> Path:
    """Find the next available run directory under base_dir (run_001, run_002, ...)."""
    counter = 1
    while True:
        run_dir = base_dir / f"run_{counter:03d}"
        if not run_dir.exists():
            run_dir.mkdir(parents=True)
            return run_dir
        counter += 1
def render_sequence_to_pngs(seq: ObservationSequence, out_dir: Path | None = None) -> None:
    """
    Render each observation to its own PNG inside a fresh run directory.
    Default: PNG_DIR (PROJECT_ROOT/example/obs_png)/run_XXX/observation_001.png
    """
    base_dir = PNG_DIR if out_dir is None else out_dir
    base_dir.mkdir(parents=True, exist_ok=True)
    # Create a fresh run dir
    run_dir = _next_run_dir(base_dir)
    for i, obs in enumerate(seq.observations, start=1):
        filename = run_dir / f"observation_{i:03d}.png"
        render_observation_to_png(seq.target_passcode, obs, filename)
if __name__ == "__main__":
    obs_seq = new_observation_sequence(
        number_of_keys=6,
        properties_per_key=9,
        passcode_len=4,
        complexity=0,
        disparity=0,
        numb_runs=50,
    )
    save_observation_sequence_to_json(obs_seq)
    render_sequence_to_pngs(obs_seq)
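
Reviewer sketch, not part of the commit: the module writes sequences with `json.dump(asdict(seq), ...)` but has no matching loader. A minimal round trip might look like the following. The `Observation` class here is a hypothetical stand-in that models only the two attributes the module actually reads (`keypad` and `key_selection`); the real class may carry more fields. `load_observation_sequence` and `demo_round_trip` are illustrative names and do not exist in the repository.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class Observation:
    # Hypothetical stand-in: only the fields used by the rendering code.
    keypad: list[list[int]]
    key_selection: list[int]


@dataclass
class ObservationSequence:
    target_passcode: list[int]
    observations: list[Observation]


def load_observation_sequence(path: Path) -> ObservationSequence:
    """Rebuild an ObservationSequence from JSON produced via asdict()."""
    with path.open("r", encoding="utf-8") as f:
        raw = json.load(f)
    return ObservationSequence(
        target_passcode=raw["target_passcode"],
        observations=[Observation(**o) for o in raw["observations"]],
    )


def demo_round_trip() -> bool:
    """Write a small sequence to a temp file and check it loads back unchanged."""
    seq = ObservationSequence(
        target_passcode=[3, 17, 20, 41],
        observations=[
            Observation(keypad=[[0, 1, 2], [3, 4, 5]], key_selection=[1, 0, 1, 0]),
        ],
    )
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "observation_1.json"
        with path.open("w", encoding="utf-8") as f:
            json.dump(asdict(seq), f, indent=4)
        return load_observation_sequence(path) == seq
```

Because `asdict` recurses into nested dataclasses, the loader has to rebuild each `Observation` explicitly; passing the raw dictionaries straight to `ObservationSequence` would leave plain dicts in `observations`.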

View File

@@ -1,4 +1,4 @@
from src.utils import passcode_generator
from src.benchmark import passcode_generator
import pytest
@pytest.mark.parametrize(