I had a project that required key codes, similar to the codes software companies use to activate their products. In my situation, the most effective way to pass these codes around is as long integers. Otherwise, doing math on the values, such as picking a block of 100 of them to send to a client device for later use, is almost impossible. Well, not actually impossible, but really inefficient and clumsy. For the human-readable form, I decided to use an encoding system like base 64.
Base 64
The base 10 numbering system, decimal, is the one we all use, and it has ten digits. Base 2 is called binary, and every programmer knows about it; it has two digits. Hexadecimal is base 16, and its digits are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F. This may seem weird to a non-programmer, but simply put, base 16 needs 16 digits, so we use the familiar 10 digits for the first ten and then some letters after that. We could use any symbols for the digits, but these are the easiest set to learn.
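The idea that the digit symbols are arbitrary is easy to see in code. Here is a minimal sketch. Java, the class name and the method name are my own choices for illustration and do not come from the project's code. The base is just the length of the digit string, and each round of divide and mod produces one digit, lowest order first.

```java
// Hypothetical helper: converts a non-negative value to any base.
// The digit symbols are the characters of `digits`; the base is digits.length().
final class BaseDemo {
    static String toBase(long value, String digits) {
        int base = digits.length();
        if (base < 2) throw new IllegalArgumentException("need at least 2 digit symbols");
        if (value < 0) throw new IllegalArgumentException("value must be non-negative");
        if (value == 0) return String.valueOf(digits.charAt(0));
        StringBuilder sb = new StringBuilder();
        while (value > 0) {
            sb.append(digits.charAt((int) (value % base))); // lowest-order digit
            value /= base;                                   // drop that column
        }
        return sb.reverse().toString(); // conventional order: highest digit first
    }
}
```

For example, `toBase(255, "0123456789ABCDEF")` returns `"FF"`, `toBase(5, "01")` returns `"101"`, and `toBase(255, "0123456789")` returns `"255"`. Only the digit string changes.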
My numbering system would not actually be base 64, because standard base 64 mixes upper and lower case letters and also uses '+' and '/'. I wanted people to be able to type in the values easily, without problems from punctuation or upper/lower case letters.
Using the 10 digits and then the entire alphabet gives base 36 for my first encoding scheme. But the requirements also excluded ‘o’ and ‘i’ so that nobody could confuse them with 0 and 1. Huh, what about 0 and 1 being mistaken for ‘o’ and ‘i’? Take those out too and we’re left with base 32. Wow, that would be easy to deal with, because 32 is 2^5, so I could take every 5 bits of the number and convert those bits to a digit. But wait, what about cuss words?
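Because 32 is 2^5, each base 32 digit maps to exactly 5 bits. This is a minimal sketch of that idea. It is my own illustration, not the original code, and the alphabet (2-9 plus A-Z without I and O, 32 symbols in all) is an assumption about the exact characters. It masks off the low 5 bits, looks up a character, and shifts.

```java
// Base 32 by bit extraction: every 5 bits of the value become one digit.
// Assumed alphabet: 2-9 and A-Z without I and O (no 0, 1, I or O).
final class Base32Bits {
    static final String DIGITS = "23456789ABCDEFGHJKLMNPQRSTUVWXYZ";

    // 13 digits * 5 bits = 65 bits, enough to cover all 64 bits of a long.
    static String encode(long value) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 13; i++) {
            int chunk = (int) (value & 0x1F); // low 5 bits select the character
            sb.append(DIGITS.charAt(chunk));
            value >>>= 5;                     // unsigned shift, so negative longs work too
        }
        return sb.reverse().toString();      // highest-order digit first
    }
}
```

A full 64-bit value needs 13 digits, because 12 digits cover only 60 bits. The top digit then holds the remaining 4 bits.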
The final requirement was that every third digit must be a numeric digit, 2, 3, 4, 5, 6, 7, 8, or 9, so that no cuss word could ever be spelled out. Those eight digits by themselves form a base 8 numbering system. I searched a list of known English cuss words and found that I could instead just remove a handful of characters from the set of digits and skip the variable base system entirely. But the boss didn’t go for it because of our non-English-speaking users.
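Forcing a digit into every third position means a token can never have more than two letters in a row, so no word longer than two letters can appear. Here is a hypothetical validator for that rule. The name is my own, and it assumes any hyphen has already been removed from the token.

```java
// Hypothetical check for the "every third character is 2-9" rule.
// Positions 3, 6, 9, ... (1-based) must be digits; others are unrestricted.
final class CussCheck {
    static final String DIGITS = "23456789";

    static boolean everyThirdIsDigit(String token) {
        for (int i = 2; i < token.length(); i += 3) { // 0-based indices 2, 5, 8, ...
            if (DIGITS.indexOf(token.charAt(i)) < 0) return false;
        }
        return true;
    }
}
```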
Base 32-32-8-32-32-8-32-32
And a variable base numbering system is born!
The final algorithm uses an encoding with two columns of base 32 digits followed by one column of base 8 digits, and that pattern repeats until there are 8 columns. Six base 32 columns and two base 8 columns give 32^6 × 8^2 = 2^36 (about 68.7 billion) distinct codes. I opted to write the code so that it could handle any base in any column. Instead of just extracting some bits and converting them to digits, I do divides and mods to get what I need.
I use one translation string for each numbering system in the encoding. In this case there are two, one for base 32 and one for base 8. It would be possible to use a different numbering system for each column if you really wanted to do that.
I used a counter to keep track of which translation string to use. It’s clumsy but works. As usual, I would do a few things differently if I were going to do more work on this code.
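Putting those three paragraphs together, here is a minimal sketch of a variable base encoder. It is a reconstruction, not the original code. The class and method names, the exact alphabets, and the use of IllegalArgumentException in place of the custom exceptions are all assumptions, and the hyphen is left out. Each column takes `value % base` as its digit and passes `value / base` on to the later columns, and a counter cycles through the translation strings.

```java
// Sketch of a 32-32-8-32-32-8-32-32 variable base encoder.
final class KeyCodeEncoder {
    // Assumed translation strings, one per numbering system.
    static final String BASE32 = "23456789ABCDEFGHJKLMNPQRSTUVWXYZ";
    static final String BASE8  = "23456789";
    // Column pattern: two base 32 columns, then one base 8 column, repeated.
    static final String[] TRANSLATIONS = { BASE32, BASE32, BASE8 };
    static final int COLUMNS = 8;

    // Writes the lowest-order column first, so it ends up on the left.
    static String encode(long value) {
        if (value < 0) throw new IllegalArgumentException("negative value");
        StringBuilder sb = new StringBuilder();
        int counter = 0; // which translation string the next column uses
        for (int col = 0; col < COLUMNS; col++) {
            String digits = TRANSLATIONS[counter];
            int base = digits.length();
            sb.append(digits.charAt((int) (value % base))); // this column's digit
            value /= base;                                  // remainder for later columns
            counter = (counter + 1) % TRANSLATIONS.length;
        }
        if (value != 0) throw new IllegalArgumentException("value needs more than 8 columns");
        return sb.toString(); // unused high columns appear as '2' (zero) padding on the right
    }
}
```

For example, `encode(1)` gives `"32222222"`, and the largest value that fits, 2^36 - 1, gives `"ZZ9ZZ9ZZ"`. Positions 3 and 6 can only ever hold one of the digits 2-9.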
Note that this encoding does not read like an ordinary number: the lowest-order column is on the left, and the padding is on the right. It was just easier to do it this way, since no one is going to use these values as numbers. There is also a hyphen to make the value easier to read and to write down by hand.
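Formatting for people can be kept separate from the encoding itself. In this sketch, the hyphen position (after every 4 characters) is an assumption, because the post does not say where the hyphen goes. `normalize` also folds lower case to upper case, so tokens typed by hand still match.

```java
// Hypothetical display helpers, independent of the encoder.
final class TokenFormat {
    static final int GROUP = 4; // assumed: a hyphen after every 4 characters

    // "ZZ9ZZ9ZZ" -> "ZZ9Z-Z9ZZ"
    static String addHyphens(String raw) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < raw.length(); i++) {
            if (i > 0 && i % GROUP == 0) sb.append('-');
            sb.append(raw.charAt(i));
        }
        return sb.toString();
    }

    // Undo hand-typing variations: drop hyphens and spaces, fold to upper case.
    static String normalize(String typed) {
        return typed.replace("-", "").replace(" ", "").toUpperCase(java.util.Locale.ROOT);
    }
}
```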
Converting the token back to a long integer was a little trickier at first. But at least the hyphen is easy to deal with!
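Decoding walks the same column layout in reverse. This is again hypothetical Java. The alphabets are assumed, and IllegalArgumentException stands in for the custom exceptions. The hyphen is simply deleted. Because the leftmost column is the lowest order, the sketch starts at the rightmost column and accumulates `value = value * base + digit`.

```java
// Sketch of the matching decoder for the 32-32-8-32-32-8-32-32 layout.
final class KeyCodeDecoder {
    static final String BASE32 = "23456789ABCDEFGHJKLMNPQRSTUVWXYZ";
    static final String BASE8  = "23456789";
    static final String[] TRANSLATIONS = { BASE32, BASE32, BASE8 };
    static final int COLUMNS = 8;

    static long decode(String token) {
        // The hyphen carries no value, so just remove it; accept lower case too.
        String raw = token.replace("-", "").toUpperCase(java.util.Locale.ROOT);
        if (raw.length() != COLUMNS) {
            throw new IllegalArgumentException("expected " + COLUMNS + " characters");
        }
        long value = 0;
        // Leftmost column is lowest order, so start from the right.
        for (int col = COLUMNS - 1; col >= 0; col--) {
            String digits = TRANSLATIONS[col % TRANSLATIONS.length];
            int d = digits.indexOf(raw.charAt(col));
            if (d < 0) {
                throw new IllegalArgumentException(
                    "invalid character '" + raw.charAt(col) + "' in column " + (col + 1));
            }
            value = value * digits.length() + d;
        }
        return value;
    }
}
```

Each character is looked up only in its own column's translation string, so a letter typed into one of the digit-only columns is rejected instead of silently decoded.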
I have not included the definitions for the exceptions, because each one is only about a line of code.
Summary
It would be easy to use any number of numbering systems. A table could determine which column gets which numbering system. But if every base is a power of 2, it would be far simpler to extract bits from the long integer and use them to pick the visible characters.
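When every base is a power of 2, as with 32 = 2^5 and 8 = 2^3, each divide and mod becomes a shift and a mask. This sketch of that table-driven variant uses a hypothetical per-column table of translation strings. For values that fit in 36 bits, it should produce the same tokens as a divide-and-mod encoder with the same alphabets, because for non-negative values `v % 2^k` equals `v & (2^k - 1)` and `v / 2^k` equals `v >>> k`.

```java
// Table-driven encoder that extracts bits instead of dividing.
final class BitKeyCode {
    static final String BASE32 = "23456789ABCDEFGHJKLMNPQRSTUVWXYZ";
    static final String BASE8  = "23456789";
    // One entry per column; every base here is a power of 2.
    static final String[] COLUMN_DIGITS = {
        BASE32, BASE32, BASE8, BASE32, BASE32, BASE8, BASE32, BASE32 };

    static String encode(long value) {
        StringBuilder sb = new StringBuilder();
        for (String digits : COLUMN_DIGITS) {
            int bits = Integer.numberOfTrailingZeros(digits.length()); // 32 -> 5, 8 -> 3
            int mask = (1 << bits) - 1;
            sb.append(digits.charAt((int) (value & mask))); // low bits pick the character
            value >>>= bits;                                 // discard the bits just used
        }
        // Leftover bits (including the sign bit of a negative long) mean it didn't fit.
        if (value != 0) throw new IllegalArgumentException("value needs more than 36 bits");
        return sb.toString();
    }
}
```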
This is also designed to convert only a single 64-bit integer into a short, human-readable string of characters. Converting a long string of characters would take a bit more work.