algorithm

Algorithm Generating String Combinations of Characters

In this post, I shall reveal another algorithm to generate all possible combinations of a given set of characters,of any predetermined range of lengths. For example, if we take a,b,c as possible characters in our combinations, and 1 to 3 as the range of lengths, then we have the following output:

a
b
c
aa
ab
ac
ba
bb
bc
ca
cb
cc
aaa
aab
aac
aba
abb
abc
....
ccc

This algorithm merely generates the combinations as strings of characters. It is upto the user of this algorithm to decide what to do with the generated string. I take no responsibility for what a user might actually do with this algorithm.

It is a non-recursive algorithm and is highly efficient in terms of usage of memory. Being non-recursive, it avoids the pitfall of call stack overflow, even with a large input space.

As with my other algorithm similar to this one here, one needs to implement parallelized code to max out CPUs and achieve a very high number of iterations per second.

The only limitation is that the combinations generated have a large amount of repetition of characters, which is exacerbated when the specified range of lengths is high.

The algorithm is as follows:

1. take length of string from minLen to maxLen, 
   array of characters being considered as dictionary
2. take current length of string as len
3. take radix as length of character dictionary
4. take max as radix to the power of len
5. from num = 0 to num = max - 1, resolve radix len, num
6. resolve radix as follows
   a. take retVal as array
   b. find p = highest power of radix <= num
   c. from j = 0 to len - p - 1, retVal[j] = 0
   d. from k = len - p to len - 1,
      i. t = radix to the power p - 1
      ii. if p > 1, retVal[k] = (num - (num % t)) / t;
          else retVal[k] = (num % radix)
      iii. num -= (retVal[k] * t)
      iv. p--
   e. build string from retVal using 
      str = empty string
      foreach(retVal as r)
          concat (dictionary[r]) with str
   f. do something with str
7. if len = maxLen 
      quit
   else
      increment len by 1 and goto step 2

The cornerstone of this algorithm is the resolve radix subroutine which does the main work in it. It works similar to conversion of decimal numbers to binary numbers. However, the difference is that in each of the bits, instead of just 0 and 1, we can put numbers ranging from 0 to length of the set of possible characters minus one.

In our case, the character set is a,b,c. So the numbers fitting in the bits are 0,1 and 2,with 0 mapping to ‘a’, 1 mapping to ‘b’ and 2 mapping to ‘c’. The total size generated is equal to the current length. So in our example above, there will be three main iterations, with the total bit size of the ‘number’ ranging from 1 to 3.

So the bit combinations range from

0
1
2
00
01
02
10
11
12
20
21
22
000
001
002
….
222

Using our mapping rules above, we get a from 0, b from 1, c from 2, aa from 00, ab from 01, ac from 02, etc.

Let this be a Valentine’s Day gift for everyone!

Leave a Reply

Your email address will not be published. Required fields are marked *