User Guide#
(soon™ incomplete guide below)
Multiplied focuses on combinational multiplication, fine-grained control over algorithms, and the choice of making algorithms entirely by hand or using build-in building blocks.
Recommended resources before starting:
Setup#
First, import multiplied and decide on a bitwidth for our algorithm, to keep it simple let’s pick 4-bits:
import multiplied as mp
matrix = mp.Matrix(4) # 4-bit logical AND matrix
Here’s what it looks like:
print(matrix)
____0000
___0000_
__0000__
_0000___
The matrix is a “structure” used to generate the logical AND matrix, aka partial products, for a given set of inputs. It also tells the algorithm where each partial product is located.
Note
To populate an AND matrix without an Algorithm object use:
matrix = mp.build_matrix(operand_a=0, operand_b=0, bits=4)
Next, create an algorithm object.
first_alg = mp.Algorithm(matrix)
The Algorithm object will hold the templates that define a given algorithm. It also holds an internal state to track which template it should use next.
Reduce#
Now let’s make some templates. This involves figuring out where you want to place:
Carry Save Adders – CSA (Half Adders, HA, are automtically placed when using simple templates) [3 to 2]
Adders – [2 to 1]
and in the future:
Greedy Adders – Adder which makes use of carry in (cin) [2 to 1]
Collectively these are arithmetic units. All of which reduce a set of partial products by 1, each with their characteristics in latency, complexity and size. None of this applies to multiplied.
Note
Decoders are an exception and can potentially reduce a set of partial products by more than 1.
Decoders – [n to n-k]
They encode specialised operations based on bit position, count, unary count, etc. Decoders are in the roadmap.
To inform the algorithm on how a given template reduces groups of partial products, an array of characters, or pattern is used:
pattern = [
a, # +-
a, # | run of 3 == CSA [3 to 2]
a, # +-
b, # +- run of 1 == None -- no reduction needed
]
Note
CSAs work on 3 bits at a time, and returns a 2-bit sum of raised bits. Adders work on 2 words at a time, each word being x-bits, and returns a single word. Bit pairs which sum to 0b10 are carried through the calculation.
For this 4-bit algorithm it will take 3 rounds, minimum, of reduction to reach out final output:
# Patterns for each stage of first_alg
1st: 2nd: 3rd: output:
a c e x
a c e
a d _
b _ _
Note
Each arithmetic unit will output to the top of its “run”. The next section covers how to “map” these outputs.
Underscores represent no operations or noops.
Map#
Each step of an algorithm needs a pattern or a template, but it also needs to regoup their partial products before using the next template. This is where maps come in.
First, note that bits can only move vertically as moving horizontally changes it’s value. Therefore, each map value is a signed hexadecimal number. For simple maps, the map value represents an entire row rather than a specific bit.
# Maps for each stage of first_alg
1st: 2nd: 3rd:
00 00 00
00 00 00
00 FF 00
FF 00 00
Note
Outputs of each units are packed to the top of their initial row.
Here’s the breakdown of this example:
1st stage - Move first 2 rows of result down by 1 [-1 = FF = down * 1]
2nd stage - Move middle 2 rows of result down by 1 [-1 = FF = down * 1]
3rd stage - No moves required
Algorithms use a template to produce a result, which is then “mapped” to the next template. Each Adder/CSA/etc. needs to know where it should output in relation to the next template. This means as long as outputs are mapped correctly to inputs, the placements of arithmetic units can be anywhere. The final output (x from the pattern example) can be anywhere within the matrix.
Note
In other words, these are also valid algorithms:
# [ Key ]
# char | r :: arithmetic unit | result
# Another valid algorithm # Another
1st: 2nd: 3rd: output: 1st: 2nd: 3rd: output:
a r a r c r
a r c r a r c r d r x
a c r d r a c d
b r c d x b r
# maps # maps
01 00 00 00 01 00
01 01 00 00 01 00
00 01 01 00 00 00
00 00 00 FF 00 00