Understanding Cyclic Redundancy Check Implementation
Abstract
The goal of write up is to understand the functionality and
implementation of Cyclic Redundancy Check (CRC) schemes. The first part deals
with division in Gallios Fields (modulo-2 division) which forms the basis of
CRC and its implementation in hardware using 1-bit shift. The later part covers
parallelization of CRC hardware treating n-bits at a time along with table
based implementation.
CRC Concept
CRC is a scheme for error
detection. In this scheme extra information, r is provided along with the
original message, m. The extra information is related to original message m, by
some function f. Parity is one such
function wherein all the bits of message are ex-ored. For illustration and easy
understanding, let’s take an example from arithmetic domain. Suppose we need to
send 2 digits of information, where the information can have value in range 10
to 90. To form the redundant information we select a function of type module g.
We divide message m by g and treat the remainder r as extra information. The
complete transmission scheme then becomes {m,r}. Ex. If
the message is 74 and we select g as 9 then the transmission stream would be
{74, 2} as 74 mod 9 = 2 (f being mod).
This message is transmitted in noisy environment with possibility of a digit
getting changed. The receiver of this message would receive {m’,r’}. It then
repeats the same procedure as transmitter of finding remainder r, dividing by
g. If r’ matches r then m’ matches m, otherwise the received message is
pronounced errored. The error detection capability depends on selection of g.
If g is selected as 2, then the same stream would look like {74, 0}. If 74 gets
converted to 76 with one digit change then receiver cannot detect the error as
76 mod 2 = 0 which is same as 74 mod 2.
The same concept when applied in
digital with binary logic forms the basis for our CRC. Let m be the message to
be transmitted consisting of k bits. We divide this message with another binary
stream g, consisting of (n+1) bits to find out remainder of n bits. The rules
for division are almost the same except that there is no carry here. The
summary for rules
1 ± 1 = 0, 1 ± 0 = 1, 0 ± 1 = 1, 0 ± 0 = 0
This division is called modulo-2
division. The remainder r is always consists of n bits if divisor consists of
n+1 bits. With remainder r, the transmit stream becomes {m, r}.
The division Procedure
Let the message m = 10010101, g= 1011. The division is
performed as shown in Figure 1
Figure 1 Modulo-2 division
Hardware Implementation of division
Having understood the division lets proceed towards hardware
implementation. The stage 0 does not need any explanation. At stage 1, we have
010 as remainder from stage 0 and 0 as brought down as a part of division
procedure. We have 0100 to deal with. The first task is to find out next bit in
quotient. It is very simple as next bit in quotient is always d2.
The subtraction term is calculated by ANDing divisor with next quotient bit. In
arithmetic division it obtained by multiplying divisor with new quotient term.
Here, if qi is the next bit in quotient term, qi = d2 and Sub_m = { ∀
j : gj & qi } Or simply with ternary operator Sub_m =
qi == 1 ? g : 0. In hardware it is going to be implemented with AND
gates or multiplexer as
sub_m[0] =
g[0] & qi
sub_m[1] = g[1] & qi
sub_m[n] =
g[n] & qi
This way we
have found out sub_1 which is 0000 in this case. The next step is to find out
stage2-r2. This is obtained by bitwise XORing stage1-r1 bits with sub_1 bits.
The leftmost bit is always going to be 0 in the result. This bit has done its
job and is thrown out (shifted out) while considering data for next stage,
making one bit position available to pull one bit down from dividend.
d3
= stage1_r1 [3] ^ sub1 [3]
d2
= stage1_r1 [2] ^ sub1 [2]
d1
= stage1_r1 [1] ^ sub1 [1]
d0
= stage1_r1 [0] ^ sub1 [0]
stage2_r2 =
d<<1 i.e. stage_2_r2 [3:1] = d [2:0]
stage2_r2
[0] = next term from dividend
The above sequence of operation needs to be repeated for remaining bits
in dividend. It can also be observed that d3 is not used anywhere,
so no need to implement it in hardware.
With above deductions the hardware for mod2 division for 4 bit divisor
g looks as shown in Figure 2
Figure 2 Modulo-2 division using 1 bit shift
The hardware can be reduced for
known value of g. The AND gates can converted to simple wire or open circuit
based on value g[i] is 1 or 0 respectively.
The same logic can be extended for any n bit value of g.
Parallelizing division
The division discussed above
inputs one bit every clock cycle. This may not be sufficient while handling
data streams at high data rate where data arrives as in bigger chunks e.g. 8 bits/clock, 32 bits /clock. In these cases, we should be able to handle
multiple bits as input to division.
The Figure 3
helps in understanding the division treating a nibble at a time. The difference
with the previous 1 bit approach is that we feed 4 bits into next stage. We
need to find out 4 bit quotient in one clock cycle. The approach is to perform
stage 0 to stage 3 in combinatorial way.
Figure 3 Nibble wide Module 2 division
Consider the stage marked as ds_0, at this
stage we have available with us 8 bits vector ds_0[7:0] and 4 bit divisor
g[3:0] . In ds_0, bit 7 is always 0, next 3 bits ds_0[6:4] are the remainder
from previous stage and remaining 4 bits ds_0[3:0] are the new bits from
message stream. The problem is defined as finding q[3:0] and remainder r from
ds_0 and g . Finally we need to express r in terms of ds_0 and g.
q3 = ds_0[6] As
discussed in 1 bit division
Finding q3 is sufficient to define ds_1 vector.
ds_1[5] = ds_0[5] ^ g[2] & q3
ds_1[4] = ds_0[4] ^ g[1] & q3
ds_1[3] = ds_0[3] ^ g[0] & q3
Since ds_1[6] is not used for
further calculations we don’t waste energy in finding it. ds_1[2:0] = ds_0[2:0].
For next stage similar process needs to be followed.
q2 = ds_1[5] This gives way to find out ds_2
ds_2[4] = ds_1[4] ^ g[2] & q2
ds_2[3] = ds_1[3] ^ g[1] & q2
ds_2[2] = ds_1[2] ^ g[0] & q2
ds_2[1:0] = ds_1[1:0]
On the same line, q1 = ds_2[4]
ds_3[3] = ds_2[3] ^ g[2] & q1
ds_3[2] = ds_2[2] ^ g[1] & q1
ds_3[1] = ds_2[1] ^ g[0] & q1
Finally q0 = ds_3[3]
r[2] = ds_3[2] ^ g[2] & q0
r[1] = ds_3[1] ^ g[1] & q0
r[0] = ds_3[0] ^ g[0] & q0
The above combinatorial logic (div_4 combi) can be used to calculate
remainder of long stream of message as shown in Figure 4
Figure 4 Division of long stream
Same method can be extended to
perform division treating 8, 16, 32, n bits at a time. As the value of n increases,
the combinatorial logic increases thereby reducing the maximum frequency of
operation for the circuit.
Cyclic Redundancy check
CRC checksum is equal to
remainder obtained by dividing message stream m by generator polynomial g. The message stream and generator terms are
expressed as polynomials. g = 1001 is expressed as x3+1. There are
many variants of expressing g based on assumptions that highest and lowest degree
terms always have coefficient as 1 [1] .
We have seen the division of long
message stream by g. CRC implementation uses the same division technique with a
difference that input message stream m , is padded with zeros equal to degree
of generator polynomial. This makes the implementation of check at receiver
simple. When any message with checksum is received, the receiver calculates
remainder on complete stream (including checksum) for correctly received
message the remainder has to be 0.
Apart from above procedure, the
communication standards may define the different procedures for calculation of
checksum for corresponding communication. The standards also define generator
polynomial. IEEE in standard 802.3 [2] defines
generator polynomial for Ethernet communication as
G(x)
= x32 + x26 + x23 + x22 + x16 + x12 + x11 + x10 + x8 + x7 + x5 + x4 + x2 + x + 1
Apart from padding 32 zeros, the
standard requires complementing of first 32 bits of frame and complementing of
remainder before sending out. The former requirement is achieved by preloading
CRC calculation engine with all 1s.
Table based implementation
The approach is similar to the arithmetic
division we do using tables. Instead of
having tables for all numbers, we need to have table only for G, as it is fixed
for one CRC implementation. The depth of table depends on how many bits we want
to handle in one cycle. For the previous nibble wide division, we need to
handle 7 bits in one cycle (4 new bits from stream + 3 bits of remainder from
previous cycle). This corresponds to 2k+1 bits where k is degree of G. Hence we
need a table of depth 27 = 128. Each location index by 7 bit number
ds, contains 3 bits of remainder that could be obtained by dividing ds by G. If
ds = G x q + r, x -> mod2 multiplication, + -> mod2 addition, there the
data structure that we would probably want to maintain is {q, r} for all
possible values of ds. We can omit q as it does not play any role apart from
finding remainder in the corresponding stage. The pseudo-code for this
implementation would look as
ds_r[] ; // Array of remainder
obtained by dividing ds_r by G
const int k; // Degree of generator polynomial G
r = 0 is the remainder
while(m!= EMPTY) {
ds[2k:k+1] = r ; ds[k:0] =
m [0:k] ; m = m << k +1 ;
r = ds_r[ds];
}
The table based approach is comparatively simpler to implement. If the
table ds_r, is implemented in memory, it can be used for different generator
polynomials by reprogramming. On negative side, the hardware requirement
increases exponentially with increase in size of generator polynomial. The hardware
requirement can be brought down by treating less number of bits independent of degree
of generator polynomial at the cost of performance.
Summary
Cyclic redundancy check is remainder of division by generator
polynomial. Out of the two implementations discussed here table based
implementation seems more suitable for implementation in software. The former hardware
implementation is more suitable for high bit rate requirements.
References
- Wikipedia, Cyclic Redundancy Check, http://en.wikipedia.org/wiki/Cyclic_redundancy_check
- IEEE Std 802.3-2008, Carrier Sense Multiple Access with Collision Detection (CSMA/CD) Access Method and Physical Layer Specifications