Build Your Custom FPGA
This tutorial shows how to build a custom FPGA. The full script can be found at examples/fpga/scanchain/fle6_N2_mem2K_8x8/build.py.
Describe the architecture
To start building an FPGA, we first need to create a Context object. A Context
object provides a set of APIs for describing CLB/IOB structure, FPGA layout, and
routing resources. It also stores and manages all created/generated modules and
other information about the FPGA, which are later used by the RTL-to-bitstream
flow.
from prga import *
from itertools import product
import sys
# create a new context
ctx = Context()
After creating the Context object, we can start to describe our custom FPGA.
Here, we first describe the routing resources in the FPGA: the routing wire
segments and the global wires.
# create global clock
gbl_clk = ctx.create_global("clk", is_clock = True)
# assign an IOB to drive the clock
# the first argument is the position of a tile, and the second argument is
# the subtile ID in that tile
gbl_clk.bind( (0, 1), 0)
# create wire segments: name, width, length
ctx.create_segment( 'L2', 20, 2)
Note that width counts the wire segments of the given type that start at each
tile, per routing channel and per direction. A length-2 segment spans two tiles,
so in the example above each horizontal channel contains 80 tracks
(20 x 2 starting positions x 2 directions):
- 20 L2 tracks that run from west to east, starting from the current tile
- 20 L2 tracks that run from west to east, starting from 1 tile west of the current tile
- 20 L2 tracks that run from east to west, starting from the current tile
- 20 L2 tracks that run from east to west, starting from 1 tile east of the current tile
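The track count can be checked with a few lines of plain Python. This is only a sketch of the arithmetic described in the list above; the helper name `tracks_per_channel` is hypothetical and not part of PRGA.

```python
def tracks_per_channel(segments):
    """Total tracks in one routing channel, counting both directions.

    `segments` is a list of (name, width, length) tuples that mirror the
    arguments passed to ctx.create_segment in this tutorial.
    """
    directions = 2  # e.g. west-to-east and east-to-west
    # A segment of length L has L possible starting offsets, and `width`
    # segments start at each offset in each direction.
    return sum(width * length * directions for _name, width, length in segments)


print(tracks_per_channel([("L2", 20, 2)]))  # prints 80
```

Adding a second segment type to the list shows how quickly the channel grows, which is useful when weighing routability against area.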
Before describing the CLB/IOBs in our custom FPGA, we can add custom logic
elements (or primitive cells, hardwired IP blocks, etc.) to our Context and use
them when we describe CLB/IOBs. For example, PRGA provides an API to create a
memory module:
# create a memory primitive: addr width, data width
memory = ctx.create_memory( 8, 8 )
PRGA also provides APIs for adding and using arbitrary Verilog modules in the
FPGA, for example Context.build_primitive. Multi-modal primitives are also
supported by calling Context.build_multimode.
TODO: Add tutorials for adding custom Verilog modules and multi-modal
primitives.
Then, we can describe the CLB/IOB structures in our custom FPGA. Use
Context.build_io_block or Context.build_logic_block to create an IOBlockBuilder
or LogicBlockBuilder object, then use the builder object to build the desired
block. After describing the desired block, use the commit method of the builder
to commit the module into the Context database.
In this example we’ll be using FLE6 as our basic logic element. FLE6 contains
one fracturable LUT6, one hard adder, and two flipflops. The fracturable LUT6
may be used as two LUT5s with shared inputs.
# =======================================================================
# -- CLB ----------------------------------------------------------------
# =======================================================================
# create CLB builder
builder = ctx.build_logic_block("clb")
# create a block input that is directly connected to a global wire and not
# routable
clk = builder.create_global(gbl_clk, Orientation.south)
# create other block inputs/outputs
# name, width, on which side of the block is the port
in_ = builder.create_input ("in", 12, Orientation.west)
out = builder.create_output("out", 4, Orientation.east)
cin = builder.create_input ("cin", 1, Orientation.south)
cout = builder.create_output("cout", 1, Orientation.north)
# Instantiate logic primitives
# module to be instantiated, name, number of instances
for i, inst in enumerate(builder.instantiate(ctx.primitives["fle6"], "i_cluster", 2)):
    # connect nets: driver (source) nets, drivee (sink) nets
    builder.connect( clk, inst.pins['clk'] )
    builder.connect( in_[6*i : 6*(i+1)], inst.pins['in'] )
    builder.connect( inst.pins['out'], out[2*i : 2*(i+1)] )
    # 'vpr_pack_pattern' is a keyword-only argument. See
    # "https://docs.verilogtorouting.org/en/latest/arch/reference/#tag-%3Cpack_patternname="
    # for more information
    builder.connect( cin, inst.pins['cin'], vpr_pack_patterns = ['carrychain'] )
    # the carry-out of this instance drives the carry-in of the next one
    cin = inst.pins["cout"]
# the carry-out of the last instance drives the block-level carry-out
builder.connect(cin, cout, vpr_pack_patterns = ["carrychain"])
# Commit the described CLB. The module is now accessible as `ctx.blocks["clb"]`
clb = builder.commit()
# =======================================================================
# -- IOB ----------------------------------------------------------------
# =======================================================================
# create IOB builder
# An instance named "io" is automatically added into the IOB. This is the
# I/O pad for off-chip connections. By default, a bi-directional pad that
# can be configured as input or output is instantiated.
builder = ctx.build_io_block("iob")
# create block inputs/outputs
o = builder.create_input("outpad", 1)
i = builder.create_output("inpad", 1)
# connect
builder.connect(builder.instances['io'].pins['inpad'], i)
builder.connect(o, builder.instances['io'].pins['outpad'])
# Commit the IOB. The module is also accessible as `ctx.blocks["iob"]`
iob = builder.commit()
# =======================================================================
# -- BRAM ---------------------------------------------------------------
# =======================================================================
# Here we specify the width and height of this block (in number of tiles)
builder = ctx.build_logic_block("bram", 1, 2)
# Instantiate the memory module
inst = builder.instantiate(memory, "i_ram")
# create and connect ports/pins
builder.connect( builder.create_global(gbl_clk, Orientation.south),
        inst.pins["clk"])
builder.connect( builder.create_input("we", 1, Orientation.west, (0, 0)),
        inst.pins["we"])
builder.connect( builder.create_input("waddr", len(inst.pins["waddr"]), Orientation.west, (0, 0)),
        inst.pins["waddr"])
builder.connect( builder.create_input("din", len(inst.pins["din"]), Orientation.east, (0, 0)),
        inst.pins["din"])
builder.connect( builder.create_input("raddr", len(inst.pins["raddr"]), Orientation.west, (0, 1)),
        inst.pins["raddr"])
builder.connect( inst.pins["dout"],
        builder.create_output("dout", len(inst.pins["dout"]), Orientation.east, (0, 1)))
# commit the BRAM block. The module is also accessible as `ctx.blocks["bram"]`
bram = builder.commit()
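The slicing used in the CLB loop above (`in_[6*i : 6*(i+1)]`, `out[2*i : 2*(i+1)]`) and the way `cin` is threaded through the instances can be tabulated without PRGA installed. The following is a plain-Python sketch; the function name `clb_port_map` and the `i_cluster[k]` labels are hypothetical and only serve as illustration.

```python
def clb_port_map(n_fle=2, in_per_fle=6, out_per_fle=2):
    """List, per FLE6 instance, the block-port bit ranges it is wired to."""
    rows = []
    for i in range(n_fle):
        rows.append({
            "instance": i,
            # half-open bit ranges, same convention as Python slices
            "in": (in_per_fle * i, in_per_fle * (i + 1)),
            "out": (out_per_fle * i, out_per_fle * (i + 1)),
            # the first instance is fed by the block-level "cin" port,
            # every later one by the previous instance's carry-out
            "cin_driver": "cin" if i == 0 else "i_cluster[{}].cout".format(i - 1),
        })
    return rows


for row in clb_port_map():
    print(row)
```

With two instances, the ranges cover exactly the 12 input bits and 4 output bits declared on the CLB, so no block port is left unconnected.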
Direct inter-block connections (DirectTunnel) can be defined using
Context.create_tunnel. This is often used for carry chains, where connections
are hardwired, i.e., not routable, but faster.
# Create a direct inter-block connection
# name of the tunnel, from port, to port, relative position
#
# "relative position" is the position of the destination port relative to
# the source port (not the blocks)
ctx.create_tunnel("carrychain", clb.ports["cout"], clb.ports["cin"], (0, -1))
After describing all the blocks we want, we can describe the tiles for each block. A tile contains one or more block instances and the connection boxes around them.
PRGA supports full customization of the connection/switch boxes. In this
tutorial, we will let PRGA generate the connections for us. This is done by
calling the TileBuilder.fill and ArrayBuilder.fill methods.
# Create 4 different IO tiles, one per edge
iotiles = {}
for ori in Orientation:
    builder = ctx.build_tile(iob,                               # block to be instantiated in this tile
            4,                                                  # number of block instances in this tile
            name = "t_io_{}".format(ori.name[0]),               # name of the tile
            edge = OrientationTuple(False, **{ori.name: True})) # on which edge of the FPGA
    # auto-generate connection boxes and fill connection box patterns
    # default input FC value, default output FC value
    builder.fill( (1., 1.) )
    # FC values affect how many tracks each block pin is connected to
    #
    # In this example we use ratio-based FC values, so "1." means 100%
    # connection, "0.4" means 40% connection. The bigger the FC values,
    # the more routable the FPGA is. However, bigger FC values also result
    # in more hardware resources, and may slow down the FPGA itself.
    # automatically connect ports/pins in the tile
    builder.auto_connect()
    # commit the tile
    iotiles[ori] = builder.commit()
# Concatenate build, fill, auto-connect and commit
#
# We use lower FC values here than for the IO tiles
clbtile = ctx.build_tile(clb).fill( (0.4, 0.25) ).auto_connect().commit()
bramtile = ctx.build_tile(bram).fill( (0.4, 0.25) ).auto_connect().commit()
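To get a feel for the ratio-based FC values used above, the sketch below converts a ratio into a track count. It assumes the VPR-style convention that a fractional FC is multiplied by the number of tracks in the adjacent channel and then rounded. Exactly how PRGA rounds, and whether it applies the ratio per segment type, is not stated in this tutorial, so treat the numbers as rough estimates. The helper name `tracks_connected` is hypothetical.

```python
def tracks_connected(fc_ratio, channel_tracks):
    """Approximate number of tracks one block pin connects to.

    Assumption: ratio-based FC is a fraction of the channel's tracks,
    rounded to the nearest integer.
    """
    if not 0.0 <= fc_ratio <= 1.0:
        raise ValueError("ratio-based FC must be within [0, 1]")
    return round(fc_ratio * channel_tracks)


CHANNEL_TRACKS = 80  # from the single L2 segment type defined earlier

for tile, (fc_in, fc_out) in {"io": (1.0, 1.0), "clb/bram": (0.4, 0.25)}.items():
    print(tile,
          "input pins:", tracks_connected(fc_in, CHANNEL_TRACKS),
          "output pins:", tracks_connected(fc_out, CHANNEL_TRACKS))
```

Under these assumptions, lowering the input FC from 1.0 to 0.4 cuts the per-pin connections from 80 to 32, which is where the hardware savings mentioned in the comments come from.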
After describing all the tiles, we can describe arrays/sub-arrays. An array is a 2D mesh. Each tile in the mesh contains one tile instance and up to four switch boxes, one per corner. Tiles larger than 1x1 will occupy adjacent tiles and switch box slots:
# Select a switch box pattern. Supported values are:
# wilton, universal, subset, cycle_free
pattern = SwitchBoxPattern.wilton
# Create an array builder
# name, width, height
builder = ctx.build_array('subarray', 4, 4, set_as_top = False)
for x, y in product(range(builder.width), range(builder.height)):
    if x == 2:
        # column 2 holds BRAM tiles; each is 2 tiles tall, so only
        # instantiate one at every other row
        if y % 2 == 0:
            builder.instantiate(bramtile, (x, y))
    else:
        builder.instantiate(clbtile, (x, y))
# Commit the subarray
subarray = builder.fill( pattern ).auto_connect().commit()
# Create the top-level array builder
builder = ctx.build_array('top', 10, 10, set_as_top = True)
for x, y in product(range(builder.width), range(builder.height)):
    # leave the corners empty
    if x in (0, builder.width - 1) and y in (0, builder.height - 1):
        pass
    # fill edges with IO tiles
    elif x == 0:
        builder.instantiate(iotiles[Orientation.west], (x, y))
    elif x == builder.width - 1:
        builder.instantiate(iotiles[Orientation.east], (x, y))
    elif y == 0:
        builder.instantiate(iotiles[Orientation.south], (x, y))
    elif y == builder.height - 1:
        builder.instantiate(iotiles[Orientation.north], (x, y))
    # subarrays
    elif x % 4 == 1 and y % 4 == 1:
        builder.instantiate(subarray, (x, y))
# commit the top-level array
top = builder.fill( pattern ).auto_connect().commit()
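The placement rules above are easier to check when drawn as a grid. The following plain-Python sketch replays the same conditions without PRGA and prints a character map of the 10x10 top-level array. It assumes that each instance is anchored at its lower-left position, so a 4x4 subarray placed at (1, 1) covers x and y from 1 to 4. The function names are hypothetical.

```python
from itertools import product


def top_floorplan(width=10, height=10, sub=4):
    """Return {(x, y): symbol} for the top-level array described above."""
    grid = {}
    for x, y in product(range(width), range(height)):
        on_x_edge = x in (0, width - 1)
        on_y_edge = y in (0, height - 1)
        if on_x_edge and on_y_edge:
            grid[x, y] = "."                      # corners stay empty
        elif on_x_edge or on_y_edge:
            grid[x, y] = "I"                      # IO tile
        else:
            # position inside the 4x4 subarray that covers this slot
            sx = (x - 1) % sub
            grid[x, y] = "B" if sx == 2 else "C"  # BRAM column vs. CLB
    return grid


def render(grid, width=10, height=10):
    """Draw the grid with y increasing upwards (row y = 0 is printed last)."""
    return "\n".join(
        "".join(grid[x, y] for x in range(width))
        for y in reversed(range(height))
    )


grid = top_floorplan()
print(render(grid))
```

Counting symbols gives 32 IO tiles, 48 CLB slots, and 16 BRAM slots. Since every BRAM is two tiles tall, the 16 slots correspond to 8 BRAM instances.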
Generate Yosys and VPR scripts
After describing the desired FPGA architecture, we can generate the scripts for our RTL-to-bitstream flow. Specifically, PRGA generates the Yosys scripts for synthesizing an application for the custom FPGA, and the VPR scripts for placing and routing the synthesized application.
PRGA adopts a pass-based flow to complete, modify, and optimize the FPGA
architecture, as well as to generate all files for the architecture. A Flow
object is used to manage and run all the passes. It also checks and resolves
the dependencies between the passes.
flow = Flow(
# This pass generates the architecture specification for VPR to place
# and route designs onto this FPGA
VPRArchGeneration("vpr/arch.xml"),
# This pass generates the routing resource graph specification for VPR
# to place and route designs onto this FPGA
VPR_RRG_Generation("vpr/rrg.xml"),
# This pass analyzes the primitives in the FPGA and generates synthesis
# script for Yosys
YosysScriptsCollection("syn"),
)
# Run the flow on our context
flow.run(ctx)
After this step, PRGA should generate the following files:
+- syn/
|   +- m_adder.lib.v        # behavioral model for logic primitive "adder"
|   +- m_adder.techmap.v    # technology mapping rules for logic primitive "adder"
|   |
|   +- m_ram_1r1w.lib.v     # behavioral model for the block RAM primitive
|   +- memory.techmap.v     # technology mapping rules for the block RAM primitive
|   +- bram.rule            # block RAM inference rules for Yosys
|   |
|   +- read_lib.tcl         # Yosys script for reading in the primitives as lib cells
|   +- synth.tcl            # Yosys script for synthesizing an application
|
+- vpr/
    +- arch.xml             # VPR's architecture description
    +- rrg.xml              # VPR's routing resource graph
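A quick way to confirm that the flow produced everything in the listing above is to check for the files from Python. This is a minimal sketch using only the standard library; the file names are copied from the listing, and the helper `missing_outputs` is hypothetical.

```python
from pathlib import Path

# Relative paths taken from the file listing above
EXPECTED_OUTPUTS = [
    "syn/m_adder.lib.v",
    "syn/m_adder.techmap.v",
    "syn/m_ram_1r1w.lib.v",
    "syn/memory.techmap.v",
    "syn/bram.rule",
    "syn/read_lib.tcl",
    "syn/synth.tcl",
    "vpr/arch.xml",
    "vpr/rrg.xml",
]


def missing_outputs(root="."):
    """Return the expected files that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_OUTPUTS if not (root / rel).is_file()]


missing = missing_outputs()
print("all files present" if not missing else "missing: " + ", ".join(missing))
```

Run it from the directory in which the build script was executed, or pass that directory as `root`.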
Auto-complete the architecture, generate RTL, and serialize the context
Up to this point in our script, we have not chosen the programming protocol for the custom FPGA. This is intentional: it facilitates early and fast design-space exploration before diving into the vast physical optimization space.
To choose the programming protocol and then implement the abstract FPGA architecture with synthesizable RTL, run the following passes:
flow = Flow(
# This pass chooses the programming protocol, and adds protocol-specific
# designs into the context
Materialization("scanchain", chain_width = 1),
# This pass converts user-defined modules to Verilog modules
Translation(),
# Analyze how configurable connections are implemented with switches
SwitchPathAnnotation(),
# This pass inserts configuration circuitry into the FPGA
ProgCircuitryInsertion(),
# This pass creates Verilog rendering tasks in the renderer.
VerilogCollection('rtl'),
)
# Run the flow on our context
flow.run(ctx)
After running the flow, all the models and information about our FPGA are stored in the context, and all the files are generated. As the final step, we make a persistent copy of the context by pickling it to disk. This pickled database will be used by the FPGA implementation toolchain, e.g. the bitstream assembler.
# Pickle the context
ctx.pickle("ctx.pkl")
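How downstream tools reload the database is not covered here. The sketch below only illustrates the general save-and-restore round trip with Python's standard `pickle` module, using a stand-in dictionary instead of a real Context. Whether `ctx.pickle` writes exactly this on-disk format is an assumption, and the `save`/`load` helpers are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def save(obj, path):
    """Serialize `obj` to `path` in binary pickle format."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load(path):
    """Restore an object previously written by `save`."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in for the real context: a plain dict with a few names from this tutorial
stand_in = {
    "top": "top",
    "blocks": ["clb", "iob", "bram"],
    "segments": [("L2", 20, 2)],
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "ctx_demo.pkl"
    save(stand_in, path)
    restored = load(path)

print(restored == stand_in)  # prints True
```

As with any pickle file, only load databases that come from a trusted source, since unpickling can execute arbitrary code.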
Run the script
To run this Python script, you first need to enable the PRGA virtual environment
(see Run a Quick Test). Then, you may either run the script directly with
Python, or run make inside the examples/fpga/scanchain/fle6_N2_mem2K_8x8
directory. You may also copy the script to any directory you like, and simply
execute python build.py in there.