MacroPlacement

Code Elements in Circuit Training

The code elements below are the most crucial undocumented portions of Circuit Training. We thank Google engineers for Q&A in a shared document, as well as live discussions on May 19, 2022, that have explained aspects of several of the following code elements used in Circuit Training. All errors of understanding and implementation are the authors’. We will rectify such errors as soon as possible after being made aware of them.

We are glad to see grouping (clustering) added to the Circuit Training GitHub. However, these grouping (clustering) scripts still rely on the wrapper functions of plc_client, which remains a black box to the community. In this doc, we describe the implementation details of gridding, grouping and hypergraph clustering. We implement all of these code elements from scratch in Python, and our code produces results that exactly match those of Circuit Training.

Note that we build our implementation on top of the OpenROAD application. You will need to build your own OpenROAD binary before you can run our scripts. We also provide flow.py, which runs Gridding, Grouping and Hypergraph Clustering in sequence.

Gridding

In Circuit Training, the purpose of gridding is to control the size of the macro placement solution space, thus allowing RL to train within reasonable runtimes. Gridding enables hard macros to find locations consistent with high solution quality, while allowing soft macros (standard-cell clusters) to also find good locations.

Gridding determines a dissection of the layout canvas into some number of rows (n_rows) and some number of columns (n_cols) of gridcells.

The choice of n_rows and n_cols is made once for each design. Once the dimensions (n_rows, n_cols) have been chosen, their values define a gridded canvas, or grid, and remain fixed throughout Circuit Training for the given design. The detailed algorithm is as follows (Algorithm 1).

The gridding algorithm starts with the dimensions canvas_width and canvas_height of the layout canvas, as well as a list of macros, where each macro has a width and a height. Macros are not rotatable. The area of a macro is the product of its width and height. Gridding then searches over combinations (n_rows, n_cols), subject to given constraints,

where each gridcell has width grid_w = canvas_width / n_cols and height grid_h = canvas_height / n_rows. The main idea is to search for the particular (n_rows, n_cols) combination that maximizes the metric related to wasted space.
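As a minimal sketch, the gridcell dimensions for a candidate grid can be computed directly from these formulas (gridcell_dims is our hypothetical helper name, not a Circuit Training function):

```python
def gridcell_dims(canvas_width, canvas_height, n_rows, n_cols):
    """Return (grid_w, grid_h) for an (n_rows, n_cols) dissection of the canvas."""
    grid_w = canvas_width / n_cols   # every column of gridcells has this width
    grid_h = canvas_height / n_rows  # every row of gridcells has this height
    return grid_w, grid_h
```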

To evaluate the metric for a given grid (n_rows, n_cols), all macros are packed into the gridcells, and several terms (empty_ratio, ver_waste and hor_waste) that reflect wasted space are evaluated.

Packing

Macro packing is performed as follows [Algorithm 1 Lines 13-27]:

To calculate horizontal waste hor_waste, we calculate the total width of macros (width_tot_macros) and the total width of the gridcells occupied by macros (width_tot_used_gridcells).

Then, hor_waste = 1.0 - width_tot_macros / width_tot_used_gridcells.

To calculate vertical waste ver_waste, we calculate the total height of macros (height_tot_macros) and the total height of the gridcells occupied by macros (height_tot_used_gridcells).

Then, ver_waste = 1.0 - height_tot_macros / height_tot_used_gridcells.
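The two waste terms above reduce to one-line functions. This sketch assumes the four totals have already been computed during packing; the function names are ours:

```python
def horizontal_waste(width_tot_macros, width_tot_used_gridcells):
    # hor_waste per the formula above: fraction of used-gridcell width
    # not covered by macro width.
    return 1.0 - width_tot_macros / width_tot_used_gridcells

def vertical_waste(height_tot_macros, height_tot_used_gridcells):
    # ver_waste, the analogous fraction in the vertical direction.
    return 1.0 - height_tot_macros / height_tot_used_gridcells
```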

After calculating empty_ratio, hor_waste and ver_waste, the metric is defined as metric = empty_ratio + 2.0 - hor_waste - ver_waste. The grid with the best metric is noted as (n_rows_opt, n_cols_opt).
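Putting the pieces together, the search over candidate grids can be sketched as follows. Here `evaluate` is a hypothetical callback that packs the macros and returns the three terms for one candidate; per the text above, the best metric is the maximum:

```python
def grid_metric(empty_ratio, hor_waste, ver_waste):
    # metric = empty_ratio + 2.0 - hor_waste - ver_waste, as defined above
    return empty_ratio + 2.0 - hor_waste - ver_waste

def best_grid(candidates, evaluate):
    """Pick the (n_rows, n_cols) candidate with the maximum metric.
    `evaluate(n_rows, n_cols)` is assumed to pack the macros and return
    (empty_ratio, hor_waste, ver_waste) for that candidate grid."""
    best, best_metric = None, float("-inf")
    for n_rows, n_cols in candidates:
        m = grid_metric(*evaluate(n_rows, n_cols))
        if m > best_metric:
            best, best_metric = (n_rows, n_cols), m
    return best
```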

Grid Simplification

Once we have found n_rows_opt and n_cols_opt as described above [Algorithm 1 Lines 34-44], we seek a smaller grid that has similar metric properties [Algorithm 1 Lines 45-54].
Specifically, we find values of n_rows_actual and n_cols_actual whose metric is within some tolerance (5% in Circuit Training) of the optimal metric, such that n_rows_actual * n_cols_actual is minimized. This is the grid that is used in Circuit Training. To our understanding, the foregoing procedure results in grids of similar sizes, e.g., with ~25 <= n_rows_actual, n_cols_actual <= ~40.
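A sketch of this simplification step, under our assumption that "within 5% of the optimal metric" means metric >= (1 - tol) * metric_opt (the function and argument names are ours):

```python
def simplify_grid(n_rows_opt, n_cols_opt, metric_opt, metrics, tol=0.05):
    """metrics: {(n_rows, n_cols): metric} for the other candidate grids.
    Return the grid minimizing n_rows * n_cols among grids whose metric
    is within `tol` of metric_opt; fall back to the optimal grid."""
    best = (n_rows_opt, n_cols_opt)
    for (r, c), m in metrics.items():
        if m >= metric_opt * (1.0 - tol) and r * c < best[0] * best[1]:
            best = (r, c)
    return best
```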

Grouping

Grouping is an important preprocessing step of clustering. The grouping step in Circuit Training requires as inputs: the post-synthesis gate-level netlist (standard cells and hard macros); placed IOs (ports, or terminals), typically at the borders of the chip canvas; and the grid of n_rows rows and n_cols columns of gridcells, which defines the gridded layout canvas. The purpose of grouping, to our understanding, is to ensure that closely related standard-cell logic elements, which connect to the same macro or the same clump of IOs (denoted as an IO cluster), belong to the same standard-cell cluster.

The Grouping Process

The grouping consists of three steps:

Figure 1. Illustration of the netlist representation in Circuit Training.

Figure 2. Illustration of grouping IO ports.

How Groups Are Used

Each group is recorded in the “.fix file” that is part of the input to the hMETIS hypergraph partitioner when the gate-level netlist is clustered into soft macros.
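To make the .fix file concrete: in the hMETIS fix-file format, line i gives the part to which hypergraph vertex i is pre-assigned, or -1 if the vertex is free. A minimal writer sketch (the function name and the dict shape of `group_of_vertex` are our assumptions):

```python
def write_fix_file(path, num_vertices, group_of_vertex):
    """Write an hMETIS-style fix file: one line per hypergraph vertex,
    containing its fixed group index, or -1 if the vertex is free.
    group_of_vertex: {vertex_index: group_index} for grouped vertices."""
    with open(path, "w") as f:
        for v in range(num_vertices):
            f.write(f"{group_of_vertex.get(v, -1)}\n")
```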

How Grouping Scripts Are Used

We provide an example of the usage of our grouping scripts. Basically, our grouping scripts take the following as inputs: (i) a setup file (setup_file) including enablement information (lefs/libs), the synthesized gate-level netlist (.v), and a def file with placed IOs (.def); (ii) n_rows and n_cols determined by the Gridding step; (iii) the K_in and K_out parameters; and (iv) global_net_threshold for ignoring global nets. If a net has more than global_net_threshold instances, we ignore it when we search for "transitive" fanins and fanouts. After running the grouping scripts, you will get the .fix file.
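The global_net_threshold filter during a transitive-fanout search can be sketched as a bounded traversal. Everything here is illustrative, not Circuit Training code: `fanout_nets` and `net_loads` are hypothetical accessors into the netlist, and `k_out` bounds the number of hops:

```python
def transitive_fanout(seed, fanout_nets, net_loads, k_out, global_net_threshold):
    """Collect instances reachable within k_out hops of `seed`.
    fanout_nets(inst) yields the nets driven by inst; net_loads(net)
    yields the instances that net drives. Nets with more than
    global_net_threshold loads (e.g. clock, reset) are skipped."""
    reached, frontier = set(), {seed}
    for _ in range(k_out):
        nxt = set()
        for inst in frontier:
            for net in fanout_nets(inst):
                loads = net_loads(net)
                if len(loads) > global_net_threshold:
                    continue  # ignore global nets
                nxt.update(loads)
        nxt -= reached | {seed}
        reached |= nxt
        frontier = nxt
    return reached
```

A symmetric traversal over driver pins would give the transitive fanins bounded by K_in.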

Hypergraph clustering (soft macro definition)

Hypergraph clustering is, in our view, one of the most crucial undocumented portions of Circuit Training.

Information provided by Google.

The Methods section of the Nature paper provides the following information.

Therefore, at least one purpose of clustering is to enable fast placement of standard cells to provide a signal to the RL policy. The Methods section subsequently explains how the clusters are placed using a force-directed approach:

The Circuit Training FAQ adds:

Finally, the Methods section of the Nature paper also explains the provenance of the netlist hypergraph:

What exactly is the Hypergraph, and how is it partitioned?

From the above information sources, the description of the Grouping process, and information provided by Google engineers, we are fairly certain of the following.

Before going further, we provide a concrete example for (2).

Figure 3. Illustration of net model used in Circuit Training.

Break up clusters that span a distance larger than breakup_threshold

After partitioning the hypergraph, we can have nparts clusters. Then Circuit Training breaks up clusters that span a distance larger than breakup_threshold. Here breakup_threshold = sqrt(canvas_width * canvas_height / 16). For each cluster c, the breakup process is as follows:

Figure 4. Illustration of breaking up a cluster.

Note that since the netlist is generated by physical-aware synthesis, we know the (x, y) coordinate of each instance. This was confirmed by Google in July 2022.
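The threshold and the spanning test can be sketched directly from the formula above (needs_breakup and its bounding-box check are our hypothetical reading of "span a distance larger than breakup_threshold"):

```python
import math

def breakup_threshold(canvas_width, canvas_height):
    # breakup_threshold = sqrt(canvas_width * canvas_height / 16), as above
    return math.sqrt(canvas_width * canvas_height / 16.0)

def needs_breakup(xs, ys, threshold):
    """True if a cluster's bounding box exceeds `threshold` in either
    dimension; xs, ys are the coordinates of the cluster's instances."""
    return (max(xs) - min(xs) > threshold) or (max(ys) - min(ys) > threshold)
```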

Recursively merge small adjacent clusters

After breaking up clusters that span a large distance, there may be some small clusters with only tens of standard cells. In this step, Circuit Training recursively merges each small cluster into its most adjacent cluster if that cluster is within a certain distance closeness (breakup_threshold / 2.0), thus reducing the number of clusters. A cluster is defined to be a small cluster if its number of elements (macro pins, macros, IO ports and standard cells) is less than or equal to max_num_nodes, where max_num_nodes = number_of_vertices // number_of_clusters_after_breakup // 4. The merging process is as follows:
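The small-cluster threshold and one pass of the merge can be sketched as follows. The dict shapes of `sizes` and `centers`, the Euclidean closeness test, and the pass structure are all our assumptions about the recursive merge described above:

```python
import math

def max_num_nodes(number_of_vertices, number_of_clusters_after_breakup):
    # small-cluster threshold, per the formula in the text
    return number_of_vertices // number_of_clusters_after_breakup // 4

def merge_pass(sizes, centers, closeness, small_limit):
    """One pass of the recursive merge: fold each small cluster into its
    nearest cluster within `closeness`. sizes: {cid: element count};
    centers: {cid: (x, y)}. Returns True if any merge happened; a caller
    would repeat until it returns False."""
    def dist(a, b):
        (ax, ay), (bx, by) = centers[a], centers[b]
        return math.hypot(ax - bx, ay - by)

    merged = False
    for cid in list(sizes):
        if cid not in sizes or sizes[cid] > small_limit:
            continue
        nearby = [o for o in sizes if o != cid and dist(cid, o) <= closeness]
        if not nearby:
            continue
        target = min(nearby, key=lambda o: dist(cid, o))
        sizes[target] += sizes.pop(cid)
        del centers[cid]  # the merged cluster's center would be recomputed
        merged = True
    return merged
```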

Pending Clarifications

We call readers’ attention to the existence of significant aspects that are still pending clarification.
While Gridding and Grouping are hopefully well understood, we are still in the process of documenting and implementing aspects such as the following.

Our Implementation of Hypergraph Clustering.

Our implementation of hypergraph clustering takes the synthesized netlist and a .def file with placed IO ports as input, then generates the clustered netlist (in lef/def format) using hMETIS (1998 binary). By default, our implementation generates the clustered netlist in protocol buffer format and the corresponding plc file. We implement the entire flow based on OpenROAD APIs. Please refer to the OpenROAD repo for an explanation of each Tcl command.

Please note that The OpenROAD Project does not distribute any compiled binaries. You need to build your own OpenROAD binary before you run our scripts.

Input files: setup.tcl (you can follow the example to set up your own design) and a FixFile (generated by our Grouping scripts).

Output files: the clustered netlist in protocol buffer format and the corresponding plc file.

Note that the example that we provide is the ariane design implemented in NanGate45. The netlist and corresponding def file with placed instances are generated by Genus iSpatial flow. Here the macro placement is automatically done by the Genus and Innovus tools, i.e., according to Flow (B.1) above.

Thanks

We thank Google engineers for Q&A in a shared document, as well as live discussions on May 19, 2022, that explained the hypergraph clustering method used in Circuit Training. All errors of understanding and implementation are the authors’. We will rectify such errors as soon as possible after being made aware of them.