Hypergraph clustering is, in our view, one of the most crucial undocumented portions of Circuit Training.
The Methods section of the Nature paper provides the following information.
“(1) We group millions of standard cells into a few thousand clusters using hMETIS, a partitioning technique based on the minimum cut objective. Once all macros are placed, we use an FD method to place the standard cell clusters. Doing so enables us to generate an approximate but fast standard cell placement that facilitates policy network optimization.”
“Clustering of standard cells. To quickly place standard cells to provide a signal to our RL policy, we first cluster millions of standard cells into a few thousand clusters. There has been a large body of work on clustering for chip netlists. As has been suggested in the literature, such clustering helps not only with reducing the problem size, but also helps to ‘prevent mistakes’ (for example, prevents timing paths from being split apart). We also provide the clustered netlist to each of the baseline methods with which we compare. To perform this clustering, we employed a standard open-source library, hMETIS, which is based on multilevel hypergraph partitioning schemes with two important phases: (1) coarsening phase, and 2) uncoarsening and refinement phase.”
Therefore, at least one purpose of clustering is to enable fast placement of standard cells to provide a signal to the RL policy. The Methods section subsequently explains how the clusters are placed using a force-directed approach:
The Circuit Training FAQ adds:
Finally, the Methods section of the Nature paper also explains the provenance of the netlist hypergraph:
From the above information sources, the description of the Grouping process, and information provided by Google engineers, we are fairly certain of the following.
(1) Clustering uses the hMETIS partitioner, which is run in “multiway” mode. More specifically, hMETIS is always invoked with npart more than 500, with unit vertex weights. The hyperparameters given in Extended Data Table 3 of the Nature paper are used. (Additionally, Circuit Training explicitly sets reconst=1 and dbglvl=0.)
(2) The hypergraph that is fed to hMETIS consists of macros, macro pins, IO ports, and standard cells. The “fixed” file generated by the Grouping process is also fed to hMETIS as the .fix input file.
(3) All hypergraph partitioning applications in physical design (of which we are aware) perform some kind of thresholding to ignore large hyperedges. Circuit Training ignores all hyperedges of size greater than 500.
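Points (2) and (3) can be illustrated with a minimal sketch, which is not the Circuit Training code. It assumes the plain-text hMETIS input conventions: an unweighted .hgr file whose header is the hyperedge count followed by the vertex count, with 1-based vertex ids, and a .fix file with one line per vertex holding either a pre-assigned cluster id or -1 for a free vertex. The function names and the 0-based in-memory representation are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Sequence

MAX_HYPEREDGE_SIZE = 500  # threshold stated in point (3)


def drop_large_hyperedges(hyperedges: Sequence[Sequence[int]],
                          max_size: int = MAX_HYPEREDGE_SIZE) -> List[List[int]]:
    """Keep only hyperedges with at most max_size vertices."""
    return [list(e) for e in hyperedges if len(e) <= max_size]


def to_hgr(hyperedges: Sequence[Sequence[int]], num_vertices: int) -> str:
    """Render an unweighted hypergraph as .hgr text (unit vertex weights).

    Vertices are 0-based in memory and written 1-based in the file.
    """
    lines = [f"{len(hyperedges)} {num_vertices}"]
    for edge in hyperedges:
        lines.append(" ".join(str(v + 1) for v in edge))
    return "\n".join(lines) + "\n"


def to_fix(num_vertices: int, fixed: Dict[int, int]) -> str:
    """Render .fix text: one line per vertex, cluster id or -1 if free."""
    return "\n".join(str(fixed.get(v, -1)) for v in range(num_vertices)) + "\n"


# Toy example: the 501-vertex hyperedge is ignored, the other two are kept.
nets = [[0, 1, 2], [2, 3], list(range(501))]
kept = drop_large_hyperedges(nets)
hgr_text = to_hgr(kept, num_vertices=4)
fix_text = to_fix(4, fixed={0: 0, 3: 1})
```

In this toy case, vertices 0 and 3 are pre-assigned to clusters 0 and 1, as the Grouping output would do for grouped elements, and vertices 1 and 2 are left free for the partitioner.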
Before going further, we provide a concrete example for (2).
Suppose that we have a design with 200,000 standard cells, 100 macros, and 1,000 IO ports.
Furthermore, using terms defined in Grouping, suppose that each of the 100 macros induces a cluster of 300 standard cells, and that the IO ports collectively induce 20 IO clusters, each of which induces a cluster of 50 standard cells.
Then, there will be 100 + 20 = 120 clusters. Each element (macro pin, IO port or standard cell) in these clusters corresponds to an entry of the .fix file. Cluster ids range from 0 to 119.
The number of individual standard cells in the hypergraph that is actually partitioned by hMETIS is 200,000 - (100 * 300) - (20 * 50) = 169,000.
Suppose that each macro has 64 macro pins. The hypergraph that is actually partitioned by hMETIS has 200,000 + 100 + 1,000 + 100 * 64 = 207,500 vertices. Although there are both macro pins and macros in the hypergraph, all the nets related to macros are connected to macro pins and there are no hyperedges incident to macros. Each hyperedge in the hypergraph corresponds to a net in the netlist. Note that Circuit Training assumes that there is only one output pin for each standard cell, so there is only one hyperedge {A, B, C, D, E} for the following case.
Figure 3. Illustration of net model used in Circuit Training.
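The counts in the example above can be re-derived with a few lines of arithmetic. The variable names are ours, used only for illustration; the numbers are exactly those of the example.

```python
# Design parameters from the example above.
num_std_cells = 200_000
num_macros = 100
num_io_ports = 1_000
cells_per_macro_cluster = 300
num_io_clusters = 20
cells_per_io_cluster = 50
pins_per_macro = 64

# Clusters induced by Grouping: one per macro plus the IO clusters.
num_clusters = num_macros + num_io_clusters

# Standard cells not absorbed into any macro or IO cluster.
num_free_std_cells = (num_std_cells
                      - num_macros * cells_per_macro_cluster
                      - num_io_clusters * cells_per_io_cluster)

# Vertices: standard cells, macros, IO ports, and macro pins.
num_vertices = (num_std_cells + num_macros + num_io_ports
                + num_macros * pins_per_macro)

print(num_clusters, num_free_std_cells, num_vertices)
# prints: 120 169000 207500
```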
After partitioning the hypergraph, we have nparts clusters. Circuit Training then breaks up clusters that span a distance larger than breakup_threshold, where breakup_threshold = sqrt(canvas_width * canvas_height / 16). For each cluster c, the breakup process is as follows:
Figure 4. Illustration of breaking up a cluster.
Note that since the netlist is generated by physical-aware synthesis, we know the (x, y) coordinate for each instance.
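The breakup criterion can be sketched as follows. This is a minimal illustration, not the Circuit Training implementation: it assumes that the "span" of a cluster is measured as the width or height of the bounding box of its instance coordinates, and the helper names are hypothetical.

```python
import math
from typing import Iterable, Tuple

Point = Tuple[float, float]


def breakup_threshold(canvas_width: float, canvas_height: float) -> float:
    """breakup_threshold = sqrt(canvas_width * canvas_height / 16)."""
    return math.sqrt(canvas_width * canvas_height / 16)


def bounding_box(points: Iterable[Point]) -> Tuple[float, float, float, float]:
    """Return (lx, ly, ux, uy) of the given (x, y) instance locations."""
    pts = list(points)
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return min(xs), min(ys), max(xs), max(ys)


def spans_too_far(points: Iterable[Point], threshold: float) -> bool:
    """Assumption: a cluster is a breakup candidate if its bounding box
    is wider or taller than the threshold."""
    lx, ly, ux, uy = bounding_box(points)
    return (ux - lx) > threshold or (uy - ly) > threshold


threshold = breakup_threshold(400.0, 400.0)  # sqrt(160000 / 16) = 100.0
wide_cluster = [(0.0, 0.0), (150.0, 20.0)]
compact_cluster = [(0.0, 0.0), (50.0, 60.0)]
```

With a 400 x 400 canvas the threshold is 100, so under this assumption the first cluster (150 wide) would be broken up and the second would be left intact.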
After breaking up clusters that span a large distance, some small clusters with only tens of standard cells may remain. In this step, Circuit Training recursively merges each small cluster into its most adjacent cluster if the two are within a certain closeness distance (breakup_threshold / 2.0), thus reducing the number of clusters. A cluster is defined to be a small cluster if its number of elements (macro pins, macros, IO ports and standard cells) is less than or equal to max_num_nodes, where max_num_nodes = number_of_vertices // number_of_clusters_after_breakup // 4. The merging process is as follows:
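The two thresholds that govern merging can be computed as in the sketch below. It covers only the definitions given in the text (the small-cluster size limit and the closeness distance), not the merging procedure itself. The function names are hypothetical, and the cluster count of 600 after breakup is a made-up value used only to show the integer division.

```python
def max_num_nodes(number_of_vertices: int,
                  number_of_clusters_after_breakup: int) -> int:
    """Size limit for a 'small' cluster, using floor division twice."""
    return number_of_vertices // number_of_clusters_after_breakup // 4


def is_small_cluster(num_elements: int, limit: int) -> bool:
    """A cluster is small if it has at most `limit` elements."""
    return num_elements <= limit


def merge_closeness(breakup_threshold: float) -> float:
    """Distance within which a small cluster may be merged."""
    return breakup_threshold / 2.0


# 207,500 vertices as in the earlier example; 600 clusters is hypothetical.
limit = max_num_nodes(207_500, 600)   # 207500 // 600 = 345, 345 // 4 = 86
closeness = merge_closeness(100.0)    # 50.0
```

With these illustrative inputs, a cluster of 86 elements or fewer counts as small and is a merge candidate only for neighbors within a distance of 50.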
We call readers’ attention to the existence of significant aspects that are still pending clarification here. While Gridding and Grouping are hopefully well-understood, we are still in the process of documenting and implementing such aspects as the following.
Pending clarification #1: Is the output netlist from synthesis modified before it enters (hypergraph clustering and) placement? All methodologies that span synthesis and placement (of which we are aware) must make a fundamental decision with respect to the netlist that is produced by logic synthesis, as that netlist is passed on to placement: (A) delete buffers and inverters to avoid biasing the ensuing placement (spatial embedding) with the synthesis tool’s fanout clustering, or (B) leave these buffers and inverters in the netlist to maintain netlist area and electrical rules (load, fanout) sensibility. We do not yet know Google’s choice in this regard. Our experimental runscripts will therefore support both (A) and (B).
[June 13] Update to Pending clarification #3: We are glad to see grouping (clustering) added to the Circuit Training GitHub. The new scripts refer to (x,y) coordinates of nodes in the netlist, which leads to further pending clarifications (noted here). The solution space for how the input to hypergraph clustering is obtained has expanded. A first level of options is whether (A) a non-physical synthesis tool (e.g., Genus, DesignCompiler or Yosys), or (B) a physical synthesis tool (e.g., Genus iSpatial or DesignCompiler Topological; Yosys cannot perform physical synthesis), is used to obtain the netlist from starting RTL and constraints. In the regime of (B), to our understanding the commercial physical synthesis tools are invoked with a starting .def that includes macro placement. Thus, we plan to also enable a second level of sub-options for determining this macro placement: (B.1) use the auto-macro placement result from the physical synthesis tool, and (B.2) use a macro placement from a human PD expert (or from OpenROAD RTL-MP). Some initial progress toward these clarifications has been posted as Our Progress.
Our implementation of hypergraph clustering takes the synthesized netlist and a .def file with placed IO ports as input, then generates the clustered netlist (in LEF/DEF format) using hMETIS (1998 binary). In default mode, our implementation generates the clustered netlist in protocol buffer format, along with the corresponding plc file. We implement the entire flow based on OpenROAD APIs. Please refer to the OpenROAD repo for an explanation of each Tcl command.
Please note that The OpenROAD Project does not distribute any compiled binaries. You need to build your own OpenROAD binary before you run our scripts.
Input files: setup.tcl (you can follow the example to set up your own design) and FixFile (this file is generated by our Grouping scripts).
Output files: the clustered netlist in protocol buffer format and the corresponding plc file.
Note that the example that we provide is the ariane design implemented in NanGate45. The netlist and the corresponding .def file with placed instances are generated by the Genus iSpatial flow. Here the macro placement is done automatically by the Genus and Innovus tools, i.e., according to Flow (B.1) above.
We thank Google engineers for Q&A in a shared document, as well as live discussions on May 19, 2022, that explained the hypergraph clustering method used in Circuit Training. All errors of understanding and implementation are the authors’. We will rectify such errors as soon as possible after being made aware of them.