MacroPlacement is an open, transparent effort to provide a public, baseline implementation of Google Brain’s Circuit Training (Morpheus) deep RL-based placement method. In this repo, we aim to achieve the following.
In order to achieve the above goals, our initial focus has been on the following efforts.
June 6 - Aug 5: We have developed and made publicly available the SP&R flow using commercial tools Cadence Genus and Innovus, and open-source tools Yosys and OpenROAD, for Ariane (two variants – one with 136 SRAMs and another with 133 SRAMs), MemPool tile and NVDLA designs on NanGate45, ASAP7 and SKY130HD open enablement. We applaud and thank Cadence Design Systems for allowing their tool runscripts to be shared openly by researchers, enabling reproducibility of results obtained via use of Cadence tools. This was an important milestone for the EDA research community. Please see Dr. David Junkin’s presentation at the recent DAC-2022 “Open-Source EDA and Benchmarking Summit” birds-of-a-feather meeting.
The following describes our learning related to testcase generation and its implementation using different tools on different platforms.
June 10: grouper.py was released in CircuitTraining. This revealed that the protobuf input to the hypergraph clustering into soft macros included the (x,y) locations of the nodes. (A grouper.py script had been shown to Prof. Kahng during a meeting at Google on May 19.) The use of (x,y) locations from a physical synthesis tool was very unexpected, since it is not mentioned in “Methods” or other descriptions given in the Nature paper. We raised issue #25 to get clarification about this. [July 10: The README added to the grouping area of CircuitTraining confirmed that the input netlist has all of its nodes already placed.]
We currently use the physical synthesis tool Cadence Genus iSpatial to obtain (x,y) placed locations per instance as part of the input to Grouping. The Genus iSpatial post-physical-synthesis netlist is the starting point for how we produce the clustered netlist and the *.plc file, which we provide as open inputs to CircuitTraining. The path from post-physical-synthesis netlist to clustered netlist can be divided into the following steps, which we have implemented as open source in our CodeElements area:
June 22: We added our flow-scripts that run our gridding, grouping and clustering implementations to generate a final clustered netlist in protocol buffer format. Google’s netlist protocol buffer format documentation available in the CircuitTraining repo was very helpful to our understanding of how to convert a placed netlist to protobuf format. Our scripts enable clustered netlists in protobuf format to be produced from placed netlists in either LEF/DEF or Bookshelf format.
July 12: As stated in the “What is your timeline?” FAQ response [see also note [5] here], we presented progress to date in this MacroPlacement talk at the DAC-2022 “Open-Source EDA and Benchmarking Summit” birds-of-a-feather meeting.
July 26: Replication of the wirelength component of proxy cost. The wirelength is similar to HPWL: for each net in the netlist, we take the width and height of the net’s bounding box and sum them. One caveat is that a soft macro pin can carry a weight factor, which represents the total number of connections between the source and sink pins. If the weight is not defined, the default value is 1. This weight factor must be multiplied by the sum of width and height to replicate Google’s API. The following table compares our implementation with Google’s API.
Testcase | Notes | Canvas width/height | Grid col/row | Our implementation | Google’s API |
---|---|---|---|---|---|
Ariane | Google’s Ariane | 356.592 / 356.640 | 35 / 33 | 0.7500626080261634 | 0.7500626224300161 |
Ariane133 | From MacroPlacement | 1599.99 / 1598.8 | 50 / 50 | 0.6522555375409593 | 0.6522555172428797 |
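The weighted wirelength described in the July 26 note can be sketched as follows. This is a minimal illustration, not the repository's implementation: the net representation (a dict with a `pins` list and an optional `weight`) and the function name `weighted_hpwl` are assumptions made for the example.

```python
from typing import Dict, List

def weighted_hpwl(nets: List[Dict]) -> float:
    """Sum of weight * (bounding-box width + height) over all nets."""
    total = 0.0
    for net in nets:
        xs = [x for x, _ in net["pins"]]
        ys = [y for _, y in net["pins"]]
        # The weight defaults to 1 when it is not defined on the net.
        weight = net.get("weight", 1.0)
        total += weight * ((max(xs) - min(xs)) + (max(ys) - min(ys)))
    return total

# Hypothetical two-net example (coordinates are made up).
example_nets = [
    {"pins": [(0.0, 0.0), (3.0, 4.0)]},                             # 1 * (3 + 4)
    {"pins": [(1.0, 1.0), (2.0, 5.0), (4.0, 2.0)], "weight": 2.0},  # 2 * (3 + 4)
]
print(weighted_hpwl(example_nets))
```

Here the weight is stored per net for simplicity; in the text above it is attached to the soft macro source pin.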
July 31: The netlist protocol buffer format documentation also helped us to write this Innovus-based tcl script, which converts a physically synthesized netlist to protobuf format in Innovus. [This script was written and developed by ABKGroup students at UCSD. However, the underlying commands and reports are copyrighted by Cadence. We thank Cadence for granting permission to share our research to help promote and foster the next generation of innovators.] We use this post-physical-synthesis protobuf netlist as input to the grouping code to generate the clustered netlist. Fixes that we made while running Google’s grouping code resulted in this [08/01/2022] pull request. [08/05/2022: Google’s grouping code has been updated based on this PR.]
July 22-August 4: We shared with Google engineers our (flat) post-physical-synthesis-protobuf netlist (ariane.pb.txt) of our Ariane design with 133 SRAMs on the NanGate45 platform, along with the corresponding clustered netlist and the legalized.plc file (clustered netlist: netlist.pb.txt) generated using the CircuitTraining grouping code. The goal here was to verify our steps and setup up to this point. Also, we provide scripts (using both our CodeElements and CT-grouping) to integrate the clustered netlist generation with the SP&R flow.
August 5: The following table compares the clustering results for Ariane133-NG45 design generated by the Google engineer (internally to Google) and the clustering results generated by us using CT grouping code.
Metric | Google Internal flow (from Google) | Our use of CT Grouping code |
---|---|---|
Number of grid rows x columns | 21 x 24 | 21 x 24 |
Number of soft macros | 736 | 738 |
HPWL | 4171594.811 | 4179069.884 |
Wirelength cost | 0.072595 | 0.072197 |
Congestion cost | 0.727798 | 0.72853 |
August 11: We received information from Google that when a standard cell has multiple outputs, all of its outputs are merged in the protobuf netlist (example: a full adder cell would have its outputs merged). The possible vertices of a hyperedge are macro pins, ports, and standard cells. Our Innovus-based protobuf netlist generation tcl script takes care of this.
August 15: We received information from Google engineers that in the proxy cost function, the density weight is set to 0.5 for their internal runs.
August 17: The proxy wirelength cost, which is usually a value between 0 and 1, is related to the HPWL we computed earlier. We deduce the formulation as the following:
|netlist| is the total number of nets and it takes into account the weight factor defined on soft macro pins. Here is our proxy wirelength compared with Google’s API:
Testcase | Notes | Canvas width/height | Our implementation | Google’s API |
---|---|---|---|---|
Ariane | Google’s Ariane | 356.592 / 356.640 | 0.05018661999974192 | 0.05018662006439473 |
Ariane133 | From MacroPlacement | 1599.99 / 1598.8 | 0.04456188308735019 | 0.04456188299072617 |
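The formula itself is not reproduced in the text above. One normalization that is consistent with the description (a value usually between 0 and 1, scaled by |netlist|) is to divide the weighted HPWL by the net count times the canvas half-perimeter. The sketch below assumes that form; treat it as an assumption rather than a confirmed statement of Google's API. The function and parameter names are hypothetical.

```python
def proxy_wirelength(hpwl: float, net_count: float,
                     canvas_width: float, canvas_height: float) -> float:
    """Assumed normalization: HPWL / (|netlist| * (canvas width + height))."""
    if net_count <= 0:
        raise ValueError("net_count must be positive")
    return hpwl / (net_count * (canvas_width + canvas_height))

# Made-up numbers: 10 nets on a 30 x 40 canvas with a total HPWL of 350.
print(proxy_wirelength(350.0, 10, 30.0, 40.0))
```

Under this assumption, `net_count` already includes the weight factors defined on soft macro pins, as stated for |netlist| above.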
Replication of the density component of proxy cost. We now have a verified density cost computation. Density cost computation depends on gridcell density. Gridcell density is the ratio of the total area occupied by standard cells, soft macros and hard macros within a gridcell to the total area of that gridcell. If cells overlap, the gridcell density can be greater than one. To get the density cost, we take the average of the top 10% of the densest gridcells. Before outputting it, we multiply it by 0.5. Notice that this 0.5 is not the “weight” of this cost function, but simply another factor applied besides the weight factor from the cost function.
Testcase | Notes | Canvas width/height | Grid col/row | Our implementation | Google’s API |
---|---|---|---|---|---|
Ariane | Google’s Ariane | 356.592 / 356.640 | 35 / 33 | 0.7500626080261634 | 0.7500626224300161 |
Ariane133 | From MacroPlacement | 1599.99 / 1598.8 | 50 / 50 | 0.6522555375409593 | 0.6522555172428797 |
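The density cost described above can be sketched as follows, assuming the per-gridcell densities have already been computed. How the top-10% count is rounded when the number of gridcells is not a multiple of ten is not stated in the text; the floor-with-minimum-of-one rule used here is an assumption, as is the function name.

```python
from typing import Sequence

def density_cost(gridcell_densities: Sequence[float], top_percent: int = 10) -> float:
    """0.5 * average density of the densest `top_percent`% of gridcells."""
    n = len(gridcell_densities)
    if n == 0:
        return 0.0
    # Assumed rounding: floor, but always keep at least one gridcell.
    k = max(1, n * top_percent // 100)
    densest = sorted(gridcell_densities, reverse=True)[:k]
    # The extra 0.5 factor is separate from the density weight in the proxy cost.
    return 0.5 * sum(densest) / k

# 20 hypothetical gridcells; the two densest are 1.5 (overlap) and 0.5.
densities = [0.25] * 18 + [0.5, 1.5]
print(density_cost(densities))
```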
August 18: The flat post-physical-synthesis protobuf netlist of Ariane133-NanGate45 design is used as input to CT grouping code to generate the clustered netlist. We then use this clustered netlist in Circuit Training. Coordinate Descent is (by default) not applied to any macro placement solution. Here is the link to our tensorboard. We ran Innovus P&R starting from the macro placement generated using CT, through the end of detailed routing (RouteOpt) and collection of final PPA / “Table 1” metrics. Following are the metrics and screen shots of the P&R database. Throughout the SP&R flow, the target clock period is 4ns. The power grid overhead is 18.46% in the actual P&R setup, matching the 18% mentioned in the Circuit Training repo. All results are for DRC-clean final routing produced by the Innovus tool.
[In the immediately-following content, we also show comparison results using other macro placement methods, collected since August 18.]
[As of August 24 onward, we refer to this testcase as “Our Ariane133-NanGate45_51” since it has 51% area utilization. A second testcase, “Our Ariane133-NanGate45_68”, has 68% area utilization which exactly matches that of the Ariane in Circuit Training.]
Macro placement generated by Circuit Training on Our Ariane-133 (NG45), with post-macro placement flow using Innovus21.1
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
---|---|---|---|---|---|---|---|---|---|
preCTS | 2560080 | 214555 | 1018356 | 287.79 | 4343214 | 0.005 | 0 | 0.01% | 0.02% |
postCTS | 2560080 | 216061 | 1018356 | 301.31 | 4345969 | 0.010 | 0 | 0.01% | 0.02% |
postRoute | 2560080 | 216061 | 1018356 | 300.38 | 4463660 | 0.359 | 0 |
Comparison 1: “Human Gridded”. For comparison, a baseline “human, gridded” macro placement was generated by a human for the same canvas size, I/O placement and gridding, with results as follows.
Macro placement generated by a human on Our Ariane-133 (NG45), with post-macro placement flow using Innovus21.1
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
---|---|---|---|---|---|---|---|---|---|
preCTS | 2560080 | 215188.9 | 1018356 | 285.96 | 4470832 | -0.002 | -0.005 | 0.00% | 0.00% |
postCTS | 2560080 | 216322.9 | 1018356 | 299.62 | 4472866 | 0.001 | 0 | 0.00% | 0.00% |
postRoute | 2560080 | 216322.9 | 1018356 | 298.60 | 4587141 | 0.284 | 0 |
Comparison 2: RePlAce. The standalone RePlAce placer was run on the same (flat) netlist with the same canvas size and I/O placement, with results as follows.
Macro placement generated by RePlAce (standalone, from HERE) on Our Ariane-133 (NG45), with post-macro placement flow using Innovus21.1
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
---|---|---|---|---|---|---|---|---|---|
preCTS | 2560080 | 214910.71 | 1018356 | 288.654 | 4178509 | 0.003 | 0 | 0.03% | 0.07% |
postCTS | 2560080 | 216006.63 | 1018356 | 302.013 | 4184690 | 0.007 | 0 | 0.05% | 0.08% |
postRoute | 2560080 | 216006.63 | 1018356 | 301.260 | 4315157 | -0.207 | -0.41 |
Comparison 3: RTL-MP. The RTL-MP macro placer described in this ISPD-2022 paper and used as the default macro placer in OpenROAD was run on the same (flat) netlist with the same canvas size and I/O placement, with results as follows.
Macro placement generated using RTL-MP on Our Ariane-133 (NG45), with post-macro placement flow using Innovus21.1
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
---|---|---|---|---|---|---|---|---|---|
preCTS | 2560080 | 216420.26 | 1018356 | 289.435 | 5164199 | 0.020 | 0 | 0.04% | 0.05% |
postCTS | 2560080 | 217938.32 | 1018356 | 303.757 | 5185004 | 0.001 | 0 | 0.05% | 0.07% |
postRoute | 2560080 | 217938.32 | 1018356 | 302.844 | 5306735 | 0.104 | 0 |
Comparison 4: The Hier-RTLMP macro placer was run on the same (flat) netlist with the same canvas size and I/O placement, with results as follows. [The Hier-RTLMP paper is in submission as of August 2022; availability in OpenROAD and OpenROAD-flow-scripts is planned by end of September 2022. Please email abk@eng.ucsd.edu if you would like a preprint, not for further redistribution.]
Macro placement generated using Hier-RTLMP on Our Ariane-133 (NG45), with post-macro placement flow using Innovus21.1
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
---|---|---|---|---|---|---|---|---|---|
preCTS | 2560080 | 214783.83 | 1018356 | 288.356 | 4397005 | 0.005 | 0 | 0.02% | 0.05% |
postCTS | 2560080 | 215911.67 | 1018356 | 302.176 | 4419305 | 0.009 | 0 | 0.04% | 0.06% |
postRoute | 2560080 | 215911.67 | 1018356 | 301.468 | 4537458 | 0.311 | 0 |
August 20: Matching the area utilization. We revisited the area utilization of Our Ariane133 and realized that it (51%) is lower than that of Google’s Ariane (68%). So that this would not devalue our study, we created a second variant, “Our Ariane133-NanGate45_68”, which matches the area utilization of Google’s Ariane. Results are as given below.
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
---|---|---|---|---|---|---|---|---|---|
preCTS | 1814274 | 215575.444 | 1018355.73 | 288.762 | 4170253 | 0.002 | 0 | 0.01% | 0.01% |
postCTS | 1814274 | 217114.520 | 1018355.73 | 302.607 | 4186888 | 0.001 | 0 | 0.00% | 0.01% |
postRoute | 1814274 | 217114.520 | 1018355.73 | 301.722 | 4295572 | 0.336 | 0 |
Comparison 1: “Human Gridded”. For comparison, a baseline “human, gridded” macro placement was generated by a human for the same canvas size, I/O placement and gridding.
Macro placement generated by human (Util: 68%)
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
---|---|---|---|---|---|---|---|---|---|
preCTS | 1814274 | 215779 | 1018355.73 | 289.999 | 4545632 | -0.003 | -0.004 | 0.09% | 0.15% |
postCTS | 1814274 | 217192 | 1018355.73 | 303.786 | 4571293 | 0.001 | 0 | 0.13% | 0.16% |
postRoute | 1814274 | 217192 | 1018355.73 | 302.725 | 4720776 | 0.206 | 0 |
Comparison 2: RePlAce. The standalone RePlAce placer was run on the same (flat) netlist with the same canvas size and I/O placement, with results as follows.
Macro placement generated using RePlAce (Util: 68%)
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
---|---|---|---|---|---|---|---|---|---|
preCTS | 1814274 | 217246 | 1018355.73 | 292.803 | 4646408 | -0.007 | -0.011 | 0.07% | 0.13% |
postCTS | 1814274 | 218359 | 1018355.73 | 306.145 | 4657174 | 0.001 | 0 | 0.07% | 0.17% |
postRoute | 1814274 | 218359 | 1018355.73 | 305.032 | 4809950 | 0.082 | 0 |
Comparison 3: RTL-MP. The RTL-MP macro placer was run on the same (flat) netlist with the same canvas size and I/O placement, with results as follows.
Macro placement generated using RTL-MP (Util: 68%)
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
---|---|---|---|---|---|---|---|---|---|
preCTS | 1814274 | 217057 | 1018355.73 | 292.800 | 4598656 | -0.001 | -0.001 | 0.00% | 0.01% |
postCTS | 1814274 | 218045 | 1018355.73 | 306.475 | 4614827 | 0.007 | 0 | 0.00% | 0.01% |
postRoute | 1814274 | 218045 | 1018355.73 | 303.380 | 4745004 | 0.294 | 0 |
Comparison 4: The Hier-RTLMP macro placer was run on the same (flat) netlist with the same canvas size and I/O placement, using two setups, with results as follows.
Macro placement generated using Hier-RTLMP (Util: 68%) [Setup 1]
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
---|---|---|---|---|---|---|---|---|---|
preCTS | 1814274 | 218096 | 1018355.73 | 294.035 | 4967286 | 0.003 | 0 | 0.10% | 0.12% |
postCTS | 1814274 | 219150 | 1018355.73 | 308.130 | 4984385 | 0.001 | 0 | 0.13% | 0.13% |
postRoute | 1814274 | 219150 | 1018355.73 | 307.103 | 5137430 | 0.387 | 0 |
Macro placement generated using Hier-RTLMP (Util: 68%) [Setup 2]
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
---|---|---|---|---|---|---|---|---|---|
preCTS | 1814274 | 216665 | 1018355.73 | 291.332 | 4917102 | 0.001 | 0 | 0.02% | 0.06% |
postCTS | 1814274 | 217995 | 1018355.73 | 305.089 | 4931432 | 0.001 | 0 | 0.03% | 0.05% |
postRoute | 1814274 | 217995 | 1018355.73 | 303.905 | 5048575 | 0.230 | 0 |
August 25: Replication of the congestion component of proxy cost. Reverse-engineering from the plc client API is finally completed, as described here. A review with Dr. Mustafa Yazgan was very helpful in confirming the case analysis and conventions identified during reverse-engineering. Replication results are shown below. With this, reproduction in open source code of the Circuit Training proxy cost has been completed. Note that the description here illustrates how the Nature paper, Circuit Training, and Google engineers’ versions can have minor discrepancies. (These minor discrepancies are not currently viewed as substantive, i.e., meaningfully affecting our ongoing assessment.) For example, to calculate the congestion component, the H- and V-routing congestion cost lists are concatenated, and the ABU5 (average of top 5% of the concatenated list) metric of this list is the congestion cost. By contrast, the Nature paper indicates use of an ABU10 metric. Recall: “There is no substitute for source code.”
Name | Description | Canvas Size | Col/Row | Congestion Smoothing | Google’s Congestion | Our Congestion |
---|---|---|---|---|---|---|
Ariane | Google’s Ariane | 356.592 / 356.640 | 35 / 33 | 0 | 3.385729893179586 | 3.3857299314069733 |
Ariane133 | Our Ariane | 1599.99 / 1600.06 | 24 / 21 | 0 | 1.132108622298701 | 1.1321086382282062 |
Ariane | Google’s Ariane | 356.592 / 356.640 | 35 / 33 | 1 | 2.812822828059799 | 2.81282287498789 |
Ariane133 | Our Ariane | 1599.99 / 1600.06 | 24 / 21 | 1 | 1.116203573147857 | 1.1162035989647672 |
Ariane | Google’s Ariane | 356.592 / 356.640 | 35 / 33 | 2 | 2.656602005772668 | 2.6566020148393146 |
Ariane133 | Our Ariane | 1599.99 / 1600.06 | 24 / 21 | 2 | 1.109241385529823 | 1.1092414113467333 |
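The final aggregation step described above (concatenate the H- and V-routing congestion lists, then take the ABU5 of the result) can be sketched as follows. Computing the per-gridcell congestion values and applying smoothing are out of scope here; the rounding of the top-5% count and the function name are assumptions.

```python
from typing import Sequence

def congestion_cost(h_congestion: Sequence[float],
                    v_congestion: Sequence[float],
                    top_percent: int = 5) -> float:
    """ABU-k of the concatenated H and V per-gridcell congestion lists."""
    combined = sorted(list(h_congestion) + list(v_congestion), reverse=True)
    if not combined:
        return 0.0
    # Assumed rounding: floor, but always keep at least one entry.
    k = max(1, len(combined) * top_percent // 100)
    return sum(combined[:k]) / k

# Hypothetical per-gridcell values: 20 horizontal and 20 vertical entries.
h_values = [float(i) for i in range(20)]
v_values = [float(i) for i in range(20, 40)]
print(congestion_cost(h_values, v_values))
```

Passing `top_percent=10` gives the ABU10 variant that the Nature paper indicates.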
August 26: Moving on to understand benefits and limitations of the Circuit Training methodology itself. This next stage of study is enabled by confidence in the technical solidity of what has been accomplished so far – again, with the help of Google engineers.
Question 1. How does having an initial set of placement locations (from physical synthesis) affect the (relative) quality of the CT result?
A preliminary exercise has compared outcomes when the Genus iSpatial (x,y) coordinates are given, versus when vacuous (x,y) coordinates are given. The following CT result is for the “Our Ariane133-NanGate45_68” example where the input protobuf netlist to Circuit Training’s grouping code has all macro and standard cell locations set to (600, 600). This is just an exercise for now: other, carefully-designed experiments will be performed over the coming weeks and months.
Macro placement generated using CT (Util: 68%) with a vacuous set of input (x,y) coordinates. The input protobuf netlist to Circuit Training’s grouping code has all macro and standard cell locations set to (600, 600).
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
---|---|---|---|---|---|---|---|---|---|
preCTS | 1814274 | 216069 | 1018355.73 | 290.0818 | 4615961 | -0.004 | -0.021 | 0.01% | 0.03% |
postCTS | 1814274 | 217118 | 1018355.73 | 303.7199 | 4619727 | 0 | 0 | 0.01% | 0.02% |
postRoute | 1814274 | 217118 | 1018355.73 | 302.4018 | 4738717 | 0.171 | 0 |
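One way to construct such a vacuous input is sketched below. The in-memory node representation (dicts with `name`, `type`, `x`, `y`) and the type labels `MACRO`, `STDCELL` and `PORT` are illustrative assumptions, not the CircuitTraining protobuf schema. Only the idea of overwriting every macro and standard-cell location with one constant point comes from the text above.

```python
from typing import Dict, List

# Hypothetical type labels for the nodes whose locations are overwritten.
OVERWRITTEN_TYPES = {"MACRO", "STDCELL"}

def make_vacuous(nodes: List[Dict], x: float, y: float) -> List[Dict]:
    """Return a copy of `nodes` with every macro and standard cell at (x, y)."""
    result = []
    for node in nodes:
        if node["type"] in OVERWRITTEN_TYPES:
            result.append({**node, "x": x, "y": y})
        else:
            # Other nodes (for example ports) keep their locations.
            result.append(dict(node))
    return result

# Made-up nodes for illustration.
nodes = [
    {"name": "sram_0", "type": "MACRO", "x": 120.0, "y": 880.5},
    {"name": "u_and2", "type": "STDCELL", "x": 45.2, "y": 17.9},
    {"name": "clk_i", "type": "PORT", "x": 0.0, "y": 673.4},
]
vacuous = make_vacuous(nodes, 600.0, 600.0)
print(vacuous)
```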
Update to Question 1 on September 9: Two additional vacuous placements were run through the CT flow.
The following table and screenshots show results for the (0, 0) vacuous placement.
Macro placement generated using CT (Util: 68%) with a vacuous set of input (x,y) coordinates. The input protobuf netlist to Circuit Training’s grouping code has all macro and standard cell locations set to (0, 0).
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
---|---|---|---|---|---|---|---|---|---|
preCTS | 1814274 | 215520 | 1018356 | 289.676 | 4489121 | -0.006 | -0.007 | 0.02% | 0.09% |
postCTS | 1814274 | 216891 | 1018356 | 302.551 | 4495430 | 0.005 | 0 | 0.02% | 0.10% |
postRoute | 1814274 | 216891 | 1018356 | 301.322 | 4606716 | 0.218 | 0 |
The following table and screenshots show results for (max_x, max_y), where max_x = 1347.1 and max_y = 1346.8.
Macro placement generated using CT (Util: 68%) with a vacuous set of input (x,y) coordinates. The input protobuf netlist to Circuit Training’s grouping code has all macro and standard cell locations set to (max_x, max_y) = (1347.1, 1346.8).
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
---|---|---|---|---|---|---|---|---|---|
preCTS | 1814274 | 214817 | 1018356 | 288.454 | 4530507 | 0.002 | 0 | 0.01% | 0.04% |
postCTS | 1814274 | 215844 | 1018356 | 301.719 | 4532853 | 0.007 | 0 | 0.03% | 0.05% |
postRoute | 1814274 | 215844 | 1018356 | 300.763 | 4646396 | 0.228 | 0 |
Question 2. How does utilization affect the (relative) performance of CT?
Question 3. Is a testcase such as Ariane-133 “probative”, or do we need better testcases?
A preliminary exercise has examined Innovus P&R outcomes when the Circuit Training macro placement locations for Our Ariane133-NanGate45_68 are randomly shuffled. The results for four seed values used in the shuffle, and for the original Circuit Training result, are as follows. (We have extended this experiment here.)
Metric | Shuffle-1 | Shuffle-2 | Shuffle-3 | Shuffle-4 | CT_Result |
---|---|---|---|---|---|
Core_area (um^2) | 1814274.28 | 1814274.28 | 1814274.28 | 1814274.28 | 1814274.28 |
Macro_area (um^2) | 1018355.73 | 1018355.73 | 1018355.73 | 1018355.73 | 1018355.73 |
preCTS_std_cell_area (um^2) | 217124.89 | 217168.25 | 217157.88 | 217020.09 | 215575.44 |
postCTS_std_cell_area (um^2) | 218215.23 | 218231.19 | 218328.81 | 218073.45 | 217114.52 |
postRoute_std_cell_area (um^2) | 218215.23 | 218231.19 | 218328.81 | 218073.45 | 217114.52 |
preCTS_total_power (mW) | 292.032 | 292.692 | 292.676 | 292.764 | 288.762 |
postCTS_total_power (mW) | 305.726 | 306.497 | 306.120 | 306.524 | 302.607 |
postRoute_total_power (mW) | 304.394 | 304.996 | 304.711 | 305.093 | 301.722 |
preCTS_wirelength (um) | 5057900 | 5069848 | 5092665 | 5119539 | 4170253 |
postCTS_wirelength (um) | 5063278 | 5079451 | 5109801 | 5126540 | 4186888 |
postRoute_wirelength (um) | 5186032 | 5194397 | 5227411 | 5247799 | 4295572 |
preCTS_WS (ns) | -0.006 | 0.001 | 0 | -0.003 | 0.002 |
postCTS_WS (ns) | 0.002 | 0.002 | 0.003 | 0.002 | 0.001 |
postRoute_WS (ns) | 0.174 | 0.090 | 0.219 | 0.349 | 0.336 |
preCTS_TNS (ns) | -0.010 | 0 | 0 | -0.019 | 0 |
postCTS_TNS (ns) | 0 | 0 | 0 | 0 | 0 |
postRoute_TNS (ns) | 0 | 0 | 0 | 0 | 0 |
preCTS_Congestion(H) | 0.02% | 0.02% | 0.03% | 0.02% | 0.01% |
postCTS_Congestion(H) | 0.03% | 0.04% | 0.02% | 0.06% | 0.00% |
postRoute_Congestion(H) | | | | | |
preCTS_Congestion(V) | 0.06% | 0.06% | 0.07% | 0.07% | 0.01% |
postCTS_Congestion(V) | 0.07% | 0.07% | 0.08% | 0.08% | 0.01% |
postRoute_Congestion(V) | | | | | |
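A seeded shuffle of macro locations, as used in the exercise above, could look like the sketch below. The text does not describe how the shuffle was implemented; this version simply permutes the existing locations among the macros and assumes the macros are interchangeable (same footprint). The function name and data layout are hypothetical.

```python
import random
from typing import Dict, Tuple

Location = Tuple[float, float]

def shuffle_macro_locations(placement: Dict[str, Location], seed: int) -> Dict[str, Location]:
    """Permute the given locations among the macros, reproducibly per seed."""
    names = sorted(placement)                   # fixed order, independent of dict order
    locations = [placement[name] for name in names]
    random.Random(seed).shuffle(locations)      # private RNG; global state is untouched
    return dict(zip(names, locations))

# Made-up placement of four macros.
placement = {
    "sram_0": (0.0, 0.0),
    "sram_1": (100.0, 0.0),
    "sram_2": (0.0, 100.0),
    "sram_3": (100.0, 100.0),
}
for seed in (1, 2, 3, 4):
    print(seed, shuffle_macro_locations(placement, seed))
```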
September 9:
Question 4. How much does the guidance to clustering that comes from (x,y) locations matter?
We answer this by using hMETIS to generate the same number of soft macros from the same netlist, but only via the npart (number of partitions) parameter. The value of npart in the call to hMETIS is chosen to match the number of standard-cell clusters (i.e., soft macros) obtained in the CT grouping process. Then, to preserve this number of soft macros, we skip the break up and merge stage in CT grouping.
[Brief overview of break up and merge: (A) Break up: During break up, if a standard cell cluster height or width is greater than sqrt(canvas area / 16), then it is broken into smaller clusters such that the height and width of each cluster is less than sqrt(canvas area / 16). (B) Merge: During merge, if the number of standard cells in a cluster is less than (average number of standard cells in a cluster / 4), then the standard cells of that cluster are moved to their neighboring clusters.]
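The two thresholds in the overview above can be written as simple predicates. This is only a sketch of the decision rules as stated; it does not reproduce how CT grouping actually splits a cluster or chooses neighboring clusters, and the function names are hypothetical.

```python
import math

def needs_break_up(cluster_width: float, cluster_height: float,
                   canvas_width: float, canvas_height: float) -> bool:
    """True if either cluster dimension exceeds sqrt(canvas area / 16)."""
    limit = math.sqrt(canvas_width * canvas_height / 16.0)
    return cluster_width > limit or cluster_height > limit

def needs_merge(cell_count: int, average_cell_count: float) -> bool:
    """True if the cluster has fewer cells than a quarter of the average."""
    return cell_count < average_cell_count / 4.0

# Made-up 1600 x 1600 canvas: the dimension limit is sqrt(2560000 / 16) = 400.
print(needs_break_up(450.0, 100.0, 1600.0, 1600.0))  # wider than the limit
print(needs_break_up(300.0, 300.0, 1600.0, 1600.0))  # within the limit
print(needs_merge(40, 200.0))                        # below 200 / 4 = 50
```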
We run hMETIS with npart = 810 (number of fixed groups is 153) to match the total number of standard cell clusters when CT’s break up and merge is run. The following table presents the results of this experiment. Outcomes are similar to the original Ariane133-NG45 with 68% utilization CT result. [The Question 1 study indicates that a vacuous placement harms the outcome of CT, i.e., “placement information matters”. But the Question 4 study suggests that a flow that does not bring in any placement coordinates (i.e., using pure hMETIS partitioning down to a similar number of stdcell clusters) does not affect results by much.]
Macro placement generated using CT (Util: 68%) when the input clustered netlist is generated by running hMETIS npart = 810 and without running break up and merge
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
---|---|---|---|---|---|---|---|---|---|
preCTS | 1814274 | 215552 | 1018356 | 288.642 | 4188406 | -0.001 | -0.001 | 0.02% | 0.12% |
postCTS | 1814274 | 216618 | 1018356 | 302.086 | 4196172 | 0.002 | 0 | 0.02% | 0.11% |
postRoute | 1814274 | 216618 | 1018356 | 300.899 | 4304113 | 0.264 | 0 |
Question 5. What is the impact of the Coordinate Descent (CD) placer on proxy cost and Table 1 metrics?
In our August 18 notes, we mentioned that the default CT flow does NOT run coordinate descent. (Coordinate descent is not mentioned in the Nature paper.) The result in the CT repo shows the impact of Coordinate Descent (CD) on proxy cost for the Google Ariane design, but there is no data to show the impact of CD on Table 1 metrics.
We have taken the CT results generated for Ariane133-NG45 with 68% utilization through the CD placement step. The following table shows the effect of CD placer on proxy cost. The CD placer for this instance improves proxy wirelength and density at the cost of congestion, and overall proxy cost degrades slightly.
CD Placer effect on Proxy cost for Ariane133
Cost | CT w/o CD | + Apply CD |
---|---|---|
Wirelength | 0.0948 | 0.0861 |
Density | 0.4845 | 0.4746 |
Congestion | 0.7176 | 0.7574 |
Proxy | 0.6959 | 0.7021 |
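The Proxy row can be cross-checked from the three components. The density weight of 0.5 is the value reported in the August 15 note; the congestion weight of 0.5 is an assumption, chosen because it reproduces both Proxy values in the table to within rounding. The function name is hypothetical.

```python
def proxy_cost(wirelength: float, density: float, congestion: float,
               density_weight: float = 0.5,
               congestion_weight: float = 0.5) -> float:
    """Weighted sum of the three proxy cost components."""
    return wirelength + density_weight * density + congestion_weight * congestion

# Component values copied from the table above.
without_cd = proxy_cost(0.0948, 0.4845, 0.7176)
with_cd = proxy_cost(0.0861, 0.4746, 0.7574)
print(f"CT w/o CD: {without_cd:.5f} (table: 0.6959)")
print(f"+ Apply CD: {with_cd:.5f} (table: 0.7021)")
```

The small difference in the first value comes from the components being rounded to four decimals in the table.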
The following table shows the P&R result for the post-CD macro placement.
Macro placement generated by applying the Coordinate Descent placement step to Our Ariane-133 (NG45) 68% utilization when the input to the CD placer is the (default setup) CT macro placement. The post-macro placement flow uses Innovus21.1
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
---|---|---|---|---|---|---|---|---|---|
preCTS | 1814274 | 215581 | 1018356 | 289.312 | 4238854 | -0.001 | -0.003 | 0.01% | 0.06% |
postCTS | 1814274 | 217017 | 1018356 | 302.483 | 4249846 | 0.005 | 0 | 0.02% | 0.07% |
postRoute | 1814274 | 217017 | 1018356 | 301.482 | 4358888 | 0.140 | 0 |
Even though CD improves proxy wirelength, the post-route wirelength worsens slightly (by ~1.47%) compared to the original CT macro placement.
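The ~1.47% figure follows from the two postRoute wirelength values reported in the tables above (4295572 um for the original CT macro placement at 68% utilization, 4358888 um after CD). A short check, with a hypothetical helper name:

```python
def percent_change(baseline: float, value: float) -> float:
    """Relative change of `value` with respect to `baseline`, in percent."""
    return 100.0 * (value - baseline) / baseline

ct_postroute_wl = 4295572  # postRoute wirelength (um), CT without CD
cd_postroute_wl = 4358888  # postRoute wirelength (um), CT followed by CD
delta = percent_change(ct_postroute_wl, cd_postroute_wl)
print(f"{delta:.2f}%")  # prints 1.47%
```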
Question 6. Are we using the industry tool in an “expert” manner? (We believe so.) We received an inquiry regarding the multiple ways in which macro placements could be obtained using Cadence tooling. To clarify:
Macro placement generated by Circuit Training on Our Ariane-133 (NG45) 68% utilization when the input macro and standard cell placement to CT grouping is generated by Genus iSpatial, and the post-macro placement flow is using Innovus21.1
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
---|---|---|---|---|---|---|---|---|---|
preCTS | 1814274 | 215583 | 1018355.73 | 289.030 | 4476331 | -0.002 | -0.002 | 0.02% | 0.03% |
postCTS | 1814274 | 216729 | 1018355.73 | 302.268 | 4483560 | 0.002 | 0 | 0.03% | 0.09% |
postRoute | 1814274 | 216729 | 1018355.73 | 301.028 | 4590581 | 0.316 | 0 |
Question 7. What happens if we skip CT and continue directly to standard-cell P&R (i.e., the Innovus 21.1 flow) once we have a macro placement from the commercial tool?
At some point during the past weeks, we realized that this would also be a potential “baseline” for comparison. As can be seen below for both 68% and 51% variants of Ariane-133 in NG45, omitting the CT step can also produce good results by the Table 1 metrics. At this point, we do not have any diagnosis or interpretation of this data. One possible implication is that the Ariane-133 testcase is in some way not probative. The community’s suggestions (e.g., alternate testcases, constraints, floorplan setup, etc.) are always welcome.
Concurrent macro placement (Ariane 68%) continuing straight into the Innovus 21.1 P&R flow (no application of Circuit Training) [baseline CT result: here]
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
---|---|---|---|---|---|---|---|---|---|
preCTS | 1814274 | 214050 | 1018355.73 | 286.117 | 3656436 | 0.007 | 0 | 0.02% | 0.01% |
postCTS | 1814274 | 215096 | 1018355.73 | 299.438 | 3662225 | 0.01 | 0 | 0.01% | 0.02% |
postRoute | 1814274 | 215096 | 1018355.73 | 298.934 | 3780153 | 0.285 | 0 | ||
Concurrent macro placement (Ariane 51%) continuing straight into the Innovus 21.1 P&R flow (no application of Circuit Training) [baseline CT result: here]
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
---|---|---|---|---|---|---|---|---|---|
preCTS | 2560080 | 214060 | 1018355.73 | 285.509 | 3647997 | 0.047 | 0 | 0.00% | 0.00% |
postCTS | 2560080 | 215117 | 1018355.73 | 298.362 | 3649940 | 0.011 | 0 | 0.00% | 0.01% |
postRoute | 2560080 | 215117 | 1018355.73 | 297.849 | 3764148 | 0.210 | 0 |
Question 8. How does the tightness of timing constraints affect the (relative) performance of CT?
[Comment: This is related to Question 2, and is part of the broad question of field of use / sweet spot. We still intend to work in the space of {design testcase} X {technology and design enablement} X {utilization} X {performance requirement} X experimental {questions, design/setup, execution} to reach conclusions that are above the bar of “satisfying readers”. Progress will continue to be reported here and in GitHub.]
We have trained CT to generate a macro placement for the NVDLA design. For this experiment we use the NanGate45 enablement; the initial canvas size is generated by setting utilization to 68%. We use the default hyperparameters used for Ariane to train CT for the NVDLA design. The number of hard macros in NVDLA is 128, so we update max_sequence_length to 129 in ppo_collect.py and sequence_length to 129 in train_ppo.py.
The following table and screenshots show the CT result.
Macro placement generated by Circuit Training on Our NVDLA (NG45) 68% utilization, post-macro placement flow using Innovus 21.1 |
|||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
preCTS | 4002458 | 401713 | 2325683 | 2428.453 | 13601973 | -0.003 | -0.045 | 0.40% | 1.22% |
postCTS | 4002458 | 404398 | 2325683 | 2514.685 | 13677780 | -0.009 | -0.027 | 0.44% | 1.54% |
postRoute | 4002458 | 404398 | 2325683 | 2491.368 | 14317085 | 0.142 | 0 |
Ariane133-NG45-68%-4.0ns CMP (Link to CT result) |
|||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
preCTS | 1814274 | 215033 | 1018356 | 286.199 | 3535026 | -0.001 | -0.001 | 0.04% | 0.01% |
postCTS | 1814274 | 216147 | 1018356 | 299.635 | 3544668 | 0.001 | 0 | 0.02% | 0.01% |
postRoute | 1814274 | 216147 | 1018356 | 299.110 | 3649892 | 0.317 | 0 | ||
postRouteOpt | 1814274 | 215738 | 1018356 | 295.127 | 3653200 | 0.397 | 0 | ||
Ariane133-NG45-68%-1.5ns CMP (Link to CT result) |
|||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
preCTS | 1814274 | 232370 | 1018356 | 682.777 | 3635909 | -0.008 | -0.143 | 0.01% | 0.01% |
postCTS | 1814274 | 234250 | 1018356 | 718.592 | 3663001 | -0.002 | -0.006 | 0.03% | 0.10% |
postRoute | 1814274 | 234250 | 1018356 | 717.410 | 3777403 | -0.221 | -86.88 | ||
postRouteOpt | 1814274 | 237178 | 1018356 | 718.866 | 3785973 | -0.042 | -6.311 | ||
Ariane133-NG45-68%-1.3ns CMP (Link to CT result) |
|||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
preCTS | 1814274 | 251874 | 1018356 | 807.994 | 3885279 | -0.15 | -242.589 | 0.02% | 0.02% |
postCTS | 1814274 | 254721 | 1018356 | 851.977 | 3923912 | -0.127 | -133.426 | 0.04% | 0.10% |
postRoute | 1814274 | 254721 | 1018356 | 850.483 | 4049905 | -0.239 | -410.578 | ||
postRouteOpt | 1814274 | 256230 | 1018356 | 851.546 | 4057140 | -0.154 | -196.527 |
Ariane133-NG45-68%-1.5ns CT (Link to CMP result) |
|||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
preCTS | 1814274 | 227917 | 1018356 | 673.158 | 4243883 | -0.012 | -0.648 | 0.03% | 0.03% |
postCTS | 1814274 | 229836 | 1018356 | 708.797 | 4247346 | -0.001 | -0.007 | 0.07% | 0.12% |
postRoute | 1814274 | 229836 | 1018356 | 707.522 | 4360419 | -0.052 | -9.218 | ||
postRouteOpt | 1814274 | 230164 | 1018356 | 707.829 | 4364537 | -0.009 | -0.233 |
Ariane133-NG45-68%-1.3ns CT (Link to CMP result) |
|||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
postSynth | 1814274 | 244614 | 1018356 | 761.754 | 4884882 | -0.764 | -533.519 | ||
preCTS | 1814274 | 244373 | 1018356 | 792.626 | 4732895 | -0.123 | -184.135 | 0.03% | 0.11% |
postCTS | 1814274 | 247965 | 1018356 | 837.464 | 4762751 | -0.084 | -35.57 | 0.04% | 0.15% |
postRoute | 1814274 | 247965 | 1018356 | 835.824 | 4887126 | -0.123 | -63.739 | ||
postRouteOpt | 1814274 | 248448 | 1018356 | 836.399 | 4892431 | -0.09 | -57.448 |
September 19: We updated the detailed algorithm for gridding in Circuit Training. In contrast to the open-source grid_size_selection.py in the Circuit Training repo, which still calls the wrapper functions of the plc client, our Python scripts implement the gridding from scratch and are easy to understand. The results of our scripts exactly match those of Circuit Training.
September 21: We updated the detailed algorithm for grouping and clustering. Here we explicitly show how netlist information such as the net model is used during grouping and clustering, while the open-source Circuit Training implementation still calls the wrapper functions of the plc client to get netlist information.
Among the more notable details that were not apparent from the Nature paper or the Circuit Training repo:
September 30:
Circuit Training Baseline Result on “Our bp_quad-NanGate45_68”. We have trained CT to generate a macro placement for the bp_quad design. For this experiment we use the NanGate45 enablement; the initial canvas size is generated by setting utilization to 68%. We use the default hyperparameters used for Ariane to train CT for bp_quad design. The number of hard macros in bp_quad is 220, so we update max_sequence_length to 221 in ppo_collect.py and sequence_length to 221 in train_ppo.py.
bp_quad-NG45-68% CT result (Link to Tensorboard) (Link to corresponding CMP result) |
|||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
postSynth | 8449457 | 1828674 | 3917822 | 1903.716 | 36067460 | 0.325 | 0 | ||
preCTS | 8449457 | 1827246 | 3917822 | 2042.610 | 35593805 | -0.015 | -0.64 | 0.12% | 0.19% |
postCTS | 8449457 | 1836549 | 3917822 | 2214.398 | 35633384 | 0 | 0 | 0.14% | 0.22% |
postRoute | 8449457 | 1836549 | 3917822 | 2197.750 | 36681437 | -0.11 | -63.817 | ||
postRouteOpt | 8449457 | 1836148 | 3917822 | 2197.478 | 36718051 | -0.003 | -0.013 |
bp_quad-NG45-68% CMP result (Link to corresponding CT result) |
|||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
postSynth | 8449457 | 1808903 | 3917822 | 1875.440 | 20854975 | 0.327 | 0 | ||
preCTS | 8449457 | 1814511 | 3917822 | 1990.066 | 20766279 | -0.004 | -0.041 | 0.02% | 0.04% |
postCTS | 8449457 | 1824057 | 3917822 | 2160.034 | 20870489 | 0 | 0 | 0.03% | 0.05% |
postRoute | 8449457 | 1824057 | 3917822 | 2159.687 | 21535697 | -0.343 | -307.935 | ||
postRouteOpt | 8449457 | 1824031 | 3917822 | 2159.211 | 21556685 | -0.003 | -0.029 |
October 3:
We shared the Ariane133-NG45-68% protobuf netlist and clustered netlist with Google engineers. They ran training on the clustered netlist, and the following table shows the Table 1 metrics and proxy cost. Our training results resemble Google’s results.
Ariane-NG45-68%-4ns CMP result (Link to Our Result) (Link to tensorboard) | |||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
preCTS | 1814274 | 215608 | 1018356 | 288.736 | 4260100 | -0.001 | -0.001 | 0.01% | 0.01% |
postCTS | 1814274 | 216693 | 1018356 | 302.205 | 4268402 | 0.001 | 0 | 0.02% | 0.02% |
postRoute | 1814274 | 216693 | 1018356 | 301.129 | 4377728 | 0.193 | 0 |
Cost | Ours | Google’s |
---|---|---|
Wirelength | 0.0999 | 0.1023 |
Congestion | 0.8906 | 0.9175 |
Density | 0.4896 | 0.4773 |
Proxy | 0.7900 | 0.7997 |
October 9:
Question 9. Are CT results stable? If not, how much does the outcome vary?
We see from the results in the CT repo that the outcomes of three runs with the same seed value are different. We ran six CT runs for Ariane133-NG45-68%-1.3ns design, and the following tables show the Table 1 metrics and the proxy cost details.
Metrics | Run1 | Run2 | Run3 | Run4 | Run5 | Run6 |
---|---|---|---|---|---|---|
core_area(um^2) | 1814274 | 1814274 | 1814274 | 1814274 | 1814274 | 1814274 |
macro_area(um^2) | 1018356 | 1018356 | 1018356 | 1018356 | 1018356 | 1018356 |
postSynth_std_cell_area(um^2) | 245871 | 243223 | 242695 | 243382 | 246725 | 242711 |
preCTS_std_cell_area(um^2) | 245235 | 244615 | 245921 | 243693 | 245426 | 241760 |
postCTS_std_cell_area(um^2) | 247138 | 245862 | 246186 | 246099 | 247774 | 244237 |
postRoute_std_cell_area(um^2) | 247138 | 245862 | 246186 | 246099 | 247774 | 244237 |
postRouteOpt_std_cell_area(um^2) | 247725 | 246159 | 246776 | 246498 | 248151 | 244594 |
postSynth_total_power(mW) | 757.853 | 751.37 | 755.971 | 769.154 | 760.549 | 759.477 |
preCTS_total_power(mW) | 795.381 | 791.633 | 794.2 | 793.175 | 794.542 | 790.433 |
postCTS_total_power(mW) | 837.759 | 833.972 | 833.019 | 837.791 | 837.733 | 833.350 |
postRoute_total_power(mW) | 835.807 | 832.593 | 831.162 | 836.205 | 836.124 | 831.401 |
postRouteOpt_total_power(mW) | 836.529 | 832.975 | 831.524 | 836.826 | 835.521 | 831.911 |
preCTS_wirelength(um) | 4792929 | 4495121 | 4709296 | 4673400 | 4735851 | 4902798 |
postCTS_wirelength(um) | 4833093 | 4529411 | 4749013 | 4690341 | 4777561 | 4929463 |
postRoute_wirelength(um) | 4955517 | 4649621 | 4869873 | 4816827 | 4903796 | 5054361 |
postRouteOpt_wirelength(um) | 4960472 | 4654146 | 4875070 | 4821225 | 4908694 | 5059042 |
postSynth_WS(ns) | -0.764 | -0.764 | -0.764 | -0.764 | -0.764 | -0.764 |
preCTS_WS(ns) | -0.135 | -0.104 | -0.109 | -0.1 | -0.086 | -0.091 |
postCTS_WS(ns) | -0.102 | -0.056 | -0.069 | -0.106 | -0.077 | -0.08 |
postRoute_WS(ns) | -0.134 | -0.077 | -0.102 | -0.13 | -0.106 | -0.089 |
postRouteOpt_WS(ns) | -0.133 | -0.076 | -0.105 | -0.135 | -0.081 | -0.083 |
postSynth_TNS(ns) | -366.528 | -592.301 | -501.314 | -363.351 | -405.145 | -342.59 |
preCTS_TNS(ns) | -196.114 | -136.662 | -151.307 | -122.663 | -104.413 | -98.21 |
postCTS_TNS(ns) | -76.567 | -13.883 | -40.712 | -60.272 | -27.453 | -21.711 |
postRoute_TNS(ns) | -167.965 | -58.724 | -110.496 | -133.653 | -45.42 | -44.821 |
postRouteOpt_TNS(ns) | -123.027 | -27.571 | -79.826 | -105.775 | -33.286 | -40.314 |
preCTS_Congestion (H) | 0.06% | 0.04% | 0.03% | 0.03% | 0.03% | 0.03% |
postCTS_Congestion (H) | 0.09% | 0.03% | 0.04% | 0.03% | 0.04% | 0.05% |
preCTS_Congestion (V) | 0.11% | 0.10% | 0.13% | 0.08% | 0.16% | 0.14% |
postCTS_Congestion (V) | 0.13% | 0.13% | 0.17% | 0.12% | 0.18% | 0.18% |
Run | Wirelength cost | Congestion cost | Density cost | Proxy cost |
---|---|---|---|---|
Run1 | 0.1052 | 0.97 | 0.5239 | 0.85215 |
Run2 | 0.1045 | 0.9417 | 0.5063 | 0.8285 |
Run3 | 0.1033 | 0.949 | 0.5193 | 0.83745 |
Run4 | 0.1034 | 0.9378 | 0.5185 | 0.8316 |
Run5 | 0.1056 | 0.9328 | 0.5418 | 0.8429 |
Run6 | 0.1104 | 0.96 | 0.5372 | 0.8590 |
Mean | 0.1054 | 0.9486 | 0.5245 | 0.8419 |
STD | 0.0026 | 0.0142 | 0.0131 | 0.0119 |
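The Mean and STD rows can be recomputed from the per-run costs. The sketch below assumes that the proxy cost is the wirelength cost plus half the sum of the congestion and density costs; this weighting is inferred from the tabulated values, which it reproduces to within rounding, and is not stated in this section. It also assumes that STD is the sample standard deviation.

```python
from statistics import mean, stdev

# (wirelength, congestion, density) costs for Run1..Run6, copied from the table above.
runs = [
    (0.1052, 0.9700, 0.5239),
    (0.1045, 0.9417, 0.5063),
    (0.1033, 0.9490, 0.5193),
    (0.1034, 0.9378, 0.5185),
    (0.1056, 0.9328, 0.5418),
    (0.1104, 0.9600, 0.5372),
]


def proxy_cost(wirelength, congestion, density):
    # Assumed weighting: wirelength + 0.5 * (congestion + density).
    return wirelength + 0.5 * (congestion + density)


proxy = [proxy_cost(*costs) for costs in runs]
proxy_mean = mean(proxy)
proxy_std = stdev(proxy)  # sample standard deviation (n - 1 in the denominator)
print(f"proxy cost: mean={proxy_mean:.4f} std={proxy_std:.4f}")
```

Using the population standard deviation (statistics.pstdev) instead would give a smaller spread than the STD row reports, which is why the sample form is assumed here.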
We further ran the coordinate descent (CD) placer on the CT outcomes; the following tables show the Table 1 metrics and proxy cost details of the CD placer outcomes. Even though we see a significant improvement in the proxy cost, we do not see a similar improvement in the Table 1 metrics.
Metrics | Run1_CD | Run2_CD | Run3_CD | Run4_CD | Run5_CD | Run6_CD |
---|---|---|---|---|---|---|
core_area (um^2) | 1814274 | 1814274 | 1814274 | 1814274 | 1814274 | 1814274 |
macro_area (um^2) | 1018356 | 1018356 | 1018356 | 1018356 | 1018356 | 1018356 |
postSynth_std_cell_area (um^2) | 243566 | 244506 | 244016 | 244368 | 242548 | 247357 |
preCTS_std_cell_area (um^2) | 243267 | 241949 | 240051 | 245803 | 242336 | 245297 |
postCTS_std_cell_area (um^2) | 246719 | 244046 | 241932 | 247881 | 244474 | 247763 |
postRoute_std_cell_area (um^2) | 246719 | 244046 | 241932 | 247881 | 244474 | 247763 |
postRouteOpt_std_cell_area (um^2) | 247000 | 243860 | 241282 | 248055 | 245020 | 248377 |
postSynth_total_power (mW) | 736.564 | 747.327 | 758.3497 | 749.487 | 752.643 | 750.437 |
preCTS_total_power (mW) | 790.601 | 788.404 | 785.7521 | 797.216 | 789.500 | 794.160 |
postCTS_total_power (mW) | 835.029 | 830.542 | 827.7217 | 839.145 | 832.896 | 836.920 |
postRoute_total_power (mW) | 833.305 | 829.015 | 825.9415 | 837.320 | 830.757 | 835.113 |
postRouteOpt_total_power (mW) | 833.109 | 828.801 | 824.8444 | 837.595 | 831.417 | 835.770 |
preCTS_wirelength (um) | 4807227 | 4481988 | 4663403 | 4645833 | 4742585 | 4813011 |
postCTS_wirelength (um) | 4830788 | 4501231 | 4680124 | 4683338 | 4779530 | 4839729 |
postRoute_wirelength (um) | 4955395 | 4621695 | 4804536 | 4809309 | 4896653 | 4965139 |
postRouteOpt_wirelength (um) | 4960842 | 4626687 | 4809650 | 4814381 | 4901760 | 4969937 |
postSynth_WS (ns) | -0.764 | -0.764 | -0.764 | -0.764 | -0.764 | -0.764 |
preCTS_WS (ns) | -0.11 | -0.092 | -0.065 | -0.115 | -0.105 | -0.143 |
postCTS_WS (ns) | -0.102 | -0.058 | -0.056 | -0.101 | -0.094 | -0.11 |
postRoute_WS (ns) | -0.135 | -0.076 | -0.088 | -0.107 | -0.11 | -0.14 |
postRouteOpt_WS (ns) | -0.129 | -0.062 | -0.055 | -0.101 | -0.109 | -0.137 |
postSynth_TNS (ns) | -351.045 | -331.782 | -406.717 | -431.986 | -450.335 | -444.635 |
preCTS_TNS (ns) | -133.192 | -90.187 | -57.052 | -152.966 | -139.133 | -196.673 |
postCTS_TNS (ns) | -55.003 | -19.074 | -8.908 | -47.75 | -52.329 | -101.123 |
postRoute_TNS (ns) | -145.14 | -31.185 | -15.033 | -82.306 | -96.749 | -157.245 |
postRouteOpt_TNS (ns) | -109.739 | -12.692 | -8.418 | -60.53 | -66.632 | -126.007 |
preCTS_Congestion (H) | 0.03% | 0.03% | 0.07% | 0.05% | 0.04% | 0.04% |
postCTS_Congestion (H) | 0.03% | 0.03% | 0.07% | 0.05% | 0.04% | 0.05% |
preCTS_Congestion (V) | 0.16% | 0.12% | 0.10% | 0.15% | 0.17% | 0.14% |
postCTS_Congestion (V) | 0.19% | 0.16% | 0.10% | 0.18% | 0.21% | 0.15% |
Run | Wirelength cost | Congestion cost | Density cost | Proxy cost |
---|---|---|---|---|
Run1_CD | 0.0944 | 0.7942 | 0.4927 | 0.73785 |
Run2_CD | 0.089 | 0.7829 | 0.4925 | 0.7267 |
Run3_CD | 0.0928 | 0.796 | 0.4931 | 0.73735 |
Run4_CD | 0.0957 | 0.8104 | 0.4951 | 0.7485 |
Run5_CD | 0.0909 | 0.7799 | 0.4933 | 0.7275 |
Run6_CD | 0.0922 | 0.7843 | 0.4934 | 0.7311 |
Mean | 0.0925 | 0.7913 | 0.4934 | 0.7348 |
STD | 0.0024 | 0.0114 | 0.0009 | 0.0082 |
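To put the two effects side by side, the sketch below compares the relative change in mean proxy cost with the relative change in mean postRouteOpt wirelength between the six CT runs and their CD-refined counterparts. The values are copied from the tables above; the helper name is hypothetical, and comparing means is just one simple way to summarize the runs.

```python
from statistics import mean

# Per-run proxy cost and postRouteOpt wirelength (um), copied from the tables above.
ct_proxy = [0.85215, 0.8285, 0.83745, 0.8316, 0.8429, 0.8590]
cd_proxy = [0.73785, 0.7267, 0.73735, 0.7485, 0.7275, 0.7311]
ct_wirelength = [4960472, 4654146, 4875070, 4821225, 4908694, 5059042]
cd_wirelength = [4960842, 4626687, 4809650, 4814381, 4901760, 4969937]


def relative_change(before, after):
    """Relative change of the mean: (mean(after) - mean(before)) / mean(before)."""
    return (mean(after) - mean(before)) / mean(before)


proxy_change = relative_change(ct_proxy, cd_proxy)
wirelength_change = relative_change(ct_wirelength, cd_wirelength)
print(f"proxy cost: {proxy_change:+.1%}, postRouteOpt wirelength: {wirelength_change:+.1%}")
```

On these numbers the mean proxy cost drops by roughly 13%, while the mean postRouteOpt wirelength changes by less than 1%.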
October 15:
Question 10. What is the correlation between proxy cost and the post RouteOpt metrics?
We have collected the macro placements generated by CT runs for Ariane133-NG45-68%-1.3ns that have proxy cost less than 0.9. There are ~40 such macro placements over four CT runs. From these, 15 are chosen at random: two from each proxy-cost bucket (0.9 - (i+1)×0.01, 0.9 - i×0.01] for i ∈ [0, 6], and one from (0.82, 0.83]. Table 1 metrics and proxy costs of these 15 runs are given in the following table.
Metrics | RUN1 | RUN2 | RUN3 | RUN4 | RUN5 | RUN6 | RUN7 | RUN8 | RUN9 | RUN10 | RUN11 | RUN12 | RUN13 | RUN14 | RUN15 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
core_area (um^2) | 1814274 | 1814274 | 1814274 | 1814274 | 1814274 | 1814274 | 1814274 | 1814274 | 1814274 | 1814274 | 1814274 | 1814274 | 1814274 | 1814274 | 1814274 |
macro_area (um^2) | 1018356 | 1018356 | 1018356 | 1018356 | 1018356 | 1018356 | 1018356 | 1018356 | 1018356 | 1018356 | 1018356 | 1018356 | 1018356 | 1018356 | 1018356 |
postSynth_std_cell_area (um^2) | 242067 | 243116 | 243055 | 246488 | 243788 | 244004 | 244090 | 244844 | 245083 | 246072 | 240942 | 246725 | 242695 | 243643 | 243223 |
preCTS_std_cell_area (um^2) | 243195 | 245232 | 242421 | 244504 | 244174 | 245232 | 241542 | 246361 | 243436 | 246115 | 244612 | 245426 | 245921 | 244513 | 244615 |
postCTS_std_cell_area (um^2) | 246379 | 247012 | 243583 | 247185 | 246155 | 247948 | 244115 | 248349 | 247013 | 248156 | 246469 | 247774 | 246186 | 247138 | 245862 |
postRoute_std_cell_area (um^2) | 246379 | 247012 | 243583 | 247185 | 246155 | 247948 | 244115 | 248349 | 247013 | 248156 | 246469 | 247774 | 246186 | 247138 | 245862 |
postRouteOpt_std_cell_area (um^2) | 247121 | 247607 | 243894 | 247394 | 246878 | 248433 | 244274 | 248746 | 247320 | 248770 | 247390 | 248151 | 246776 | 247547 | 246159 |
postSynth_total_power (mW) | 769.520 | 753.509 | 742.910 | 752.287 | 752.254 | 741.871 | 756.514 | 753.901 | 753.265 | 749.084 | 750.949 | 760.549 | 755.971 | 753.220 | 751.370 |
preCTS_total_power (mW) | 791.074 | 793.708 | 787.915 | 792.428 | 791.913 | 792.947 | 787.022 | 791.689 | 790.387 | 795.202 | 791.286 | 794.542 | 794.200 | 791.590 | 791.633 |
postCTS_total_power (mW) | 834.752 | 836.171 | 829.367 | 834.354 | 833.401 | 836.912 | 830.593 | 835.061 | 831.509 | 833.914 | 832.950 | 837.733 | 833.019 | 835.334 | 833.972 |
postRoute_total_power (mW) | 833.184 | 834.695 | 828.029 | 833.086 | 831.875 | 835.325 | 828.821 | 833.941 | 830.484 | 832.671 | 831.772 | 836.124 | 831.162 | 833.983 | 832.593 |
postRouteOpt_total_power (mW) | 833.961 | 835.436 | 828.254 | 833.318 | 832.649 | 835.803 | 829.066 | 834.304 | 831.652 | 833.287 | 832.768 | 835.521 | 831.524 | 834.484 | 832.975 |
preCTS_wirelength (um) | 4728745 | 4717333 | 4642346 | 4628632 | 4659824 | 4873402 | 4882098 | 4543637 | 4649807 | 4709934 | 4486281 | 4735851 | 4709296 | 4585732 | 4495121 |
postCTS_wirelength (um) | 4762085 | 4757761 | 4674012 | 4665159 | 4693884 | 4912764 | 4918705 | 4585918 | 4677979 | 4742407 | 4522423 | 4777561 | 4749013 | 4616680 | 4529411 |
postRoute_wirelength (um) | 4885433 | 4888249 | 4797431 | 4795134 | 4817647 | 5042041 | 5043542 | 4716210 | 4807107 | 4869741 | 4650492 | 4903796 | 4869873 | 4742247 | 4649621 |
postRouteOpt_wirelength (um) | 4890958 | 4893245 | 4802406 | 4800104 | 4822688 | 5047120 | 5048498 | 4720614 | 4811606 | 4874840 | 4655745 | 4908694 | 4875070 | 4746909 | 4654146 |
Wirelength_Cost | 0.1042 | 0.1011 | 0.1032 | 0.1014 | 0.1032 | 0.1055 | 0.1064 | 0.1027 | 0.1048 | 0.1027 | 0.1023 | 0.1056 | 0.1033 | 0.1053 | 0.1045 |
postSynth_WS (ns) | -0.764 | -0.764 | -0.764 | -0.79 | -0.764 | -0.764 | -0.79 | -0.764 | -0.764 | -0.764 | -0.764 | -0.764 | -0.764 | -0.764 | -0.764 |
preCTS_WS (ns) | -0.114 | -0.101 | -0.08 | -0.096 | -0.116 | -0.101 | -0.066 | -0.121 | -0.117 | -0.137 | -0.124 | -0.086 | -0.109 | -0.125 | -0.104 |
postCTS_WS (ns) | -0.088 | -0.08 | -0.036 | -0.066 | -0.098 | -0.076 | -0.021 | -0.098 | -0.096 | -0.053 | -0.104 | -0.077 | -0.069 | -0.109 | -0.056 |
postRoute_WS (ns) | -0.121 | -0.094 | -0.072 | -0.341 | -0.118 | -0.087 | -0.088 | -0.118 | -0.123 | -0.134 | -0.137 | -0.106 | -0.102 | -0.13 | -0.077 |
postRouteOpt_WS (ns) | -0.125 | -0.096 | -0.063 | -0.066 | -0.089 | -0.087 | -0.041 | -0.119 | -0.13 | -0.099 | -0.126 | -0.081 | -0.105 | -0.134 | -0.076 |
postSynth_TNS (ns) | -326.535 | -382.684 | -477.484 | -339.098 | -401.614 | -414.822 | -367.119 | -412.85 | -422.819 | -350.771 | -313.919 | -405.145 | -501.314 | -366.866 | -592.301 |
preCTS_TNS (ns) | -147.905 | -129.089 | -92.977 | -111.456 | -141.654 | -116.344 | -62.661 | -171.687 | -156.067 | -206.043 | -169.834 | -104.413 | -151.307 | -168.846 | -136.662 |
postCTS_TNS (ns) | -69.386 | -67.761 | -4.902 | -34.67 | -60.302 | -41.497 | -2.514 | -83.036 | -62.184 | -27.629 | -122.576 | -27.453 | -40.712 | -55.55 | -13.883 |
postRoute_TNS (ns) | -172.018 | -85.027 | -48.269 | -37.909 | -85.811 | -70.604 | -15.213 | -129.351 | -128.868 | -143.568 | -199.374 | -45.42 | -110.496 | -132.265 | -58.724 |
postRouteOpt_TNS (ns) | -135.838 | -70.139 | -25.199 | -33.755 | -68.666 | -47.43 | -14.211 | -118.13 | -96.63 | -105.577 | -152.772 | -33.286 | -79.826 | -94.025 | -27.571 |
preCTS_Congestion (H) | 0.04% | 0.03% | 0.04% | 0.03% | 0.02% | 0.05% | 0.03% | 0.02% | 0.03% | 0.05% | 0.04% | 0.03% | 0.03% | 0.02% | 0.04% |
postCTS_Congestion (H) | 0.05% | 0.04% | 0.05% | 0.06% | 0.04% | 0.05% | 0.04% | 0.05% | 0.04% | 0.04% | 0.06% | 0.04% | 0.04% | 0.03% | 0.03% |
preCTS_Congestion (V) | 0.17% | 0.16% | 0.11% | 0.14% | 0.16% | 0.11% | 0.16% | 0.13% | 0.15% | 0.12% | 0.14% | 0.16% | 0.13% | 0.11% | 0.10% |
postCTS_Congestion (V) | 0.16% | 0.14% | 0.13% | 0.13% | 0.15% | 0.12% | 0.16% | 0.14% | 0.18% | 0.13% | 0.15% | 0.18% | 0.17% | 0.14% | 0.13% |
Congestion_Cost | 1.0192 | 0.9983 | 1.0115 | 1.0062 | 0.9894 | 1.006 | 0.9813 | 0.9966 | 0.9932 | 0.9587 | 0.9672 | 0.9328 | 0.949 | 0.9439 | 0.9417 |
Wirelength_Cost | 0.1042 | 0.1011 | 0.1032 | 0.1014 | 0.1032 | 0.1055 | 0.1064 | 0.1027 | 0.1048 | 0.1027 | 0.1023 | 0.1056 | 0.1033 | 0.1053 | 0.1045 |
Congestion_Cost | 1.0192 | 0.9983 | 1.0115 | 1.0062 | 0.9894 | 1.006 | 0.9813 | 0.9966 | 0.9932 | 0.9587 | 0.9672 | 0.9328 | 0.949 | 0.9439 | 0.9417 |
Density_Cost | 0.5622 | 0.5923 | 0.5543 | 0.5622 | 0.5523 | 0.5354 | 0.5409 | 0.53 | 0.5113 | 0.5439 | 0.5215 | 0.5418 | 0.5193 | 0.5136 | 0.5063 |
Proxy_Cost | 0.8949 | 0.8964 | 0.8861 | 0.8856 | 0.87405 | 0.8762 | 0.8675 | 0.866 | 0.85705 | 0.854 | 0.84665 | 0.8429 | 0.83745 | 0.83405 | 0.8285 |
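The bucketed selection can be checked against the Proxy_Cost row. The sketch below assumes that bucket i is the half-open interval (0.9 - (i+1)×0.01, 0.9 - i×0.01], so that i = 7 corresponds to (0.82, 0.83]; the function name is hypothetical.

```python
from collections import Counter

# Proxy_Cost row for RUN1..RUN15, copied from the table above.
proxy_costs = [0.8949, 0.8964, 0.8861, 0.8856, 0.87405, 0.8762, 0.8675, 0.866,
               0.85705, 0.854, 0.84665, 0.8429, 0.83745, 0.83405, 0.8285]


def bucket_index(proxy_cost):
    """Index i of the bucket (0.9 - (i + 1) * 0.01, 0.9 - i * 0.01] holding proxy_cost.

    Integer arithmetic in units of 1e-5 avoids floating-point trouble at bucket edges.
    """
    units = round(proxy_cost * 100_000)
    return (90_000 - units) // 1_000


bucket_counts = Counter(bucket_index(p) for p in proxy_costs)
for i in sorted(bucket_counts):
    low, high = 0.9 - (i + 1) * 0.01, 0.9 - i * 0.01
    print(f"({low:.2f}, {high:.2f}]: {bucket_counts[i]} run(s)")
```

Under this reading, each of the buckets i = 0 to 6 holds two of the listed runs and the (0.82, 0.83] bucket holds one.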
In the following tables we report the Kendall rank correlation coefficient between proxy costs and postPlaceOpt metrics, and between proxy costs and postRouteOpt metrics. Values near +1 or -1 indicate strong correlation or anti-correlation, respectively, and values near 0 indicate little or no rank correlation.
Correlation between PostPlaceOpt metrics and proxy cost | |||||||
---|---|---|---|---|---|---|---|
Cost | Std Cell Area | Wirelength | Total Power | Worst Slack | TNS | Congestion (V) | Congestion (H) |
Wirelength | -0.09662 | 0.33655 | -0.12501 | 0.32851 | 0.29809 | -0.06098 | 0.00000 |
Congestion | -0.30622 | 0.10476 | -0.23810 | 0.17225 | 0.14286 | 0.18118 | 0.13093 |
Density | -0.08654 | 0.21053 | 0.15311 | 0.24038 | 0.19139 | 0.35399 | 0.03289 |
Proxy | -0.22967 | 0.23810 | -0.06667 | 0.28708 | 0.23810 | 0.32210 | 0.06547 |
Correlation between PostRouteOpt metrics and proxy cost | |||||
---|---|---|---|---|---|
Cost | Std Cell Area | Wirelength | Total Power | Worst Slack | TNS |
Wirelength | -0.22116 | 0.31732 | -0.14424 | 0.16347 | 0.31732 |
Congestion | -0.02857 | 0.08571 | -0.00952 | 0.10476 | -0.04762 |
Density | 0.09569 | 0.22967 | 0.09569 | 0.26795 | 0.07656 |
Proxy | -0.00952 | 0.25714 | 0.04762 | 0.20000 | 0.04762 |
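For readers who want to recompute such coefficients, here is a minimal pure-Python sketch of the Kendall rank correlation. It implements the tau-b variant, which corrects for ties; which variant was used for the tables above is not stated, so that choice is an assumption. The two input rows are copied from the 15-run table.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall rank correlation between x and y (tau-b, with tie correction)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: counted in neither term
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denom = sqrt((concordant + discordant + tied_x_only)
                 * (concordant + discordant + tied_y_only))
    if denom == 0:
        raise ValueError("correlation is undefined when an input is constant")
    return (concordant - discordant) / denom


# Proxy_Cost and postRouteOpt_wirelength (um) for RUN1..RUN15, from the table above.
proxy_costs = [0.8949, 0.8964, 0.8861, 0.8856, 0.87405, 0.8762, 0.8675, 0.866,
               0.85705, 0.854, 0.84665, 0.8429, 0.83745, 0.83405, 0.8285]
wirelength = [4890958, 4893245, 4802406, 4800104, 4822688, 5047120, 5048498, 4720614,
              4811606, 4874840, 4655745, 4908694, 4875070, 4746909, 4654146]

tau = kendall_tau_b(proxy_costs, wirelength)
print(f"Kendall tau-b (proxy cost vs postRouteOpt wirelength): {tau:.5f}")
```

Because the tabulated inputs are rounded, a recomputed coefficient need not match the reported values digit for digit.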
Circuit Training Baseline Result on “Our MemPool_Group-NanGate45_68”.
We have trained CT to generate a macro placement for the MemPool Group design. For this experiment we use the NanGate45 enablement; the initial canvas size is generated by setting utilization to 68%. We use the default hyperparameters used for Ariane to train CT for the MemPool Group design. The number of hard macros in MemPool Group is 324, so we update max_sequence_length to 325 in ppo_collect.py and sequence_length to 325 in train_ppo.py.
MemPool group-NG45-68%-4ns CT result (Flow2. Final DRC Count: 19367) (Link to Tensorboard) |
|||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
postSynth | 11371934 | 4976373 | 3078071 | 3149.187 | 113753318 | 0 | 0 | ||
preCTS | 11371934 | 4916168 | 3078071 | 2528.429 | 113557846 | -0.033 | -42.949 | 3.03% | 1.51% |
postCTS | 11371934 | 4867885 | 3078071 | 2707.906 | 113908550 | -0.001 | -0.018 | 3.55% | 1.76% |
postRoute | 11371934 | 4867885 | 3078071 | 2742.635 | 123398335 | -0.749 | -13254.6 | ||
postRouteOpt | 11371934 | 4861749 | 3078071 | 2742.982 | 123578279 | -0.206 | -26.811 |
MemPool group-NG45-68%-4ns CMP result (Flow2. Final DRC Count: 26) |
|||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
postSynth | 11371934 | 4947251 | 3078071 | 2938.815 | 94419498 | 0 | 0 | ||
preCTS | 11371934 | 4891095 | 3078071 | 2402.835 | 96594902 | -0.018 | -150.478 | 1.72% | 0.78% |
postCTS | 11371934 | 4846216 | 3078071 | 2584.086 | 97108227 | -0.003 | -0.043 | 1.85% | 0.87% |
postRoute | 11371934 | 4846216 | 3078071 | 2589.973 | 102792205 | -0.241 | -4400.6 | ||
postRouteOpt | 11371934 | 4837150 | 3078071 | 2586.602 | 102907484 | -0.02 | -1.029 |
November 25:
We document two variant Evaluation Flows (taking macro placements through Innovus place-and-route) that we use, in this Evaluation Flow document. Posted results up to now have been obtained with Evaluation Flow 2. The Evaluation Flow document shows that results and conclusions are nearly identical between Evaluation Flow 1 and Evaluation Flow 2. However, going forward we will report our macro placement assessments using Evaluation Flow 1.
CT Results with a Commercial (GLOBALFOUNDRIES 12nm) Design Enablement
We have run CT to generate macro placements for Ariane133, BlackParrot and MemPool Group designs on GLOBALFOUNDRIES 12nm (GF12) enablement. The following tables present the normalized design metrics. Core area, standard cell area and macro area are normalized with respect to the core area. Total power is normalized with respect to the reported preCTS total power when CMP is used. Similarly, we normalize the wirelength and congestion based on the reported preCTS wirelength and congestion when CMP is used. The timing numbers are normalized with respect to the target clock period.
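The normalization described above can be sketched as follows. All dictionary keys, the function name, and the numeric inputs are hypothetical placeholders chosen for illustration; none of them are values from the reported runs. Congestion is omitted for brevity.

```python
def normalize_stage(raw, cmp_prects, clock_period_ns):
    """Normalize one flow stage's metrics following the scheme described above.

    raw:        metrics of the stage being reported (hypothetical key names)
    cmp_prects: preCTS metrics of the CMP run for the same design
    """
    core_area = raw["core_area"]
    return {
        # Areas are normalized with respect to the core area.
        "core_area": 1.0,
        "std_cell_area": raw["std_cell_area"] / core_area,
        "macro_area": raw["macro_area"] / core_area,
        # Power and wirelength are normalized to the CMP preCTS values.
        "total_power": raw["total_power"] / cmp_prects["total_power"],
        "wirelength": raw["wirelength"] / cmp_prects["wirelength"],
        # Timing is normalized with respect to the target clock period.
        "ws": raw["ws_ns"] / clock_period_ns,
        "tns": raw["tns_ns"] / clock_period_ns,
    }


# Hypothetical placeholder inputs, NOT taken from any reported run.
cmp_prects = {"total_power": 200.0, "wirelength": 1_000_000.0}
ct_postcts = {
    "core_area": 1000.0, "std_cell_area": 140.0, "macro_area": 550.0,
    "total_power": 230.0, "wirelength": 1_180_000.0,
    "ws_ns": -0.26, "tns_ns": -52.0,
}
normalized = normalize_stage(ct_postcts, cmp_prects, clock_period_ns=2.0)
print(normalized)
```

With this scheme the CMP preCTS row is 1.0 for power and wirelength by construction, so every other row reads directly as a ratio against that baseline.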
Ariane133-GF12-68% CMP (results are normalized as described here) |
|||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area | Standard Cell Area | Macro Area | Total Power | Wirelength | WS | TNS | Congestion (H) | Congestion (V) |
preCTS | 1 | 0.137 | 0.555 | 1.0000 | 1.0000 | -0.130 | -259.985 | 0.00 | 1.00 |
postCTS | 1 | 0.139 | 0.555 | 1.1442 | 1.0112 | -0.145 | -114.783 | 0.00 | 1.00 |
postRoute | 1 | 0.139 | 0.555 | 1.1356 | 1.0432 | -0.185 | -142.688 | ||
postRouteOpt | 1 | 0.139 | 0.555 | 1.1352 | 1.0443 | -0.159 | -142.274 |
Ariane133-GF12-68% CT (results are normalized as described here) (Link to Tensorboard) |
|||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area | Standard Cell Area | Macro Area | Total Power | Wirelength | WS | TNS | Congestion (H) | Congestion (V) |
preCTS | 1 | 0.138 | 0.555 | 1.0120 | 1.1652 | -0.130 | -239.531 | 0.00 | 0.50 |
postCTS | 1 | 0.140 | 0.555 | 1.1623 | 1.1828 | -0.138 | -140.220 | 0.00 | 1.00 |
postRoute | 1 | 0.140 | 0.555 | 1.1530 | 1.2151 | -0.138 | -145.883 | ||
postRouteOpt | 1 | 0.140 | 0.555 | 1.1519 | 1.2161 | -0.145 | -115.805 |
Ariane133-GF12-68% AutoDMP (results are normalized as described here) |
|||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area | Standard Cell Area | Macro Area | Total Power | Wirelength | WS | TNS | Congestion (H) | Congestion (V) |
preCTS | 1 | 0.136 | 0.555 | 0.9941 | 1.0214 | -0.116 | -204.181 | 0.00 | 0.50 |
postCTS | 1 | 0.138 | 0.555 | 1.1406 | 1.0337 | -0.126 | -114.774 | 0.00 | 1.00 |
postRoute | 1 | 0.138 | 0.555 | 1.1318 | 1.0670 | -0.180 | -187.204 | ||
postRouteOpt | 1 | 0.137 | 0.555 | 1.1296 | 1.0681 | -0.130 | -90.493 |
Ariane133-GF12-68% Hier-RTLMP (results are normalized as described here) |
|||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area | Standard Cell Area | Macro Area | Total Power | Wirelength | WS | TNS | Congestion (H) | Congestion (V) |
preCTS | 1 | 0.138 | 0.555 | 1.0218 | 1.3219 | -0.144 | -307.690 | 0.00 | 3.5 |
postCTS | 1 | 0.140 | 0.555 | 1.1657 | 1.3389 | -0.169 | -190.458 | 0.00 | 3.5 |
postRoute | 1 | 0.140 | 0.555 | 1.1557 | 1.3772 | -0.270 | -289.089 | ||
postRouteOpt | 1 | 0.139 | 0.555 | 1.1541 | 1.3785 | -0.181 | -178.470 |
BlackParrot-GF12-68% CMP (results are normalized as described here) |
|||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area | Standard Cell Area | Macro Area | Total Power | Wirelength | WS | TNS | Congestion(H) | Congestion(V) |
preCTS | 1 | 0.176 | 0.501 | 1.0000 | 1.0000 | 0.001 | 0.000 | 1.00 | 1.00 |
postCTS | 1 | 0.178 | 0.501 | 1.1526 | 1.0079 | 0.000 | 0.000 | 1.00 | 1.00 |
postRoute | 1 | 0.178 | 0.501 | 1.1436 | 1.0304 | -0.014 | -2.629 | ||
postRouteOpt | 1 | 0.178 | 0.501 | 1.1437 | 1.0306 | 0.001 | 0.000 |
BlackParrot-GF12-68% CT [results are normalized as described here] (Link to Tensorboard) |
|||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area | Standard Cell Area | Macro Area | Total Power | Wirelength | WS | TNS | Congestion(H) | Congestion(V) |
preCTS | 1 | 0.178 | 0.501 | 1.1068 | 1.6993 | 0.001 | 0.000 | 3.00 | 2.00 |
postCTS | 1 | 0.179 | 0.501 | 1.2621 | 1.7058 | 0.000 | 0.000 | 2.00 | 2.20 |
postRoute | 1 | 0.179 | 0.501 | 1.2469 | 1.7372 | -0.028 | -11.492 | ||
postRouteOpt | 1 | 0.179 | 0.501 | 1.2462 | 1.7379 | 0.001 | 0.000 |
BlackParrot-GF12-68% AutoDMP [results are normalized as described here] | |||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area | Standard Cell Area | Macro Area | Total Power | Wirelength | WS | TNS | Congestion (H) | Congestion (V) |
preCTS | 1 | 0.176 | 0.501 | 1.0012 | 0.9891 | 0.001 | 0.000 | 1.0 | 1.0 |
postCTS | 1 | 0.178 | 0.501 | 1.1519 | 0.9967 | 0.000 | 0.000 | 1.0 | 1.2 |
postRoute | 1 | 0.178 | 0.501 | 1.1433 | 1.0199 | -0.045 | -12.419 | ||
postRouteOpt | 1 | 0.178 | 0.501 | 1.1433 | 1.0202 | 0.000 | 0.000 |
MemPool Group-GF12-68% CMP [results are normalized as described here] |
|||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area | Standard Cell Area | Macro Area | Total Power | Wirelength | WS | TNS | Congestion(H) | Congestion(V) |
preCTS | 1 | 0.415 | 0.308 | 1.0000 | 1.0000 | -0.154 | -12479.05 | 1.00 | 1.00 |
postCTS | 1 | 0.406 | 0.308 | 1.0663 | 1.0109 | -0.134 | -1828.60 | 1.07 | 1.26 |
postRoute | 1 | 0.406 | 0.308 | 1.0631 | 1.0507 | -0.213 | -5882.00 | ||
postRouteOpt | 1 | 0.405 | 0.308 | 1.0601 | 1.0521 | -0.197 | -1961.25 |
MemPool Group-GF12-68% CT [results are normalized as described here] (Link to Tensorboard) |
|||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area | Standard Cell Area | Macro Area | Total Power | Wirelength | WS | TNS | Congestion(H) | Congestion(V) |
preCTS | 1 | 0.419 | 0.308 | 1.1094 | 1.222 | -0.170 | -13620.25 | 1 | 1.22 |
postCTS | 1 | 0.414 | 0.308 | 1.1966 | 1.2331 | -0.179 | -3615.65 | 1.27 | 1.57 |
postRoute | 1 | 0.414 | 0.308 | 1.1987 | 1.2798 | -0.178 | -6350.95 | ||
postRouteOpt | 1 | 0.410 | 0.308 | 1.1847 | 1.282 | -0.195 | -1849.40 |
MemPool Group-GF12-68% human macro placement [results are normalized as described here] |
|||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area | Standard Cell Area | Macro Area | Total Power | Wirelength | WS | TNS | Congestion (H) | Congestion (V) |
preCTS | 1 | 0.418 | 0.308 | 1.033 | 1.084 | -0.157 | -12888.500 | 0.73 | 1.09 |
postCTS | 1 | 0.409 | 0.308 | 1.105 | 1.093 | -0.142 | -2663.800 | 0.80 | 1.30 |
postRoute | 1 | 0.409 | 0.308 | 1.103 | 1.136 | -0.200 | -4989.700 | ||
postRouteOpt | 1 | 0.406 | 0.308 | 1.091 | 1.138 | -0.149 | -1766.450 |
(Updated on May 1, 2023)
BlackParrot-GF12-68% Innovus CMP [results are normalized as described here] |
|||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area | Standard Cell Area | Macro Area | Total Power | Wirelength | WS | TNS | Congestion (H) | Congestion (V) |
preCTS | 1 | 0.188 | 0.498 | 1.000 | 1.000 | -0.099 | -230.148 | 1.00 | 1.00 |
postCTS | 1 | 0.190 | 0.498 | 1.148 | 1.009 | -0.080 | -93.367 | 1.00 | 1.00 |
postRoute | 1 | 0.190 | 0.498 | 1.138 | 1.033 | -0.171 | -1033.653 | ||
postRouteOpt | 1 | 0.190 | 0.498 | 1.138 | 1.034 | -0.087 | -138.918 |
BlackParrot-GF12-68% CT (wirelength cost: 0.0756, congestion cost: 0.7329, density cost: 0.6526, proxy cost: 0.7684) (Link to tensorboard) |
|||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area | Standard Cell Area | Macro Area | Total Power | Wirelength | WS | TNS | Congestion (H) | Congestion (V) |
preCTS | 1 | 0.190 | 0.498 | 1.083 | 1.568 | -0.108 | -244.624 | 2.00 | 1.80 |
postCTS | 1 | 0.192 | 0.498 | 1.238 | 1.572 | -0.087 | -115.327 | 2.00 | 2.00 |
postRoute | 1 | 0.192 | 0.498 | 1.223 | 1.605 | -0.209 | -270.951 | ||
postRouteOpt | 1 | 0.191 | 0.498 | 1.219 | 1.606 | -0.089 | -66.473 |
BlackParrot-GF12-68% SA (wirelength cost: 0.0576, congestion cost: 0.6619, density cost: 0.5971, proxy cost: 0.6871) [results are normalized as described here] |
|||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
preCTS | 1 | 0.189 | 0.498 | 1.030 | 1.239 | -0.119 | -234.785 | 1.00 | 1.40 |
postCTS | 1 | 0.191 | 0.498 | 1.183 | 1.246 | -0.111 | -159.242 | 1.00 | 1.80 |
postRoute | 1 | 0.191 | 0.498 | 1.171 | 1.274 | -0.296 | -4161.765 | ||
postRouteOpt | 1 | 0.191 | 0.498 | 1.175 | 1.275 | -0.160 | -325.995 |
BlackParrot-GF12-68% Human Expert [results are normalized as described here] |
|||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
preCTS | 1 | 0.189 | 0.498 | 1.010 | 1.065 | -0.107 | -264.618 | 1.00 | 2.60 |
postCTS | 1 | 0.190 | 0.498 | 1.157 | 1.074 | -0.048 | -40.525 | 2.00 | 3.20 |
postRoute | 1 | 0.190 | 0.498 | 1.148 | 1.106 | -0.266 | -340.181 | ||
postRouteOpt | 1 | 0.189 | 0.498 | 1.144 | 1.107 | -0.049 | -15.400 |
BlackParrot-GF12-68% AutoDMP [results are normalized as described here] |
|||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
preCTS | 1 | 0.189 | 0.498 | 1.005 | 1.008 | -0.136 | -254.904 | 1.00 | 1.00 |
postCTS | 1 | 0.191 | 0.498 | 1.153 | 1.017 | -0.076 | -99.649 | 1.00 | 1.20 |
postRoute | 1 | 0.191 | 0.498 | 1.143 | 1.043 | -0.253 | -361.892 | ||
postRouteOpt | 1 | 0.190 | 0.498 | 1.140 | 1.043 | -0.062 | -61.772 |
BlackParrot-GF12-68% Hier-RTLMP [results are normalized as described here] |
|||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
preCTS | 1 | 0.188 | 0.498 | 1.035 | 1.249 | -0.100 | -214.208 | 2.00 | 1.60 |
postCTS | 1 | 0.190 | 0.498 | 1.188 | 1.257 | -0.079 | -102.866 | 1.00 | 1.80 |
postRoute | 1 | 0.190 | 0.498 | 1.177 | 1.288 | -0.213 | -339.322 | ||
postRouteOpt | 1 | 0.190 | 0.498 | 1.173 | 1.289 | -0.082 | -54.313 |
MemPool Group-GF12-68% Innovus CMP [results are normalized as described here] |
|||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
preCTS | 1 | 0.412 | 0.312 | 1.000 | 1.000 | -0.073 | -4486.957 | 1.00 | 1.00 |
postCTS | 1 | 0.403 | 0.312 | 1.056 | 1.007 | -0.058 | -196.767 | 1.00 | 1.00 |
postRoute | 1 | 0.403 | 0.312 | 1.055 | 1.048 | -0.126 | -2495.000 | ||
postRouteOpt | 1 | 0.393 | 0.312 | 1.025 | 1.051 | -0.101 | -167.530 |
MemPool Group-GF12-68% CT (Wirelength cost: 0.069, Congestion cost: 0.810, Density Cost: 1.039, Proxy Cost: 0.994) (Link to tensorboard) [results are normalized as described here] |
|||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
preCTS | 1 | 0.416 | 0.312 | 1.085 | 1.189 | -0.085 | -5086.783 | 0.76 | 1.25 |
postCTS | 1 | 0.409 | 0.312 | 1.153 | 1.196 | -0.090 | -578.565 | 0.73 | 1.33 |
postRoute | 1 | 0.409 | 0.312 | 1.154 | 1.244 | -0.196 | -5010.696 | ||
postRouteOpt | 1 | 0.400 | 0.312 | 1.124 | 1.247 | -0.087 | -124.331 |
MemPool Group-GF12-68% SA (Wirelength cost: 0.064, Congestion cost: 0.940, Density Cost: 1.325, Proxy Cost: 1.196) [results are normalized as described here] |
|||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
preCTS | 1 | 0.415 | 0.312 | 1.081 | 1.187 | -0.083 | -5070.000 | 1.29 | 1.42 |
postCTS | 1 | 0.408 | 0.312 | 1.138 | 1.197 | -0.094 | -415.182 | 1.32 | 1.52 |
postRoute | 1 | 0.408 | 0.312 | 1.145 | 1.248 | -0.149 | -4161.478 | ||
postRouteOpt | 1 | 0.403 | 0.312 | 1.130 | 1.250 | -0.077 | -262.988 |
MemPool Group-GF12-68% Human Expert [results are normalized as described here] |
|||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
preCTS | 1 | 0.414 | 0.312 | 1.027 | 1.065 | -0.081 | -4820.478 | 0.48 | 1.00 |
postCTS | 1 | 0.407 | 0.312 | 1.092 | 1.070 | -0.062 | -357.957 | 0.55 | 1.04 |
postRoute | 1 | 0.407 | 0.312 | 1.091 | 1.113 | -0.142 | -3350.652 | ||
postRouteOpt | 1 | 0.398 | 0.312 | 1.059 | 1.116 | -0.075 | -105.913 |
MemPool Group-GF12-68% AutoDMP [results are normalized as described here] |
|||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
preCTS | 1 | 0.415 | 0.312 | 1.015 | 1.037 | -0.105 | -5260.304 | 1.00 | 1.13 |
postCTS | 1 | 0.407 | 0.312 | 1.078 | 1.044 | -0.104 | -517.435 | 1.00 | 1.22 |
postRoute | 1 | 0.407 | 0.312 | 1.077 | 1.089 | -0.116 | -3304.174 | ||
postRouteOpt | 1 | 0.400 | 0.312 | 1.054 | 1.091 | -0.103 | -267.739 |
MemPool Group-GF12-68% Hier-RTLMP [results are normalized as described here] |
|||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
preCTS | 1 | 0.411 | 0.312 | 1.031 | 1.086 | -0.076 | -4525.696 | 0.62 | 0.92 |
postCTS | 1 | 0.405 | 0.312 | 1.100 | 1.095 | -0.072 | -394.957 | 0.68 | 1.04 |
postRoute | 1 | 0.405 | 0.312 | 1.101 | 1.138 | -0.139 | -3301.739 | ||
postRouteOpt | 1 | 0.397 | 0.312 | 1.074 | 1.140 | -0.068 | -94.348 |
An Observation regarding “Pure Commercial Flow”.
The Evaluation Flow document also sheds light on the relative strength of a “Pure Commercial Flow”, as follows. CT uses the placement information generated by physical synthesis (Genus iSpatial). Observe that if we go straight into Evaluation Flow 1 from physical synthesis (without running CT), this will produce a “pure commercial flow” (i.e., CMP) outcome without any use of Circuit Training. From the data in the Evaluation Flow document, we see that with the “pure commercial flow”, CMP macro placements produce similar timing and power numbers compared to CT macro placements. However, the postRouteOpt wirelength of CT macro placements is at least 18% larger than the postRouteOpt wirelength of CMP macro placements.
Please note that we report this data as part of our study of Circuit Training. It is not intended to “benchmark” any commercial EDA tool in any sense, and the data should not be interpreted as providing any sort of “benchmarking” comparison or value judgment regarding the commercial tool.
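The wirelength gap can be illustrated with a small calculation. The following is a minimal sketch that uses the normalized postRouteOpt wirelength values from the GF12 CMP and CT tables above; treating those tables as representative of the Evaluation Flow data is an assumption, and the helper name `wirelength_overhead_pct` is hypothetical.

```python
# Normalized postRouteOpt wirelength, copied from the GF12 tables above.
# Assumption: these tables are representative of the Evaluation Flow data.
POST_ROUTE_OPT_WL = {
    "BlackParrot-GF12-68%": {"CMP": 1.034, "CT": 1.606},
    "MemPool Group-GF12-68%": {"CMP": 1.051, "CT": 1.247},
}


def wirelength_overhead_pct(ct_wl: float, cmp_wl: float) -> float:
    """Return how much larger the CT wirelength is than the CMP wirelength, in percent."""
    return 100.0 * (ct_wl - cmp_wl) / cmp_wl


overheads = {
    design: wirelength_overhead_pct(values["CT"], values["CMP"])
    for design, values in POST_ROUTE_OPT_WL.items()
}

for design, pct in overheads.items():
    print(f"{design}: CT postRouteOpt wirelength is {pct:.1f}% larger than CMP")
```

For these two GF12 testcases, the overhead is roughly 55% for BlackParrot and roughly 19% for MemPool Group.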
November 27:
We have extended the experiment of Question 3 to assess the difficulty of our testcases. As mentioned here, we take the CT-generated macro placement and then randomly swap the same-size macros. We use the shuffle_macro.tcl script for this experiment. The following items provide details of the macro shuffling experiments for different testcases.
Ariane133-NG45-68%-1.3ns |
|||||||
---|---|---|---|---|---|---|---|
Metrics | CT | Shuffle-111 | Shuffle-222 | Shuffle-333 | Shuffle-444 | Shuffle-555 | Shuffle-666 |
Core_area (um^2) | 1814274 | 1814274 | 1814274 | 1814274 | 1814274 | 1814274 | 1814274 |
Macro_area (um^2) | 1018356 | 1018356 | 1018356 | 1018356 | 1018356 | 1018356 | 1018356 |
preCTS_std_cell_area (um^2) | 243264 | 246309 | 243426 | 246181 | 247134 | 243731 | 246412 |
postRouteOpt_std_cell_area (um^2) | 244002 | 250080 | 246325 | 249506 | 249494 | 246242 | 247918 |
preCTS_total_power (mW) | 789.871 | 802.369 | 796.562 | 803.034 | 801.677 | 794.323 | 802.673 |
postRouteOpt_total_power (mW) | 828.747 | 845.726 | 836.735 | 844.61 | 843.227 | 837.434 | 838.833 |
preCTS_wirelength (um) | 4727728 | 5515599 | 5547501 | 5489654 | 5508653 | 5448399 | 5549232 |
postRouteOpt_wirelength (um) | 4893776 | 5690000 | 5712986 | 5667587 | 5687840 | 5628320 | 5724530 |
preCTS_WS (ns) | -0.091 | -0.112 | -0.109 | -0.141 | -0.144 | -0.095 | -0.151 |
postRouteOpt_WS (ns) | -0.079 | -0.091 | -0.099 | -0.106 | -0.157 | -0.048 | -0.108 |
preCTS_TNS (ns) | -110.373 | -136.145 | -136.781 | -197.545 | -196.557 | -96.462 | -210.187 |
postRouteOpt_TNS (ns) | -25.762 | -66.855 | -86.119 | -81.177 | -159.035 | -16.386 | -75.133 |
preCTS_Congestion (H) | 0.03% | 0.04% | 0.05% | 0.05% | 0.04% | 0.04% | 0.05% |
preCTS_Congestion (V) | 0.12% | 0.12% | 0.15% | 0.12% | 0.12% | 0.10% | 0.10% |
Runtime (second) | 3451 | 3786 | 3427 | 3591 | 3748 | 3851 | 3994 |
BlackParrot (Quad-Core)-NG45-68%-1.3ns (bp_clk) |
|||||||
---|---|---|---|---|---|---|---|
Metrics | CT | Shuffle-111 | Shuffle-222 | Shuffle-333 | Shuffle-444 | Shuffle-555 | Shuffle-666 |
core_area (um^2) | 8449457 | 8449457 | 8449457 | 8449457 | 8449457 | 8449457 | 8449457 |
macro_area (um^2) | 3917822 | 3917822 | 3917822 | 3917822 | 3917822 | 3917822 | 3917822 |
preCTS_std_cell_area (um^2) | 1954954 | 1985365 | 1986378 | 1985226 | 1984435 | 1988719 | 1991871 |
postRouteOpt_std_cell_area (um^2) | 1978731 | 2008143 | 2037502 | 2033273 | 2014517 | 2027724 | 2016049 |
preCTS_total_power (mW) | 4329.795 | 4604.961 | 4619.481 | 4608.242 | 4591.569 | 4632.783 | 4620.598 |
postRouteOpt_total_power (mW) | 4685.509 | 4959.629 | 5004.988 | 4998.899 | 4959.435 | 5005.635 | 4977.157 |
preCTS_wirelength (um) | 39101445 | 51131110 | 51444279 | 52030185 | 52035717 | 53176682 | 51997133 |
postRouteOpt_wirelength (um) | 40467467 | 53098209 | 53425737 | 54070974 | 54030437 | 55365255 | 54171082 |
preCTS_WS (ns) | -0.220 | -0.228 | -0.193 | -0.205 | -0.199 | -0.217 | -0.222 |
postRouteOpt_WS (ns) | -0.260 | -0.179 | -0.305 | -0.342 | -0.211 | -0.289 | -0.251 |
preCTS_TNS (ns) | -1385.900 | -1105.900 | -826.103 | -912.903 | -1116.400 | -944.540 | -1065.400 |
postRouteOpt_TNS (ns) | -3657.000 | -835.927 | -6542.400 | -8738.100 | -1816.000 | -3548.600 | -1322.200 |
preCTS_Congestion (H) | 0.21% | 0.52% | 0.71% | 0.64% | 0.62% | 0.53% | 0.66% |
preCTS_Congestion (V) | 0.29% | 0.54% | 0.44% | 0.50% | 0.45% | 0.68% | 0.57% |
Runtime (second) | 22367 | 26089 | 25940 | 25293 | 24745 | 32431 | 31591 |
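To summarize the two shuffle tables, the sketch below computes the average postRouteOpt wirelength penalty of the six shuffled placements relative to the CT placement. The numbers are copied from the tables above; the function name `shuffle_penalty_pct` and the choice of the mean as the summary statistic are assumptions made for illustration.

```python
from statistics import mean

# postRouteOpt wirelength (um), copied from the two shuffle tables above.
POST_ROUTE_OPT_WL_UM = {
    "Ariane133-NG45": {
        "CT": 4893776,
        "shuffles": [5690000, 5712986, 5667587, 5687840, 5628320, 5724530],
    },
    "BlackParrot-NG45": {
        "CT": 40467467,
        "shuffles": [53098209, 53425737, 54070974, 54030437, 55365255, 54171082],
    },
}


def shuffle_penalty_pct(ct_wl: float, shuffled_wls: list) -> float:
    """Mean wirelength increase of the shuffled placements over CT, in percent."""
    return 100.0 * (mean(shuffled_wls) - ct_wl) / ct_wl


penalties = {
    design: shuffle_penalty_pct(data["CT"], data["shuffles"])
    for design, data in POST_ROUTE_OPT_WL_UM.items()
}

for design, pct in penalties.items():
    print(f"{design}: shuffling same-size macros adds {pct:.1f}% wirelength on average")
```

On this data, every shuffled placement has larger postRouteOpt wirelength than the CT placement, with an average increase of about 16% for Ariane133 and about 34% for BlackParrot.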
December 20:
We thank NVIDIA Research for access to AutoDMP, an autotuned DREAMPlace-based macro placer that will be reported at ISPD-2023. We have generated macro placements of Ariane and BlackParrot using AutoDMP, in both NG45 and GF12 enablements. The results are as follows:
Ariane133-NG45-68%-1.3ns AutoDMP (Link to CT result) (Link to CMP result) | |||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
preCTS | 1814274 | 243431 | 1018356 | 783.810 | 3604121 | -0.105 | -140.503 | 0.00% | 0.01% |
postCTS | 1814274 | 243612 | 1018356 | 821.621 | 3630937 | -0.097 | -47.167 | 0.03% | 0.15% |
postRoute | 1814274 | 243612 | 1018356 | 821.558 | 3759529 | -0.102 | -75.677 | ||
postRouteOpt | 1814274 | 243720 | 1018356 | 821.654 | 3763817 | -0.095 | -37.496 |
Ariane133-GF12-68%: Link to AutoDMP macro placement details of Ariane on GF12 enablement.
BlackParrot-NG45-68%-(bp clock)1.3ns: The following table and screenshots show the macro placement result of BlackParrot (Quad-Core) on NG45, generated using AutoDMP.
BlackParrot Quad-Core-NG45-68%-1.3ns AutoDMP (Link to CT result) (Link to CMP result) | |||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
preCTS | 8449457 | 1903521 | 3917822 | 4069.801 | 22483473 | -0.183 | -584.774 | 0.02% | 0.07% |
postCTS | 8449457 | 1916465 | 3917822 | 4438.356 | 22616243 | -0.145 | -288.267 | 0.05% | 0.09% |
postRoute | 8449457 | 1916465 | 3917822 | 4434.782 | 23349968 | -0.195 | -2164.900 | ||
postRouteOpt | 8449457 | 1920024 | 3917822 | 4438.571 | 23376406 | -0.190 | -1183.100 |
December 21:
Question 11. How does the initial placement generated by different physical synthesis tools affect the CT solution?
We observe that the final CT outcomes are similar whether the initial placement is generated using Flow-2 (CMP-Genus iSpatial) or by DC-Topo (links to scripts).
The following table and screenshots provide details of Ariane133-NG45-68%-1.3ns CT macro placement when DC-Topo is used to generate the initial placement solution.
Ariane133-NG45-68%-1.3ns CT result when the initial placement information is generated by Synopsys DC-Topo physical synthesis. | |||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
preCTS | 1814274 | 284197 | 1018356 | 815.500 | 4544323 | -0.155 | -261.254 | 0.02% | 0.17% |
postCTS | 1814274 | 286795 | 1018356 | 858.088 | 4599954 | -0.146 | -118.845 | 0.02% | 0.20% |
postRoute | 1814274 | 286795 | 1018356 | 857.217 | 4705640 | -0.203 | -302.019 | ||
postRouteOpt | 1814274 | 287151 | 1018356 | 857.755 | 4710065 | -0.206 | -255.818 |
Link to result of Ariane133-NG45-68%-1.3ns CT macro placement when Flow-2 (CMP-Genus iSpatial physical synthesis) is used to generate the initial placement information.
Question 12. How well does Simulated Annealing (SA) optimize the proxy cost?
Details of our SA implementation, which we denote as SA-UCSD, are here. We have used SA-UCSD to generate macro placements for Ariane and BlackParrot (Quad-Core). We find that SA-UCSD produces better proxy costs than CT.
Ariane133-NG45-68%-1.3ns SA-UCSD result (Link to CT result) (Link to CMP result) | |||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
preCTS | 1814274 | 243604 | 1018356 | 786.182 | 3825529 | -0.130 | -187.073 | 0.01% | 0.03% |
postCTS | 1814274 | 245443 | 1018356 | 827.698 | 3868208 | -0.099 | -52.565 | 0.02% | 0.06% |
postRoute | 1814274 | 245443 | 1018356 | 827.546 | 3982401 | -0.125 | -114.924 | ||
postRouteOpt | 1814274 | 245804 | 1018356 | 828.053 | 3986262 | -0.112 | -75.338 |
BlackParrot Quad-Core-NG45-68%-(bp clock)1.3ns SA-UCSD (Link to CT result) (Link to CMP result) | |||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
preCTS | 8449457 | 1921810 | 3917822 | 4185.031 | 30470310 | -0.209 | -863.535 | 0.08% | 0.32% |
postCTS | 8449457 | 1934844 | 3917822 | 4560.519 | 30568687 | -0.107 | -267.191 | 0.09% | 0.36% |
postRoute | 8449457 | 1934844 | 3917822 | 4539.416 | 31510301 | -0.239 | -6022.700 | ||
postRouteOpt | 8449457 | 1943841 | 3917822 | 4547.886 | 31550599 | -0.222 | -3263.800 |
Question 13. How good are human macro placements relative to Circuit Training?
We observe that human macro placements can achieve smaller wirelength than CT, with similar timing and power numbers. Details of human macro placements for BlackParrot (Quad-Core) and MemPool Group on NG45 enablement are as follows:
BlackParrot Quad-Core-NG45-68%-1.3ns Human macro placement (not a gridded placement) (Link to CT result) (Link to CMP result) | |||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
preCTS | 8449457 | 1907164 | 3917822 | 4107.931 | 24814112 | -0.195 | -530.552 | 0.08% | 0.12% |
postCTS | 8449457 | 1918983 | 3917822 | 4475.523 | 24944903 | -0.097 | -209.587 | 0.09% | 0.13% |
postRoute | 8449457 | 1918983 | 3917822 | 4468.904 | 25888999 | -0.120 | -454.561 | ||
postRouteOpt | 8449457 | 1919928 | 3917822 | 4469.552 | 25915520 | -0.097 | -321.918 |
MemPool Group-NG45-68%-4ns human macro placement (not a gridded placement) (Link to CT result) (Link to CMP result) | |||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
preCTS | 11371934 | 4930345 | 3078071 | 2459.392 | 101645170 | -0.021 | -141.801 | 0.39% | 0.86% |
postCTS | 11371934 | 4883741 | 3078071 | 2640.242 | 102110339 | -0.003 | -0.055 | 0.58% | 0.96% |
postRoute | 11371934 | 4883741 | 3078071 | 2642.017 | 107463344 | -0.246 | -2941.400 | ||
postRouteOpt | 11371934 | 4873872 | 3078071 | 2639.916 | 107597894 | -0.049 | -11.897 |
We have also added
March 5:
Question 14. What is the impact on CT results when DREAMPlace is used instead of force-directed placement?
We have integrated DREAMPlace in Circuit Training (commit hash: 91e14fd1caa5b15d9bb1b58b6d5e47042ab244f3) and trained CT to generate macro placement solutions for Ariane, BlackParrot and MemPool Group designs. We refer to CT with DREAMPlace as CT+DREAMPlace and CT with FD as CT+FD. The training results are as follows:
Ariane133-NG45-68%-1.3ns CT+DREAMPlace result (Link to tensorboard) (Link to CT+FD result) |
|||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
preCTS | 1814274 | 244313 | 1018356 | 791.482 | 4669338 | -0.135 | -176.306 | 0.05% | 0.12% |
postCTS | 1814274 | 244976 | 1018356 | 830.645 | 4693972 | -0.106 | -75.708 | 0.05% | 0.15% |
postRoute | 1814274 | 244976 | 1018356 | 828.923 | 4822561 | -0.124 | -109.91 | ||
postRouteOpt | 1814274 | 245438 | 1018356 | 829.353 | 4827641 | -0.126 | -93.752 |
BP(Quad-Core)-NG45-68%-1.3ns CT+DREAMPlace (Link to tensorboard) (Link to CT+FD result) |
|||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
preCTS | 8449457 | 1959789 | 3917822 | 4396.086 | 42267061 | -0.209 | -1132.2 | 0.28% | 0.57% |
postCTS | 8449457 | 1978100 | 3917822 | 4783.785 | 42346079 | -0.163 | -680.8 | 0.29% | 0.63% |
postRoute | 8449457 | 1978100 | 3917822 | 4751.075 | 43883402 | -0.201 | -1406.3 | ||
postRouteOpt | 8449457 | 1979794 | 3917822 | 4753.696 | 43931174 | -0.178 | -850.8 |
MemPool Group-NG45-68%-4ns CT+DREAMPlace (Link to tensorboard) (Link to CT+FD Result) |
|||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
preCTS | 11371934 | 4990302 | 3078071 | 2659.403 | 121635791 | -0.015 | -71.824 | 3.33% | 3.26% |
postCTS | 11371934 | 4969651 | 3078071 | 2839.139 | 122062712 | -0.004 | -0.104 | 3.49% | 3.19% |
postRoute | 11371934 | 4969651 | 3078071 | 2893.588 | 132078512 | -1.137 | -29243.4 | ||
postRouteOpt | 11371934 | 4995348 | 3078071 | 2908.959 | 132299696 | -0.072 | -97.892 |
Question 15. Should we factor in density cost while using DREAMPlace for CT?
We update the density weight from 0.5 to 0.0, then rerun CT+DREAMPlace for Ariane, BlackParrot and MemPool Group designs. The training results are as follows:
Ariane133-NG45-68%-1.3ns CT+DREAMPlace result (Density Weight = 0.0) (Link to tensorboard) (Link to CT+FD result) |
|||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
preCTS | 1814274 | 245097 | 1018356 | 793.171 | 4959656 | -0.137 | -202.147 | 0.04% | 0.17% |
postCTS | 1814274 | 248172 | 1018356 | 839.062 | 4993255 | -0.117 | -108.074 | 0.04% | 0.15% |
postRoute | 1814274 | 248172 | 1018356 | 836.985 | 5114089 | -0.164 | -243.834 | ||
postRouteOpt | 1814274 | 248775 | 1018356 | 837.655 | 5119513 | -0.16 | -152.043 |
BP(Quad-Core)-NG45-68%-1.3ns CT+DREAMPlace (Density weight = 0.0) (Link to tensorboard) (Link to CT+FD result) |
|||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
preCTS | 8449457 | 1947589 | 3917822 | 4323.518 | 38208933 | -0.233 | -1177.6 | 0.33% | 0.46% |
postCTS | 8449457 | 1961564 | 3917822 | 4703.800 | 38314312 | -0.153 | -468.3 | 0.37% | 0.49% |
postRoute | 8449457 | 1961564 | 3917822 | 4674.250 | 39753854 | -0.200 | -1995.5 | ||
postRouteOpt | 8449457 | 1964239 | 3917822 | 4677.048 | 39800843 | -0.180 | -809.0 |
MemPool Group-NG45-68%-4ns CT+DREAMPlace (Density weight = 0.0) (Link to tensorboard) (Link to CT+FD Result) |
|||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
preCTS | 11371934 | 4934839 | 3078071 | 2613.613 | 119923841 | -0.027 | -146.5 | 2.56% | 2.51% |
postCTS | 11371934 | 4928559 | 3078071 | 2802.851 | 120508367 | -0.003 | -0.1 | 2.87% | 2.66% |
postRoute | 11371934 | 4928559 | 3078071 | 2848.873 | 130024068 | -0.803 | -19920.7 | ||
postRouteOpt | 11371934 | 4953483 | 3078071 | 2858.071 | 130243153 | -0.050 | -33.5 |
We observe from the above results that CT+DREAMPlace achieves similar results for density weights of 0.0 and 0.5.
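To make the role of the density weight concrete, here is a minimal sketch of a weighted proxy cost. The form wirelength + 0.5 * congestion + 0.5 * density is an assumption; it reproduces the proxy costs reported with the GF12 CT and SA tables above to within rounding. Whether the CT+DREAMPlace runs use exactly this form is also an assumption, and the function name `proxy_cost` is hypothetical.

```python
def proxy_cost(wirelength: float, congestion: float, density: float,
               congestion_weight: float = 0.5, density_weight: float = 0.5) -> float:
    """Weighted sum of the three cost components (assumed form, see lead-in)."""
    return wirelength + congestion_weight * congestion + density_weight * density


# (label, wirelength cost, congestion cost, density cost, reported proxy cost),
# copied from the GF12 table captions above.
REPORTED = [
    ("BlackParrot-GF12 CT", 0.0756, 0.7329, 0.6526, 0.7684),
    ("BlackParrot-GF12 SA", 0.0576, 0.6619, 0.5971, 0.6871),
    ("MemPool Group-GF12 CT", 0.069, 0.810, 1.039, 0.994),
    ("MemPool Group-GF12 SA", 0.064, 0.940, 1.325, 1.196),
]

for label, wl, cong, dens, reported in REPORTED:
    computed = proxy_cost(wl, cong, dens)
    print(f"{label}: computed {computed:.4f}, reported {reported:.4f}")

# With the density weight set to 0.0, the density term drops out entirely.
no_density = proxy_cost(0.0756, 0.7329, 0.6526, density_weight=0.0)
print(f"BlackParrot-GF12 CT components with density weight 0.0: {no_density:.4f}")
```

Setting `density_weight=0.0` leaves only the wirelength and congestion terms, so density is no longer part of the quantity being optimized.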
Question 16. Why does your study (and the ISPD-2023 paper) use Cadence CMP 21.1, which was not available to Google engineers when they wrote the Nature paper?
We used Innovus version 21.1 since it was the latest version of our place-and-route evaluator of macro placement solutions. CMP 21.1 is part of Innovus 21.1. Using the latest version of CMP was also natural, given our starting assumption that RL from Nature would outperform the commercial state-of-the-art.
We have now run further experiments using older versions of CMP and Innovus. We find that the macro placements produced by CMP across versions 19.1, 20.1 and 21.1 lead to the same qualitative conclusions. Additional details:
Below are screenshots of Ariane-NG45-68%-1.3ns for (in order, top-down) CMP + P&R outcomes in Innovus 19.1, 20.1 and 21.1 versions.
Question 17. What are the outcomes of CT when the training is continued until convergence?
To put this question in perspective, training “until convergence” is not described in any of the guidelines provided by the CT GitHub repo for reproducing the results in the Nature paper. For the ISPD 2023 paper, we adhere to the guidelines given in the CT GitHub repo, use the same number of iterations for Ariane as Google engineers demonstrate in the CT GitHub repo, and obtain results that closely align with Google’s outcomes for Ariane. (See FAQs #4 and #13.)
We run CT training for an extended number (=600) of iterations, for each of Ariane, BlackParrot and MemPool Group on NG45, and make the following observations.
Our new data from using triple the CT training budget indicate that training until convergence, compared to the configurations explored in the ISPD-2023 paper, improves proxy cost but does not significantly improve chip metrics on Ariane and MemPool Group. Among chip metrics for BlackParrot, routed wirelength improves significantly while other metrics are similar to what we previously reported. Overall, training until convergence does not qualitatively change comparisons to results of Simulated Annealing and human macro placements reported in the ISPD 2023 paper.
The subsequent tables and figures present the Nature Table 1 metrics of Ariane and BlackParrot on NG45, for macro placement solutions generated by CT training until convergence. (For MemPool Group, using triple the default number of CT iterations did not change the final proxy cost.)
Ariane133-NG45-68%-1.3ns CT result (Link to tensorboard) |
|||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
preCTS | 1814274 | 242539 | 1018356 | 787.798 | 4577259 | -0.095 | -121.911 | 0.04% | 0.11% |
postCTS | 1814274 | 244220 | 1018356 | 830.273 | 4610696 | -0.07 | -41.635 | 0.05% | 0.13% |
postRoute | 1814274 | 244220 | 1018356 | 828.935 | 4734768 | -0.095 | -90.160 | ||
postRouteOpt | 1814274 | 244666 | 1018356 | 829.419 | 4739136 | -0.085 | -62.685 |
BlackParrot (Quad-Core)-NG45-68%-1.3ns CT result (Link to tensorboard) |
|||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
preCTS | 8449457 | 1922798 | 3917822 | 4185.939 | 29820259 | -0.179 | -648.911 | 0.10% | 0.26% |
postCTS | 8449457 | 1935706 | 3917822 | 4563.875 | 29956480 | -0.138 | -355.347 | 0.12% | 0.28% |
postRoute | 8449457 | 1935706 | 3917822 | 4542.299 | 30893195 | -0.188 | -2280.100 | ||
postRouteOpt | 8449457 | 1940957 | 3917822 | 4547.832 | 30928844 | -0.199 | -1263.400 |
Question 18. To study the benefit that CT derives from use of a commercial placement solution, why do you compare with giving CT “impossible” initial placements, where all instances are placed at the same location?
Procedure gen_perturbed_placement
Input: seed, x
# x is the fraction of instances to be moved, 0 < x < 1.0
1. For w, h in {unique list of instance (width, height)}:
   a. instance_list = {list of instances with width = w and height = h}
   b. instance_list = shuffle(instance_list, seed)
   c. instance_count = length(instance_list)
   d. shuffled_instance_list = instance_list[:floor(instance_count * x)]
   e. shuffle_placement(shuffled_instance_list, seed)
Procedure shuffle_placement
Input: instance_list, seed
1. X, Y, Orient = {lists of lower-left coordinates and orientations of the instances in instance_list}
2. shuffled_instance_list = shuffle(instance_list, seed)
3. For i in range(length(instance_list)):
   a. Update the location and orientation of shuffled_instance_list[i] with (X[i], Y[i]) and Orient[i]
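The two procedures above can be sketched in Python as follows. This is an illustration under stated assumptions, not the script used in the study: the `Instance` dataclass and its field names are hypothetical, and a single seeded `random.Random` generator is threaded through both procedures instead of passing the raw seed to every shuffle call.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Instance:
    # Hypothetical container; field names are assumptions for this sketch.
    name: str
    width: float
    height: float
    x: float  # lower-left x coordinate
    y: float  # lower-left y coordinate
    orient: str


def shuffle_placement(instances, rng):
    """Permute the (x, y, orient) slots among the given instances, in place."""
    slots = [(inst.x, inst.y, inst.orient) for inst in instances]
    shuffled = list(instances)
    rng.shuffle(shuffled)
    for inst, (x, y, orient) in zip(shuffled, slots):
        inst.x, inst.y, inst.orient = x, y, orient


def gen_perturbed_placement(instances, seed, x):
    """Move a fraction x of the instances, swapping only among same-size instances."""
    if not 0.0 < x < 1.0:
        raise ValueError("x must satisfy 0 < x < 1.0")
    rng = random.Random(seed)
    groups = defaultdict(list)
    for inst in instances:
        groups[(inst.width, inst.height)].append(inst)
    for group in groups.values():
        candidates = list(group)
        rng.shuffle(candidates)
        count = int(len(candidates) * x)  # floor(instance_count * x)
        shuffle_placement(candidates[:count], rng)
```

Because locations are exchanged only among instances of identical width and height, the perturbed placement occupies exactly the same set of slots as the original and remains overlap-free whenever the original was.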
April 27, 2023:
We have run Hier-RTLMP macro placer, as described in the arXiv paper, on our modern benchmarks. The code for Hier-RTLMP is open-sourced here. We use the default settings to generate the macro placement solutions. The results are as follows:
Ariane133-NG45-68%-1.3ns Hier-RTLMP (Link to CT result) (Link to CMP result) |
|||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
preCTS | 1814274 | 246916 | 1018356 | 796.781 | 5087055 | -0.149 | -192.7 | 0.11% | 0.08% |
postCTS | 1814274 | 247403 | 1018356 | 836.595 | 5136058 | -0.110 | -104.2 | 0.15% | 0.10% |
postRoute | 1814274 | 247403 | 1018356 | 835.096 | 5291106 | -0.178 | -356.0 | ||
postRouteOpt | 1814274 | 248296 | 1018356 | 836.002 | 5296879 | -0.165 | -223.4 |
Ariane133-GF12-68%: Link to the HierRTLMP macro placement details of Ariane on GF12 enablement.
BlackParrot (Quad-Core)-NG45-68%-1.3ns: The following table and screenshots show the macro placement result of BlackParrot (Quad-Core) on NG45, generated using Hier-RTLMP.
BlackParrot-NG45-68%-1.3ns Hier-RTLMP (Link to CT result) (Link to CMP result) |
|||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
preCTS | 8449457 | 1908372 | 3917822 | 4148.534 | 27687847 | -0.169 | -455.5 | 0.13% | 0.17% |
postCTS | 8449457 | 1923367 | 3917822 | 4522.966 | 27810361 | -0.123 | -181.5 | 0.15% | 0.20% |
postRoute | 8449457 | 1923367 | 3917822 | 4509.596 | 28835670 | -0.166 | -906.8 | ||
postRouteOpt | 8449457 | 1925012 | 3917822 | 4511.780 | 28865504 | -0.150 | -456.6 |
BlackParrot (Quad-Core)-GF12-68%: Link to the HierRTLMP macro placement details of BlackParrot (Quad-Core) on GF12 enablement.
MemPool Group-NG45-68%-4ns: The following table and screenshots show the macro placement result of MemPool Group on NG45, generated using Hier-RTLMP.
MemPool Group-NG45-68%-4ns Hier-RTLMP (62 DRCs) (Link to CT result) (Link to CMP result) |
|||||||||
---|---|---|---|---|---|---|---|---|---|
Physical Design Stage | Core Area (um^2) | Standard Cell Area (um^2) | Macro Area (um^2) | Total Power (mW) | Wirelength (um) | WS (ns) | TNS (ns) | Congestion (H) | Congestion (V) |
preCTS | 11371934 | 4939447 | 3078071 | 2489.1 | 105739299 | -0.016 | -50.5 | 2.05% | 1.03% |
postCTS | 11371934 | 4895581 | 3078071 | 2671.4 | 106267958 | -0.002 | -0.1 | 2.31% | 1.18% |
postRoute | 11371934 | 4895581 | 3078071 | 2696.2 | 113924593 | -0.503 | -4743.7 | ||
postRouteOpt | 11371934 | 4889459 | 3078071 | 2695.3 | 114073113 | -0.062 | -4.9 |
Protobuf to LEF/DEF and macro placement of CT-Ariane
We have released a new Protobuf-to-LEF/DEF translator in our repository;
detailed information is available in
CodeElements/FormatTranslators.
Using this translator, we have generated LEF/DEF files from the
Protobuf netlist
of the Ariane design (the only publicly available design disclosed by the
authors of the Nature paper) available in the Circuit Training repository.
We believe that, consistent with the sub-10nm characterization of testcases
mentioned in the Nature paper, CT-Ariane corresponds to an implementation in
TSMC 7nm technology. This belief is based on two aspects of the Protobuf netlist
posted by Google Brain. (1) First, in the Protobuf header, we see
“ariane_tsmc7_dc_09162019”, which suggests that the design is in the TSMC 7nm
node. (2) Second, we find here that in TSMC 7nm technology, the standard-cell
height is either 240nm or 300nm. All single-height standard cells in the
CT-Ariane Protobuf posted by Google Brain have a height of 240nm (i.e., “HD”).
The cell naming seen in Google’s posted Ariane testcase (e.g.,
“NR2D1BWP240H8P57PDSVT”) matches conventions commonly seen with TSMC-based
design enablement.
With these generated LEF/DEF files, we have created macro placement solutions using Circuit Training (CT), RePlAce, and Innovus Concurrent Macro Placer (CMP). To evaluate these macro placement solutions, we use Innovus 21.1. The evaluation flow is as follows: (1) we first legalize macro placement solutions using the refine_macro_place command; (2) we then place standard cells using the place_design command; and (3) finally, we report post-placement HPWL.
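For readers unfamiliar with the metric in step (3), the sketch below computes half-perimeter wirelength (HPWL) using its standard definition: for each net, the width plus the height of the bounding box of its pins, summed over all nets. It does not reproduce how Innovus reports HPWL; the function names and the dictionary-based net representation are assumptions for illustration.

```python
def net_hpwl(pins):
    """Half-perimeter of the bounding box enclosing a net's pin locations."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_hpwl(nets):
    """Sum of per-net HPWL; nets with fewer than two pins contribute nothing."""
    return sum(net_hpwl(pins) for pins in nets.values() if len(pins) >= 2)


# Toy example: net name -> list of (x, y) pin locations in microns.
example_nets = {
    "n1": [(0.0, 0.0), (3.0, 4.0)],
    "n2": [(1.0, 1.0), (2.0, 5.0), (4.0, 2.0)],
    "n3": [(7.0, 7.0)],  # single-pin net, ignored
}

print(total_hpwl(example_nets))
```

In the toy example, each of the two multi-pin nets has a bounding box of width 3 and height 4, so the total is 14.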
The figure below shows visualizations of the macro placement solutions generated by Circuit Training (commit hash: 1e14fd1ca), RePlAce (OpenROAD, commit hash: ad808fd, command: global_placement -density 0.8) and Innovus CMP (version: 21.1, command: place_design -concurrent_macros) for the CT-Ariane (original, “X1”) Protobuf. The corresponding LEF/DEF files are here. Please note that we report this data as part of our study of Circuit Training. It is not intended to “benchmark” any commercial EDA tool in any sense, and the data should not be interpreted as providing any sort of “benchmarking” comparison or value judgment regarding the commercial tool.
We have scaled the Protobuf netlist of the Ariane design in the Circuit Training repository into CT-Ariane-X2 and CT-Ariane-X4, following the “quantified suboptimality” studies in the DAC-1995 paper, “Quantified suboptimality of VLSI layout heuristics”. For a given testcase, self-scaling of additional copies can be performed in two basic ways: shift and flip.
By combining these actions, it is possible to obtain variants of the X2 design using X-Shift (the second copy is placed to the right of the original copy), Y-Shift (the second copy is placed above the original copy), X-Flip (the second copy mirrors the original copy about the X axis), and Y-Flip (the second copy mirrors the original copy about the Y axis). Variants for the X4 design can be obtained by serial application of these actions, e.g., X-Shift-Y-Shift, X-Flip-Y-Flip, X-Shift-Y-Flip, X-Flip-Y-Shift, etc. However, considering that all I/O pins must be placed at the boundaries, two variants are of more interest for CT-Ariane-X4: X-Shift-Y-Flip and X-Flip-Y-Flip.
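The four actions can be sketched as coordinate transforms on axis-aligned rectangles. This is a minimal sketch under stated assumptions, not the translator released in this repository: each object is described by its lower-left corner and size, the original copy occupies a canvas of size canvas_w by canvas_h with origin at (0, 0), and a flip is read as a mirror about the canvas edge parallel to the named axis so that the second copy abuts the original. The Rect class and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rect:
    name: str
    x: float  # lower-left corner
    y: float
    w: float  # width
    h: float  # height


def x_shift(rects: List[Rect], canvas_w: float) -> List[Rect]:
    """Second copy placed to the right of the original."""
    return [Rect(r.name + "_copy", r.x + canvas_w, r.y, r.w, r.h) for r in rects]


def y_shift(rects: List[Rect], canvas_h: float) -> List[Rect]:
    """Second copy placed above the original."""
    return [Rect(r.name + "_copy", r.x, r.y + canvas_h, r.w, r.h) for r in rects]


def x_flip(rects: List[Rect], canvas_h: float) -> List[Rect]:
    """Assumed reading: mirror about the horizontal line y = canvas_h."""
    return [Rect(r.name + "_copy", r.x, 2 * canvas_h - r.y - r.h, r.w, r.h)
            for r in rects]


def y_flip(rects: List[Rect], canvas_w: float) -> List[Rect]:
    """Assumed reading: mirror about the vertical line x = canvas_w."""
    return [Rect(r.name + "_copy", 2 * canvas_w - r.x - r.w, r.y, r.w, r.h)
            for r in rects]


# Hypothetical single macro on a 100 x 200 canvas.
original = [Rect("macro_0", 10.0, 20.0, 30.0, 40.0)]

if __name__ == "__main__":
    for op, arg in ((x_shift, 100.0), (y_shift, 200.0),
                    (x_flip, 200.0), (y_flip, 100.0)):
        print(op.__name__, op(original, arg))
```

An X2 design under these assumptions is the original list plus the list returned by one action; an X4 design would apply a second action to the X2 list using the enlarged canvas dimension.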
Our naming convention is as follows: CT-Ariane-X4-X-Shift-Y-Flip indicates a design that is an X4 version of the original CT-Ariane design. It is generated by first shifting the X1 copy along the X-axis to obtain an X2 copy, then flipping the X2 copy along the Y-axis to create the X4 copy. For CT-Ariane-X2, we generate two versions: CT-Ariane-X2-Y-Flip and CT-Ariane-X2-X-Shift. For CT-Ariane-X4, we generate two versions: CT-Ariane-X4-X-Shift-Y-Flip and CT-Ariane-X4-X-Flip-Y-Flip.
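The naming convention can be decoded mechanically. The sketch below is a hypothetical helper, not part of the released scripts: it splits a variant name into its scale factor and the ordered list of actions, and checks the assumption that each action doubles the design (one action for X2, two for X4).

```python
import re
from typing import List, Tuple

# Matches names such as "CT-Ariane-X4-X-Shift-Y-Flip" (plain ASCII hyphens only).
_NAME_RE = re.compile(r"^CT-Ariane-X(\d+)((?:-[XY]-(?:Shift|Flip))+)$")


def parse_variant(name: str) -> Tuple[int, List[Tuple[str, str]]]:
    """Return (scale, [(axis, action), ...]) for a scaled CT-Ariane variant name."""
    m = _NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized variant name: {name!r}")
    scale = int(m.group(1))
    actions = re.findall(r"-([XY])-(Shift|Flip)", m.group(2))
    # Assumption: every shift or flip doubles the number of copies.
    if scale != 2 ** len(actions):
        raise ValueError(f"scale X{scale} does not match {len(actions)} action(s)")
    return scale, actions


if __name__ == "__main__":
    for n in ("CT-Ariane-X2-Y-Flip", "CT-Ariane-X2-X-Shift",
              "CT-Ariane-X4-X-Shift-Y-Flip", "CT-Ariane-X4-X-Flip-Y-Flip"):
        print(n, parse_variant(n))
```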
The following figures show visualizations of the macro placement solutions for each version, generated using RePlAce (OpenROAD, commit hash: ad808fd) and Innovus CMP (version 21.1). HPWL and runtime values are also shown. The detailed command and evaluation flow are the same as those used for the original CT-Ariane (X1) study.
X2 Versions: (CT-Ariane-X2-Y-Flip)
X2 Versions: (CT-Ariane-X2-X-Shift)
X4 Versions: (CT-Ariane-X4-X-Shift-Y-Flip)
X4 Versions: (CT-Ariane-X4-X-Flip-Y-Flip)