System Name System 9
System Availability Available
System Category Datacenter · On-premise
System Size 8x Accelerator 9
Model Name DeepSeek-R1
Division Closed
Model Precision FP4
Model Link —
Transformation Link —
Model Notes —
Dataset Name MLPerf-Deepseek
Dataset Type Performance
Average Input Tokens 803.791249
Average Output Tokens 3886.227438
Dataset Link —
Measured Accuracy Score —
Charts: Throughput vs Interactivity; Throughput vs Concurrency; Time to First Token vs Concurrency; Interactivity vs Concurrency
Processor
Processor Model Name Processor 9
Processors per Node 2
Cores Per Processor 32
VCPUs Per Processor —
Accelerator
Accelerator Model Name Accelerator 9
Accelerators per Node 8
Memory Type
Memory Capacity 256 GB
Accelerator Interconnect
Host-Accelerator Interconnect
Host / Storage
Host Memory Capacity 1 TB
Memory Configuration
Storage Capacity
Storage Type
Cooling Mix of Liquid-cooled and Air-cooled
Hardware Notes
Framework vLLM
Operating System Linux
Other Software Inference Backend v1.0
Software Notes —
| Field Name | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Run Date | 03/03/2026 | 03/03/2026 | 03/04/2026 | 03/03/2026 | 03/04/2026 | 03/04/2026 | 03/04/2026 | 03/03/2026 | 03/03/2026 | 03/03/2026 |
| Concurrency | 0.33 | 0.67 | 1.33 | 2.67 | 5.34 | 10.67 | 21.34 | 42.68 | 85.37 | 170.74 |
| System Tokens/Second | 69.6 | 136.9 | 265.7 | 451.3 | 720.6 | 1,106.6 | 1,428.2 | 1,799.4 | 2,616.1 | 3,908.3 |
| Tokens/Second per User | 208.8 | 205.2 | 199.2 | 169.2 | 135.1 | 103.7 | 66.9 | 42.2 | 30.6 | 22.9 |
| TTFT P99 (ms) | 31690.0 | 34557.9 | 46810.5 | 83235.3 | 114692.6 | 183958.3 | 373950.1 | 683704.3 | 1617735.8 | 3976554.3 |
| Utilization | 1.8% | 3.5% | 6.8% | 11.5% | 18.4% | 28.3% | 36.5% | 46.0% | 66.9% | 100.0% |
| Configuration Summary | ||||||||||
| QPS | 0.0180 | 0.0351 | 0.0685 | 0.1169 | 0.1847 | 0.2830 | 0.3689 | 0.4643 | 0.6978 | 1.0439 |
| Total Output Tokens | 33,926,283 | 34,185,331 | 34,063,493 | 33,878,035 | 34,233,018 | 34,315,466 | 33,975,939 | 34,013,009 | 1,564,302,823 | 1,562,330,261 |
| Run Duration (s) | 487,131.77 | 249,740.11 | 128,199.99 | 75,069.41 | 47,503.24 | 31,008.42 | 23,789.71 | 18,902.80 | 597,957.60 | 399,743.86 |
| Total Requests | 8,776 | 8,776 | 8,776 | 8,776 | 8,776 | 8,776 | 8,776 | 8,776 | 417,284 | 417,284 |
| Time To First Token (TTFT) (ms) | ||||||||||
| Minimum | 16908.4 | 16446.6 | 16051.2 | 16500.1 | 16861.9 | 17515.2 | 18957.4 | 18828.1 | 17868.7 | 18221.3 |
| Average | 23042.4 | 22761.7 | 23520.6 | 27358.3 | 34300.5 | 47154.7 | 105117.0 | 481121.4 | 72898.8 | 180980.9 |
| P50 | 23062.8 | 22164.1 | 23536.3 | 23758.6 | 29659.3 | 33960.5 | 46612.2 | 494980.0 | 38157.4 | 41020.1 |
| P90 | 29233.3 | 28406.9 | 29546.9 | 41516.0 | 49482.4 | 103672.5 | 320195.7 | 638910.0 | 57004.6 | 60797.8 |
| P95 | 30297.5 | 29498.9 | 32057.2 | 48508.3 | 54637.0 | 169103.2 | 364212.3 | 649183.6 | 61579.9 | 818481.9 |
| P99 | 31690.0 | 34557.9 | 46810.5 | 83235.3 | 114692.6 | 183958.3 | 373950.1 | 683704.3 | 1617735.8 | 3976554.3 |
| P999 | 55608.5 | 64147.4 | 64552.4 | 90778.4 | 118624.2 | 185595.1 | 376104.6 | 687289.9 | 2657904.2 | 5127347.1 |
| Maximum | 59621.4 | 78513.6 | 64629.0 | 91840.5 | 132851.1 | 186095.5 | 376276.7 | 687934.4 | 2724616.7 | 5273260.1 |
| Time Per Output Token (TPOT) (ms) | ||||||||||
| Minimum | 328.8 | 330.7 | 324.6 | 371.1 | 415.8 | 441.9 | 536.8 | 0.3 | 1553.2 | 1549.9 |
| Average | 464.2 | 464.3 | 467.5 | 519.7 | 598.4 | 677.5 | 795.9 | 969.3 | 2953.9 | 3756.7 |
| P50 | 457.9 | 458.6 | 461.0 | 512.1 | 590.5 | 680.5 | 787.2 | 945.4 | 2807.4 | 3473.8 |
| P90 | 518.0 | 517.0 | 521.1 | 581.8 | 680.5 | 813.8 | 946.1 | 1140.2 | 3281.0 | 4196.5 |
| P95 | 527.6 | 526.6 | 531.4 | 593.9 | 694.5 | 839.2 | 1000.0 | 1201.2 | 3309.1 | 4212.8 |
| P99 | 545.4 | 547.2 | 549.1 | 611.3 | 718.3 | 872.1 | 1058.6 | 1330.0 | 3360.8 | 4386.0 |
| P999 | 589.6 | 582.5 | 571.4 | 631.9 | 742.8 | 898.7 | 1104.4 | 1709.0 | 3462.2 | 7230.4 |
| Maximum | 1021.6 | 680.0 | 593.1 | 651.0 | 771.7 | 937.3 | 1128.1 | 2743.4 | 4101.3 | 6011164.9 |
| Request Latency (ms) | ||||||||||
| Minimum | 108103.0 | 72085.7 | 104029.8 | 117250.6 | 115243.5 | 107222.4 | 180759.4 | 452474.4 | 96303.3 | 96270.3 |
| Average | 1758207.3 | 1775732.7 | 1775959.5 | 1967478.1 | 2298948.9 | 2647940.0 | 3139613.8 | 3973346.9 | 11142016.7 | 13837973.8 |
| P50 | 955930.8 | 964988.9 | 984024.8 | 1091820.6 | 1234917.4 | 1417601.8 | 1724662.1 | 2500465.1 | 6244277.9 | 7724731.0 |
| P90 | 4246460.2 | 4355810.6 | 4283481.9 | 4722249.4 | 5586557.9 | 6607260.0 | 7704190.1 | 9035883.1 | 27686948.6 | 34374763.7 |
| P95 | 6026635.1 | 6163371.8 | 6116105.5 | 6822359.6 | 8085507.6 | 9539689.8 | 10972685.7 | 12103531.7 | 39601395.5 | 49288401.1 |
| P99 | 8668077.6 | 8796495.6 | 8798065.5 | 9798341.4 | 11411893.1 | 13159384.5 | 15090929.5 | 16254763.6 | 56751873.4 | 70982046.5 |
| P999 | 9213522.2 | 9445701.2 | 9296461.4 | 10300842.2 | 12092719.6 | 14531844.7 | 16174286.1 | 17314006.2 | 65677558.8 | 83819234.7 |
| Maximum | 9747750.3 | 10011402.7 | 9399291.9 | 10658160.6 | 12490970.6 | 15242298.5 | 16515959.6 | 17759264.6 | 66959603.9 | 85670529.2 |
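
The headline rows of the table appear to follow from the Configuration Summary rows. The sketch below is a cross-check under assumptions, not the benchmark's documented method: it assumes System Tokens/Second is Total Output Tokens divided by Run Duration, QPS is Total Requests divided by Run Duration, and Utilization is each run's throughput relative to the highest-throughput run (Run 10). These formulas are inferred from the reported figures, and the names `runs` and `derive` are hypothetical.

```python
# Values copied from the table above for Runs 1, 9, and 10.
runs = {
    "Run 1": {
        "total_output_tokens": 33_926_283,
        "run_duration_s": 487_131.77,
        "total_requests": 8_776,
    },
    "Run 9": {
        "total_output_tokens": 1_564_302_823,
        "run_duration_s": 597_957.60,
        "total_requests": 417_284,
    },
    "Run 10": {
        "total_output_tokens": 1_562_330_261,
        "run_duration_s": 399_743.86,
        "total_requests": 417_284,
    },
}


def derive(run):
    """Return (tokens per second, queries per second) for one run.

    Assumed formulas (inferred from the table, not stated by the source):
      tokens/s = total output tokens / run duration
      QPS      = total requests / run duration
    """
    tokens_per_s = run["total_output_tokens"] / run["run_duration_s"]
    qps = run["total_requests"] / run["run_duration_s"]
    return tokens_per_s, qps


derived = {name: derive(run) for name, run in runs.items()}

# Assumption: utilization is throughput relative to the best run in the sweep.
peak_tokens_per_s = max(tps for tps, _ in derived.values())
utilization = {name: tps / peak_tokens_per_s for name, (tps, _) in derived.items()}

for name, (tps, qps) in derived.items():
    print(f"{name}: {tps:.1f} tokens/s, {qps:.4f} QPS, {utilization[name]:.1%} utilization")
```

Under these assumptions the computed values can be compared against the System Tokens/Second, QPS, and Utilization rows for the same runs; small differences would be expected from rounding in the published figures.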