System Name System 10
System Availability Available
System Category Datacenter · On-premise
System Size 8x Accelerator 10
Model Name Deepseek-R1
Division Closed
Model Precision FP4
Model Link —
Transformation Link —
Model Notes —
Dataset Name MLPerf-Deepseek
Dataset Type Performance
Average Input Tokens 803.791249
Average Output Tokens 3886.227438
Dataset Link —
Measured Accuracy Score —
Charts Throughput vs Interactivity · Throughput vs Concurrency · Time to First Token vs Concurrency · Interactivity vs Concurrency
Processor
Processor Model Name Processor 10
Processors per Node 2
Cores Per Processor 32
VCPUs Per Processor —
Accelerator
Accelerator Model Name Accelerator 10
Accelerators per Node 8
Memory Type —
Memory Capacity 256 GB
Accelerator Interconnect —
Host-Accelerator Interconnect —
Host / Storage
Host Memory Capacity 1 TB
Memory Configuration —
Storage Capacity —
Storage Type —
Cooling Air-cooled
Hardware Notes —
Framework vLLM
Operating System Linux
Other Software Inference Backend v1.0
Software Notes —
| Field Name | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Run Date | 03/03/2026 | 03/02/2026 | 03/03/2026 | 03/03/2026 | 03/03/2026 | 02/27/2026 | 02/27/2026 | 02/27/2026 | 02/27/2026 | 02/27/2026 |
| Concurrency | 0.01 | 0.02 | 0.04 | 0.08 | 0.17 | 0.33 | 0.67 | 1.33 | 2.67 | 5.34 |
| System Tokens/Second | 1.0 | 1.7 | 2.9 | 4.7 | 7.2 | 9.8 | 16.0 | 27.4 | 45.2 | 68.2 |
| Tokens/Second per User | 93.0 | 81.1 | 68.7 | 56.5 | 43.0 | 29.5 | 24.1 | 20.5 | 17.0 | 12.8 |
| TTFT P99 (ms) | 12239.4 | 20479.5 | 20962.0 | 22399.9 | 23686.1 | 22979.2 | 29220.9 | 56258.1 | 85391.5 | 229120.7 |
| Utilization | 1.4% | 2.5% | 4.2% | 6.9% | 10.5% | 14.4% | 23.5% | 40.2% | 66.3% | 100.0% |
| Configuration Summary | TP=4, PP=1, batch=256, precision=FP8, kv_cache=FP16, cont_batch | TP=4, PP=1, batch=256, precision=FP8, kv_cache=FP16, cont_batch | TP=4, PP=1, batch=256, precision=FP8, kv_cache=FP16, cont_batch | TP=4, PP=1, batch=256, precision=FP8, kv_cache=FP16, cont_batch | TP=4, PP=1, batch=256, precision=FP8, kv_cache=FP16, cont_batch | TP=4, PP=1, batch=256, precision=FP8, kv_cache=FP16, cont_batch | TP=4, PP=1, batch=256, precision=FP8, kv_cache=FP16, cont_batch | TP=4, PP=1, batch=256, precision=FP8, kv_cache=FP16, cont_batch | TP=4, PP=1, batch=256, precision=FP8, kv_cache=FP16, cont_batch | TP=4, PP=1, batch=256, precision=FP8, kv_cache=FP16, cont_batch |
| QPS | 0.0002 | 0.0004 | 0.0006 | 0.0012 | 0.0018 | 0.0026 | 0.0043 | 0.0074 | 0.0119 | 0.0176 |
| Total Output Tokens | 168,384 | 315,398 | 539,431 | 867,074 | 1,378,214 | 1,986,475 | 3,400,893 | 6,215,384 | 10,495,599 | 16,968,991 |
| Run Duration (s) | 173,717.15 | 186,554.19 | 188,353.08 | 184,213.03 | 192,220.85 | 201,721.20 | 212,001.26 | 226,781.04 | 232,085.92 | 248,936.74 |
| Total Requests | 29 | 68 | 117 | 220 | 339 | 516 | 906 | 1,675 | 2,759 | 4,388 |
| Time To First Token (TTFT) (ms) | ||||||||||
| Minimum | 10084.6 | 11549.3 | 11442.7 | 11796.3 | 12352.2 | 12501.7 | 13088.8 | 13671.0 | 15126.0 | 18639.0 |
| Average | 10795.7 | 12495.6 | 12562.0 | 13372.0 | 13652.4 | 14241.0 | 18518.6 | 20908.9 | 27876.8 | 53407.6 |
| P50 | 10720.6 | 12255.5 | 12240.2 | 12618.5 | 12996.3 | 13499.9 | 14232.4 | 15370.9 | 17820.7 | 25893.4 |
| P90 | 11219.5 | 12931.7 | 12861.4 | 13184.8 | 13533.8 | 14807.6 | 20993.9 | 25002.6 | 79431.5 | 204936.4 |
| P95 | 11652.2 | 13152.7 | 13033.6 | 13414.9 | 23136.4 | 22800.7 | 28881.0 | 43784.2 | 80819.4 | 227214.5 |
| P99 | 12239.4 | 20479.5 | 20962.0 | 22399.9 | 23686.1 | 22979.2 | 29220.9 | 56258.1 | 85391.5 | 229120.7 |
| P999 | 12371.0 | 20609.1 | 20968.0 | 90508.4 | 23709.5 | 23765.6 | 691087.3 | 564477.9 | 550733.9 | 487221.1 |
| Maximum | 12385.7 | 20623.5 | 20968.3 | 109605.0 | 23711.0 | 23770.0 | 693520.9 | 569426.8 | 855644.6 | 651657.8 |
| Time Per Output Token (TPOT) (ms) | ||||||||||
| Minimum | 988.8 | 1091.9 | 1203.7 | 1379.8 | 1699.2 | 2315.8 | 2786.7 | 2794.6 | 3420.8 | 4031.7 |
| Average | 1010.0 | 1134.7 | 1330.7 | 1646.6 | 2111.2 | 2941.3 | 3494.0 | 3916.0 | 4663.5 | 5905.0 |
| P50 | 998.9 | 1140.4 | 1328.6 | 1648.7 | 2121.9 | 2952.3 | 3440.9 | 3835.9 | 4557.8 | 5834.7 |
| P90 | 1043.0 | 1163.1 | 1362.3 | 1690.1 | 2155.9 | 2970.7 | 3509.1 | 4044.5 | 4953.8 | 6293.7 |
| P95 | 1049.8 | 1170.4 | 1367.2 | 1694.9 | 2164.5 | 2976.6 | 3891.3 | 4445.9 | 5318.7 | 6566.8 |
| P99 | 1052.8 | 1175.6 | 1387.2 | 1707.0 | 2171.5 | 2993.2 | 4926.4 | 5977.3 | 6502.4 | 7241.6 |
| P999 | 1053.2 | 1179.0 | 1396.6 | 1709.6 | 2176.8 | 3003.7 | 6862.7 | 7639.9 | 7895.4 | 8314.0 |
| Maximum | 1053.3 | 1179.4 | 1397.6 | 1710.0 | 2178.8 | 3010.1 | 9007.5 | 9847.1 | 10867.7 | 8743.1 |
| Request Latency (ms) | ||||||||||
| Minimum | 927674.4 | 788299.1 | 837363.5 | 644657.7 | 699664.2 | 1135927.9 | 951145.9 | 997340.1 | 871014.8 | 1150544.0 |
| Average | 5990205.1 | 5315279.5 | 6154659.7 | 6499534.1 | 8548891.9 | 11276621.6 | 13056397.5 | 14415557.6 | 17680229.8 | 22647036.3 |
| P50 | 2422761.5 | 2545455.1 | 3101725.4 | 3244005.1 | 4457795.4 | 6154640.8 | 7076876.1 | 7683333.8 | 9498088.0 | 12149481.9 |
| P90 | 15335188.2 | 15088890.3 | 14485372.6 | 15789616.2 | 22644300.3 | 26279624.5 | 31251691.2 | 36287802.4 | 43714543.6 | 56870900.0 |
| P95 | 18599786.8 | 18380822.7 | 23198288.1 | 23184326.5 | 31616174.8 | 40386936.2 | 44238223.0 | 51368579.7 | 63169293.0 | 79066406.2 |
| P99 | 20704536.5 | 21001481.2 | 27152850.4 | 33429049.5 | 42396244.8 | 58646899.4 | 72150174.5 | 77121298.2 | 91616497.0 | 116331712.7 |
| P999 | 21039690.8 | 23019464.4 | 27241985.6 | 33726948.1 | 42875480.4 | 59112422.8 | 73155384.1 | 80648155.5 | 94797002.9 | 121436002.2 |
| Maximum | 21076930.2 | 23243684.7 | 27247533.4 | 33789303.9 | 42989680.9 | 59185143.4 | 73162326.4 | 80685205.1 | 96152892.2 | 121712265.4 |
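
Several summary rows in the table appear to be derivable from the raw counts: System Tokens/Second matches Total Output Tokens divided by Run Duration, and QPS matches Total Requests divided by Run Duration. The sketch below recomputes them for Runs 1, 5, and 10. The Utilization formula used here (throughput relative to the highest-throughput run) is an assumption inferred from the numbers, not a definition given by the source, and the dictionary keys and the `derive` helper are illustrative names only.

```python
# Raw per-run values copied from the table above (Runs 1, 5, and 10).
runs = {
    "Run 1": {"output_tokens": 168_384, "duration_s": 173_717.15, "requests": 29},
    "Run 5": {"output_tokens": 1_378_214, "duration_s": 192_220.85, "requests": 339},
    "Run 10": {"output_tokens": 16_968_991, "duration_s": 248_936.74, "requests": 4_388},
}


def derive(run):
    """Recompute throughput and request rate from raw counts (hypothetical helper)."""
    return {
        "system_tps": run["output_tokens"] / run["duration_s"],  # output tokens per second
        "qps": run["requests"] / run["duration_s"],              # completed requests per second
    }


derived = {name: derive(run) for name, run in runs.items()}

# Assumption: Utilization is throughput relative to the highest-throughput run.
peak_tps = max(d["system_tps"] for d in derived.values())
for d in derived.values():
    d["utilization_pct"] = 100.0 * d["system_tps"] / peak_tps

for name, d in derived.items():
    print(
        f"{name}: {d['system_tps']:.1f} tok/s, "
        f"{d['qps']:.4f} QPS, {d['utilization_pct']:.1f}% utilization"
    )
```

For these three runs the recomputed values agree with the published rows at the displayed precision. The Concurrency row is only approximately equal to System Tokens/Second divided by Tokens/Second per User, so it is left out of the sketch.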