1 Node

System 3

Submitted by Submitter 3 on 2026-02-27. Published on 2026-03-16.

SUT Summary
System Name System 3
System Availability Available
System Category Datacenter · Cloud
System Size 8x Accelerator 3
Model
Model Name QWEN3 CODER 480B
Division Open
Model Precision FP8
Model Link
Transformation Link
Model Notes
Datasets
Dataset Name OpenOrca
Dataset Type Performance
Average Input Tokens 190.362
Average Output Tokens 292.545
Dataset Link
Measured Accuracy Score

Throughput vs Interactivity

[Chart: System TPS vs. Interactivity (tok/s/user)]

Throughput vs Concurrency

[Chart: System TPS vs. Concurrency]

Time to First Token vs Concurrency

[Chart: TTFT P99 (s) vs. Concurrency]

Interactivity vs Concurrency

[Chart: Interactivity (tok/s/user) vs. Concurrency]
Hardware

Processor

Processor Model Name Processor 3
Processors per Node 2
Cores Per Processor 32
VCPUs Per Processor

Accelerator

Accelerator Model Name Accelerator 3
Accelerators per Node 8
Memory Type
Memory Capacity 256 GB
Accelerator Interconnect
Host-Accelerator Interconnect

Host / Storage

Host Memory Capacity 1 TB
Memory Configuration
Storage Capacity
Storage Type
Cooling Liquid-cooled
Hardware Notes
Software
Framework vLLM
Operating System Linux
Other Software Inference Backend v1.0
Software Notes
Run Data
Field Name | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7
Run Date | 02/21/2026 | 02/21/2026 | 02/21/2026 | 02/21/2026 | 02/22/2026 | 02/22/2026 | 02/20/2026
Concurrency | 0.54 | 1.09 | 2.18 | 4.36 | 8.71 | 17.42 | 34.85
System Tokens/Second | 28.6 | 50.6 | 67.9 | 99.0 | 132.1 | 159.9 | 192.4
Tokens/Second per User | 52.5 | 46.4 | 31.2 | 22.7 | 15.2 | 9.2 | 5.5
TTFT P99 (ms) | 2865.5 | 2946.4 | 3612.2 | 4613.7 | 5748.2 | 37859.1 | 139725.9
Utilization | 14.9% | 26.3% | 35.3% | 51.5% | 68.7% | 83.1% | 100.0%
Configuration Summary | TP=4, PP=1, batch=256, precision=FP8, kv_cache=FP16 | TP=4, PP=1, batch=256, precision=FP8, kv_cache=FP16 | TP=4, PP=1, batch=256, precision=FP8, kv_cache=FP16 | TP=4, PP=1, batch=256, precision=FP8, kv_cache=FP16 | TP=4, PP=1, batch=256, precision=FP8, kv_cache=FP16 | TP=4, PP=1, batch=256, precision=FP8, kv_cache=FP16 | TP=4, PP=1, batch=256, precision=FP8, kv_cache=FP16
QPS | 0.0977 | 0.1728 | 0.2318 | 0.3381 | 0.4516 | 0.5475 | 0.6559
Total Output Tokens | 7,185,824 | 7,193,577 | 7,193,551 | 7,194,216 | 7,189,576 | 7,178,996 | 7,208,408
Run Duration (s) | 251,485.39 | 142,242.92 | 106,012.52 | 72,682.01 | 54,422.82 | 44,890.06 | 37,471.23
Total Requests | 24,576 | 24,576 | 24,576 | 24,576 | 24,576 | 24,576 | 24,576
Time To First Token (TTFT) (ms)
Minimum | 1001.8 | 1026.3 | 1306.2 | 1568.7 | 2240.4 | 3738.1 | 5850.5
Average | 1555.8 | 1691.9 | 2109.7 | 2642.6 | 3701.2 | 5922.6 | 11041.3
P50 | 1360.7 | 1413.7 | 2126.7 | 2382.9 | 3392.3 | 5250.5 | 8178.2
P90 | 2289.0 | 2346.5 | 2651.8 | 3296.1 | 4489.8 | 6254.1 | 9907.1
P95 | 2313.4 | 2418.0 | 3111.0 | 3748.2 | 4854.5 | 6723.7 | 12367.7
P99 | 2865.5 | 2946.4 | 3612.2 | 4613.7 | 5748.2 | 37859.1 | 139725.9
P999 | 3820.8 | 3893.4 | 30632.3 | 41814.2 | 64078.2 | 107027.8 | 218317.6
Maximum | 46460.1 | 62353.9 | 30642.7 | 41832.4 | 64094.9 | 109879.6 | 221595.2
Time Per Output Token (TPOT) (ms)
Minimum | 251.0 | 268.2 | 380.4 | 404.1 | 443.0 | 557.3 | 740.9
Average | 275.9 | 311.9 | 465.9 | 638.8 | 956.2 | 1577.9 | 2608.6
P50 | 275.4 | 311.4 | 465.3 | 637.9 | 956.5 | 1579.5 | 2616.0
P90 | 284.0 | 323.4 | 480.6 | 658.1 | 981.9 | 1611.5 | 2669.7
P95 | 287.0 | 327.7 | 486.4 | 666.3 | 991.3 | 1624.0 | 2687.5
P99 | 296.1 | 339.6 | 503.2 | 692.2 | 1020.2 | 1665.3 | 2763.8
P999 | 346.1 | 397.8 | 578.8 | 796.6 | 1183.7 | 1927.4 | 3158.6
Maximum | 908.4 | 1078.4 | 981.9 | 1407.3 | 2162.1 | 2862.1 | 6114.9
Request Latency (ms)
Minimum | 1602.3 | 2122.7 | 2075.1 | 2833.5 | 3429.6 | 5403.8 | 9214.5
Average | 81843.3 | 92557.4 | 137833.4 | 188675.6 | 281970.4 | 464204.9 | 770972.4
P50 | 77034.7 | 86764.7 | 129219.5 | 176393.1 | 264120.3 | 435826.5 | 723171.0
P90 | 122711.6 | 139499.9 | 206452.1 | 283520.8 | 422165.1 | 696626.9 | 1157073.1
P95 | 146902.1 | 167684.8 | 247319.9 | 340069.0 | 508749.4 | 834686.4 | 1397625.8
P99 | 236594.8 | 266247.5 | 409381.0 | 561709.5 | 820841.1 | 1338484.2 | 2235110.6
P999 | 286668.0 | 324777.3 | 480694.4 | 659355.3 | 987455.5 | 1625664.1 | 2695036.9
Maximum | 293986.8 | 373253.6 | 488764.1 | 668775.9 | 1000196.8 | 1672530.6 | 2801331.5
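
Derived metrics (illustrative check)

The page does not say how QPS, System Tokens/Second, or Utilization are computed. The reported figures are, however, consistent with simple ratios of the raw totals in Run Data. The Python sketch below recomputes them under three assumptions that are not confirmed by the source: QPS is Total Requests divided by Run Duration, System Tokens/Second is Total Output Tokens divided by Run Duration, and Utilization is each run's throughput as a percentage of the highest throughput across the seven runs. The function name `derive_metrics` is hypothetical.

```python
# Values copied from the Run Data rows (Run 1 through Run 7).
TOTAL_REQUESTS = [24576] * 7
TOTAL_OUTPUT_TOKENS = [7185824, 7193577, 7193551, 7194216,
                       7189576, 7178996, 7208408]
RUN_DURATION_S = [251485.39, 142242.92, 106012.52, 72682.01,
                  54422.82, 44890.06, 37471.23]

# Figures as reported on the page, used only for comparison.
REPORTED_QPS = [0.0977, 0.1728, 0.2318, 0.3381, 0.4516, 0.5475, 0.6559]
REPORTED_SYSTEM_TPS = [28.6, 50.6, 67.9, 99.0, 132.1, 159.9, 192.4]
REPORTED_UTILIZATION_PCT = [14.9, 26.3, 35.3, 51.5, 68.7, 83.1, 100.0]


def derive_metrics(requests, tokens, durations):
    """Recompute per-run QPS, system tokens/s, and utilization (%).

    Assumed definitions (inferred from the numbers, not documented):
      qps         = requests / duration
      system_tps  = output tokens / duration
      utilization = system_tps / max(system_tps) * 100
    """
    qps = [r / d for r, d in zip(requests, durations)]
    system_tps = [t / d for t, d in zip(tokens, durations)]
    peak_tps = max(system_tps)
    utilization = [100.0 * tps / peak_tps for tps in system_tps]
    return qps, system_tps, utilization


if __name__ == "__main__":
    qps, tps, util = derive_metrics(
        TOTAL_REQUESTS, TOTAL_OUTPUT_TOKENS, RUN_DURATION_S
    )
    for i, (q, t, u) in enumerate(zip(qps, tps, util), start=1):
        print(f"Run {i}: QPS={q:.4f}  System TPS={t:.1f}  Utilization={u:.1f}%")
```

Under these assumptions the recomputed values agree with the reported rows to within rounding. That supports the inferred definitions but does not prove that the benchmark harness computes the metrics this way.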