2 Nodes

System 4

Submitted by Submitter 4 on 2026-02-27. Published on 2026-03-16.

SUT Summary
System Name System 4
System Availability Available
System Category Datacenter · Cloud
System Size 8x Accelerator 4
Model
Model Name QWEN3 CODER 480B
Division Open
Model Precision FP8
Model Link
Transformation Link
Model Notes
Datasets
Dataset Name OpenOrca
Dataset Type Performance
Average Input Tokens 190.362
Average Output Tokens 292.545
Dataset Link
Measured Accuracy Score
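The dataset averages can be turned into an expected token volume per run. The sketch below is a minimal illustration. It assumes every run sends 24,576 requests (the Total Requests value in the Run Data) and that the OpenOrca averages hold for each run. The name `expected_output_tokens` is a hypothetical helper.

```python
AVG_INPUT_TOKENS = 190.362   # "Average Input Tokens" from the Datasets section
AVG_OUTPUT_TOKENS = 292.545  # "Average Output Tokens" from the Datasets section

# Total Output Tokens reported for Runs 1-8 in the Run Data table
OBSERVED_OUTPUT_TOKENS = [7_176_538, 7_214_114, 7_189_678, 7_198_828,
                          7_208_951, 7_211_649, 7_193_097, 7_186_180]


def expected_output_tokens(num_requests: int, avg_output: float = AVG_OUTPUT_TOKENS) -> float:
    """Expected generated tokens for one run, assuming the dataset average holds per request."""
    return num_requests * avg_output


if __name__ == "__main__":
    expected = expected_output_tokens(24_576)  # assumed: Total Requests from the Run Data
    print(f"expected output tokens: {expected:,.0f}")
    for run, observed in enumerate(OBSERVED_OUTPUT_TOKENS, start=1):
        deviation = (observed - expected) / expected * 100
        print(f"Run {run}: {observed:,} tokens ({deviation:+.2f}% vs expected)")
    # A ratio above 1 means the workload generates more tokens than it reads
    print(f"output/input ratio: {AVG_OUTPUT_TOKENS / AVG_INPUT_TOKENS:.2f}")
```

Every recorded Total Output Tokens value falls within about 0.35% of the roughly 7,189,586 tokens this predicts. The output/input ratio of about 1.54 indicates a decode-heavy workload.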

Charts (not reproduced): Throughput vs Interactivity (System TPS vs. interactivity in tok/s/user); Throughput vs Concurrency (System TPS vs. concurrency); Time to First Token vs Concurrency (TTFT P99 in s vs. concurrency); Interactivity vs Concurrency (tok/s/user vs. concurrency).
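The throughput/interactivity trade-off is often read against a service-level target. The following is a minimal sketch that picks the highest-throughput run meeting a minimum per-user rate and, optionally, a TTFT P99 cap. It uses the per-run values from the Run Data table. The helper `best_run` and the thresholds in the usage lines (10 tok/s/user, 5,000 ms) are hypothetical, not part of this submission.

```python
from typing import NamedTuple, Optional


class RunPoint(NamedTuple):
    run: int
    tok_per_user: float  # Tokens/Second per User (interactivity)
    system_tps: float    # System Tokens/Second
    ttft_p99_ms: float   # TTFT P99 (ms)


# Values copied from the Run Data table
RUNS = [
    RunPoint(1, 22.0, 12.0, 2736.7),
    RunPoint(2, 21.0, 22.9, 3136.5),
    RunPoint(3, 17.8, 38.8, 3651.6),
    RunPoint(4, 14.8, 64.4, 4022.0),
    RunPoint(5, 11.3, 98.5, 4986.4),
    RunPoint(6, 7.5, 131.4, 12923.1),
    RunPoint(7, 4.9, 171.6, 156250.0),
    RunPoint(8, 2.9, 199.8, 408064.0),
]


def best_run(min_tok_per_user: float, max_ttft_p99_ms: float = float("inf")) -> Optional[RunPoint]:
    """Highest System TPS among runs meeting both targets, or None if no run qualifies."""
    eligible = [r for r in RUNS
                if r.tok_per_user >= min_tok_per_user and r.ttft_p99_ms <= max_ttft_p99_ms]
    return max(eligible, key=lambda r: r.system_tps, default=None)


if __name__ == "__main__":
    print(best_run(10.0))          # hypothetical 10 tok/s/user floor
    print(best_run(5.0, 5000.0))   # hypothetical floor plus a 5,000 ms TTFT P99 cap
```

With a 10 tok/s/user floor this selects Run 5 (98.5 System TPS). With a 5 tok/s/user floor alone it would pick Run 6. Adding the 5,000 ms TTFT P99 cap falls back to Run 5, because Run 6 reaches 12,923.1 ms.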
Hardware

Processor

Processor Model Name Processor 4
Processors per Node 2
Cores Per Processor 32
VCPUs Per Processor

Accelerator

Accelerator Model Name Accelerator 4
Accelerators per Node 8
Memory Type
Memory Capacity 256 GB
Accelerator Interconnect
Host-Accelerator Interconnect
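The per-accelerator weight footprint for the listed configuration can be estimated from the model size. This rough sketch relies on several assumptions the page does not state:

- 480B is the total parameter count.
- FP8 weights take 1 byte each.
- The 256 GB Memory Capacity is per accelerator.
- Weights split evenly across TP × PP devices (TP=4, PP=1 from the Configuration Summary).
- KV cache, activations and runtime overhead are ignored.

```python
# Rough weight-memory estimate; every constant is an assumption noted in its comment.
PARAMS = 480e9          # assumed total parameter count, read from "QWEN3 CODER 480B"
BYTES_PER_PARAM = 1     # FP8 weights: 1 byte per parameter
TP, PP = 4, 1           # Configuration Summary: TP=4, PP=1
ACCELERATORS = 8        # System Size: 8x Accelerator 4
MEM_PER_ACCEL_GB = 256  # assumed per-accelerator reading of "Memory Capacity 256 GB"


def weights_per_accelerator_gb(params: float, bytes_per_param: float, tp: int, pp: int) -> float:
    """Weight footprint per device in decimal GB if weights are sharded evenly over tp * pp devices."""
    return params * bytes_per_param / (tp * pp) / 1e9


def model_replicas(accelerators: int, tp: int, pp: int) -> int:
    """Number of full model copies if every accelerator belongs to one TP x PP group."""
    return accelerators // (tp * pp)


if __name__ == "__main__":
    w = weights_per_accelerator_gb(PARAMS, BYTES_PER_PARAM, TP, PP)
    print(f"weights per accelerator: {w:.0f} GB")
    print(f"left for KV cache and overhead: {MEM_PER_ACCEL_GB - w:.0f} GB")
    print(f"replicas if all {ACCELERATORS} accelerators are used: {model_replicas(ACCELERATORS, TP, PP)}")
```

Under these assumptions each accelerator holds about 120 GB of weights, leaving roughly 136 GB for the FP16 KV cache and runtime overhead. The page does not say whether two TP=4 replicas were actually run.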

Host / Storage

Host Memory Capacity 1 TB
Memory Configuration
Storage Capacity
Storage Type
Cooling Liquid-cooled
Hardware Notes
Software
Framework vLLM
Operating System Linux
Other Software Inference Backend v1.0
Software Notes
Run Data
Field Name                     Run 1       Run 2       Run 3       Run 4       Run 5       Run 6       Run 7       Run 8
Run Date                  02/25/2026  02/25/2026  02/25/2026  02/25/2026  02/25/2026  02/24/2026  02/24/2026  02/28/2026
Concurrency                     0.54        1.09        2.18        4.36        8.71       17.42       34.85       69.70
System Tokens/Second            12.0        22.9        38.8        64.4        98.5       131.4       171.6       199.8
Tokens/Second per User          22.0        21.0        17.8        14.8        11.3         7.5         4.9         2.9
TTFT P99 (ms)                 2736.7      3136.5      3651.6      4022.0      4986.4     12923.1    156250.0    408064.0
Utilization                     6.0%       11.5%       19.4%       32.2%       49.3%       65.8%       85.9%      100.0%
QPS                           0.0411      0.0780      0.1326      0.2199      0.3358      0.4478      0.5863      0.6833
Total Output Tokens        7,176,538   7,214,114   7,189,678   7,198,828   7,208,951   7,211,649   7,193,097   7,186,180
Run Duration (s)          597,954.52  315,267.38  185,363.60  111,745.40   73,190.44   54,887.23   41,915.72   35,969.14
Total Requests                24,576      24,576      24,576      24,576      24,576      24,576      24,576      24,576

Configuration Summary (identical for all eight runs): TP=4, PP=1, batch=256, precision=FP8, kv_cache=FP16
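The summary rows are consistent with simple definitions over the raw totals. The minimal sketch below recomputes System Tokens/Second, QPS and Utilization from Total Output Tokens, Total Requests and Run Duration. The definitions are inferred from the numbers rather than documented on the page. `derived_metrics` is a hypothetical helper.

```python
# Raw totals from the Run Data table (Runs 1-8)
TOTAL_REQUESTS = 24_576
TOTAL_OUTPUT_TOKENS = [7_176_538, 7_214_114, 7_189_678, 7_198_828,
                       7_208_951, 7_211_649, 7_193_097, 7_186_180]
RUN_DURATION_S = [597_954.52, 315_267.38, 185_363.60, 111_745.40,
                  73_190.44, 54_887.23, 41_915.72, 35_969.14]


def derived_metrics(output_tokens, durations_s, total_requests):
    """Recompute System TPS, QPS and Utilization from per-run totals.

    Assumed definitions (they reproduce the reported rows, but the page does not state them):
      System TPS  = total output tokens / run duration
      QPS         = total requests / run duration
      Utilization = System TPS / highest System TPS across runs, in percent
    """
    tps = [tok / dur for tok, dur in zip(output_tokens, durations_s)]
    qps = [total_requests / dur for dur in durations_s]
    peak = max(tps)
    utilization = [100.0 * t / peak for t in tps]
    return tps, qps, utilization


if __name__ == "__main__":
    tps, qps, util = derived_metrics(TOTAL_OUTPUT_TOKENS, RUN_DURATION_S, TOTAL_REQUESTS)
    for run, (t, q, u) in enumerate(zip(tps, qps, util), start=1):
        print(f"Run {run}: System TPS {t:6.1f}  QPS {q:.4f}  Utilization {u:5.1f}%")
```

Once rounded, all three recomputed rows match the reported values for every run. This suggests Utilization is measured against this system's own peak (Run 8) rather than a hardware limit.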
Time to First Token (TTFT) (ms)
Statistic                      Run 1       Run 2       Run 3       Run 4       Run 5       Run 6       Run 7       Run 8
Minimum                       1163.2      1297.8      1511.0      1747.7      1953.1      3331.7      4600.8      6622.7
Average                       1940.9      2015.4      2257.4      2587.5      3256.1      4728.4      9048.3     23312.8
P50                           1802.2      1834.4      2003.0      2518.9      3109.1      4298.8      6331.7     11657.2
P90                           2598.0      2633.7      2796.1      2994.6      3913.8      5061.3      7152.1     14908.6
P95                           2627.7      2663.1      2848.8      3367.6      4035.8      5442.3      9498.4     15134.0
P99                           2736.7      3136.5      3651.6      4022.0      4986.4     12923.1    156250.0    408064.0
P99.9                         3543.2      3612.9     11091.1     22970.7     40832.6     81758.8    222715.0    483273.0
Maximum                       3966.8     16722.5     11100.9     22986.0     43035.1     84065.5    234174.1    493677.7
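The distribution rows (Minimum through Maximum) are order statistics over per-request samples. The page does not say which percentile rule was used. This sketch assumes linear interpolation between closest ranks, which is NumPy's default `linear` method. `summarize` is a hypothetical helper, not part of any benchmark tool.

```python
import math
from statistics import fmean


def percentile(sorted_vals, p):
    """Percentile by linear interpolation between closest ranks (NumPy's default 'linear' rule)."""
    if not sorted_vals:
        raise ValueError("need at least one sample")
    if not 0 <= p <= 100:
        raise ValueError("p must be in [0, 100]")
    rank = p / 100 * (len(sorted_vals) - 1)  # fractional 0-based position
    lo, hi = math.floor(rank), math.ceil(rank)
    frac = rank - lo
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * frac


def summarize(samples_ms):
    """Produce the same statistic rows as the TTFT, TPOT and latency tables."""
    vals = sorted(samples_ms)
    return {
        "Minimum": vals[0],
        "Average": fmean(vals),
        "P50": percentile(vals, 50),
        "P90": percentile(vals, 90),
        "P95": percentile(vals, 95),
        "P99": percentile(vals, 99),
        "P99.9": percentile(vals, 99.9),
        "Maximum": vals[-1],
    }


if __name__ == "__main__":
    # Synthetic samples (1..100 ms), not data from this submission
    for name, value in summarize(range(1, 101)).items():
        print(f"{name:>7}: {value:.3f}")
```

With 24,576 requests per run, only about 25 requests (0.1% of 24,576 ≈ 24.6) lie above P99.9. This is why P99.9 sits close to Maximum in most runs.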
Time per Output Token (TPOT) (ms)
Statistic                      Run 1       Run 2       Run 3       Run 4       Run 5       Run 6       Run 7       Run 8
Minimum                        635.8       660.5       696.5       727.7       772.9       881.4      1054.2      1450.0
Average                        663.0       695.3       819.6       985.2      1285.4      1921.7      2920.8      4955.7
P50                            661.9       694.2       818.4       983.9      1284.7      1922.4      2925.7      5002.4
P90                            671.3       706.2       833.4      1004.5      1311.0      1952.7      2971.5      5117.6
P95                            675.1       710.7       839.2      1012.1      1322.5      1966.4      2989.4      5153.0
P99                            688.3       727.3       861.9      1040.1      1365.6      2023.1      3105.7      5294.5
P99.9                          795.2       863.5       992.3      1172.9      1506.0      2306.1      3630.0      6425.4
Maximum                       1423.2      1319.2      1562.2      1900.5      2315.8      4659.9      5767.2     10909.1
Request Latency (ms)
Statistic                      Run 1       Run 2       Run 3       Run 4       Run 5       Run 6       Run 7       Run 8
Minimum                       2466.5      2518.6      3101.7      3327.1      3771.5      5382.7      8257.9     12619.4
Average                     194604.9    205125.7    240867.9    289764.2    378403.3    565472.2    858459.6   1460604.1
P50                         183004.6    192299.8    225788.1    271819.4    354544.8    528939.8    805844.7   1373089.6
P90                         292188.1    308934.3    361530.2    435060.4    569214.8    848320.2   1288795.1   2196319.0
P95                         351626.5    371746.5    432422.3    524562.4    684345.1   1030600.4   1539162.7   2604095.7
P99                         555123.0    604472.1    719448.1    858295.7   1112878.5   1643599.7   2476371.9   4150556.0
P99.9                       681548.8    716491.2    842323.4   1011852.9   1321871.6   1972664.0   3007519.3   5166714.8
Maximum                     705785.9    742232.3    872021.2   1026426.2   1342664.0   1995493.6   3158377.3   5468721.5
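Average request latency can be cross-checked against the TTFT and TPOT averages. This sketch assumes a common decomposition: latency ≈ TTFT + (output tokens - 1) × TPOT. Output tokens per request are taken as Total Output Tokens / Total Requests. Because the sketch multiplies averages rather than averaging per-request products, the result is only an approximation.

```python
# Per-run values from the Run Data tables (Runs 1-8)
TOTAL_REQUESTS = 24_576
TOTAL_OUTPUT_TOKENS = [7_176_538, 7_214_114, 7_189_678, 7_198_828,
                       7_208_951, 7_211_649, 7_193_097, 7_186_180]
AVG_TTFT_MS = [1940.9, 2015.4, 2257.4, 2587.5, 3256.1, 4728.4, 9048.3, 23312.8]
AVG_TPOT_MS = [663.0, 695.3, 819.6, 985.2, 1285.4, 1921.7, 2920.8, 4955.7]
AVG_LATENCY_MS = [194604.9, 205125.7, 240867.9, 289764.2,
                  378403.3, 565472.2, 858459.6, 1460604.1]


def estimate_latency_ms(ttft_ms: float, tpot_ms: float, output_tokens: float) -> float:
    """Assumed model: the first token arrives after TTFT, each later token after one TPOT."""
    return ttft_ms + (output_tokens - 1) * tpot_ms


def relative_errors():
    """Relative gap between the estimate and the reported average latency, per run."""
    errors = []
    for tok, ttft, tpot, actual in zip(TOTAL_OUTPUT_TOKENS, AVG_TTFT_MS, AVG_TPOT_MS, AVG_LATENCY_MS):
        estimate = estimate_latency_ms(ttft, tpot, tok / TOTAL_REQUESTS)
        errors.append((estimate - actual) / actual)
    return errors


if __name__ == "__main__":
    for run, err in enumerate(relative_errors(), start=1):
        ttft_share = AVG_TTFT_MS[run - 1] / AVG_LATENCY_MS[run - 1] * 100
        print(f"Run {run}: estimate off by {err * 100:+.2f}%, TTFT share of latency {ttft_share:.1f}%")
```

In all eight runs the estimate lands within 0.5% of the reported average. TTFT accounts for under 2% of average request latency in every run, so per-token decode time dominates end-to-end latency in this data.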