The default prompt is too short for effective benchmarking. For more stable and meaningful metrics, use a longer prompt (e.g., 500 or 1000 tokens from the samples) or bring your own prompt.
Response:
Press 'Generate Response' or 'Benchmark' to start.