Workflow Load Testing
Overview
The Workflow Load Testing feature allows you to stress test your MCP workflows with configurable parameters, measuring performance under concurrent execution. Get detailed metrics, visual timelines, and automated insights to validate your workflows can handle real-world load.
Perfect for validating workflow reliability, identifying performance bottlenecks, and ensuring your MCP server implementations scale appropriately.
Key Capabilities
📊 Comprehensive Load Testing
Execute workflows repeatedly with configurable duration and parallelism
⏱️ Smart Ramp-Up/Down
Gradually increases parallelism during the first 60 seconds and decreases it during the last 60 seconds to avoid overwhelming the system
📈 Real-Time Progress Tracking
Live metrics during test execution showing success, failure, and active execution counts
📉 Detailed Performance Metrics
Duration statistics including Average, Min, Max, Median, P95, P99, plus throughput and peak concurrency
📊 Timeline Visualization
SVG chart showing cumulative successes, failures, and active executions over time
🤖 Automated Observations
Rule-based insights about reliability, performance variance, throughput patterns, and error analysis
💾 Results Management
Persistent storage with history panel, export to JSON, and delete capabilities
🔒 Execution Isolation
Load test runs do NOT appear in the Recent Executions list, keeping them separate from normal workflow runs
Getting Started
Step 1: Select a Workflow
Before running a load test, you need an existing workflow.
- Navigate to 🔗 Workflows tab
- Select a workflow from the list
- Verify the workflow has been tested manually at least once
- Locate the 📊 Load Test button next to “Run Workflow”
📸 Screenshot placeholder:
Description: Show the workflow viewer with the Load Test button (📊) positioned next to the Run Workflow button
Step 2: Configure Load Test
Click the 📊 Load Test button to open the configuration dialog.
Configuration options:
| Field | Description | Constraints |
|---|---|---|
| Connection | MCP server connection to use | Required |
| Duration | Total test duration in seconds | 120-3600 seconds |
| Max Parallel Executions | Maximum concurrent workflow runs | 1-100 |
| Runtime Parameters | Values for prompt-at-runtime parameters | If required |
📸 Screenshot placeholder:
Description: Show the load test configuration dialog with connection dropdown, duration field showing “300”, max parallel field showing “5”, and runtime parameters section
info: Ramp Up/Down: The test automatically ramps up parallelism during the first 60 seconds and ramps down during the last 60 seconds. This prevents sudden load spikes that could overwhelm your server.
Step 3: Run the Load Test
- Click 📊 Start Load Test button
- Watch the real-time progress indicator
- Monitor live metrics as executions complete
- Wait for test completion (or click Cancel to stop early)
During execution, you’ll see:
- Execution count: Total completed executions
- Success badge (✓): Number of successful runs
- Failure badge (✗): Number of failed runs
- Average duration: Rolling average execution time
📸 Screenshot placeholder:
Description: Show the progress indicator with “42 executions completed”, success/failure badges showing “✓ 40 ✗ 2”, and average duration “⏱ 1250ms avg”
Step 4: Review Results
After the test completes, the Load Test Results Viewer opens automatically.
Results are organized into sections:
- Test Summary - Configuration details and final status
- Performance Metrics - Execution counts and duration statistics
- Execution Timeline - Visual chart of test progression
- Step Duration Breakdown - Per-step timing analysis
- Observations - Automated insights and recommendations
- Error Summary - Grouped error counts
📸 Screenshot placeholder:
Description: Show the full results viewer with all sections visible: summary, metrics cards, timeline chart, step durations, observations, and error summary
Understanding Results
Performance Metrics
The results viewer displays key metrics in easy-to-read cards:
Execution Counts:
- Total: All execution attempts
- Success: Completed successfully
- Failed: Errored or timed out
- Partial: Some steps succeeded, some failed
Duration Statistics:
- Average: Mean execution time
- Median: Middle value (50th percentile)
- Min/Max: Fastest and slowest executions
- P95: 95th percentile (95% of executions were faster)
- P99: 99th percentile (99% of executions were faster)
Throughput Metrics:
- Throughput: Executions per second
- Peak Concurrent: Maximum simultaneous executions reached
📸 Screenshot placeholder:
Description: Show the metrics cards grid with Total (147), Success (142, 96.6%), Failed (5), Partial (0), Average (1250ms), Median (1180ms), Min (850ms), Max (2340ms), P95 (1890ms), P99 (2210ms), Throughput (2.89/sec), Peak Concurrent (5)
Timeline Visualization
The timeline chart shows how the test progressed over time:
Chart elements:
- Green line: Cumulative successful executions
- Red line: Cumulative failed executions
- Blue dashed line: Active (in-progress) executions at each moment
What to look for:
- Steady green line slope = consistent throughput
- Red line spikes = failure clusters (investigate server issues)
- Blue line shape = parallelism ramp-up and ramp-down pattern
📸 Screenshot placeholder:
Description: Show the SVG timeline chart with green success line rising steadily, minimal red failure line, and blue dashed line showing ramp-up pattern in first 60s and ramp-down in last 60s
Step Duration Breakdown
See which workflow steps take the most time:
Bar chart shows:
- Each step’s average duration
- Percentage of total workflow time
- Visual comparison between steps
Use this to:
- Identify slow steps that need optimization
- Understand where time is spent
- Prioritize performance improvements
📸 Screenshot placeholder:
Description: Show horizontal bar chart with 3 steps: tool_1 (650ms, 52%), tool_2 (400ms, 32%), tool_3 (200ms, 16%)
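The percentage shown for each step is simply its average duration divided by the sum of all step averages. A minimal sketch of that arithmetic, using the hypothetical step names and timings from the screenshot description above:

```python
# Illustrative only: step names and average durations are hypothetical,
# chosen to match the example screenshot description above.
step_avg_ms = {"tool_1": 650, "tool_2": 400, "tool_3": 200}

total_ms = sum(step_avg_ms.values())  # 1250ms across all steps
for step, avg_ms in step_avg_ms.items():
    share = avg_ms / total_ms * 100
    print(f"{step}: {avg_ms}ms ({share:.0f}%)")
# tool_1: 650ms (52%)
# tool_2: 400ms (32%)
# tool_3: 200ms (16%)
```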
Automated Observations
The system analyzes results and generates insights:
Observation types:
| Icon | Type | Example |
|---|---|---|
| ✓ | Excellent | “Success rate is 99% or higher” |
| ✓ | Good | “Success rate is above 95%” |
| ⚠ | Warning | “P95 latency is 45% higher than average” |
| ℹ | Info | “Average throughput: 2.34 executions per second” |
Common observations:
- Reliability assessment: Based on success rate
- Performance variance: Comparing P95 to average
- Throughput classification: Low/Average/High
- Error pattern detection: Systematic vs random failures
📸 Screenshot placeholder:
Description: Show the observations section with 4 observation badges: “✓ Good reliability: Success rate is above 95%”, “⚠ Moderate performance variance: P95 latency is 51% higher than average”, “ℹ Average throughput: 2.89 executions per second”, “ℹ Failures distributed across 3 different error types”
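Conceptually, the observation engine is a set of threshold checks over the computed metrics. The sketch below shows the general idea, reusing the thresholds quoted in the table above; the variance cutoff and exact wording are assumptions, and the app's actual rules may differ:

```python
# A sketch of rule-based observation generation. Thresholds mirror the
# examples in the table above; the app's actual rules may differ.
def build_observations(total: int, succeeded: int, avg_ms: float, p95_ms: float) -> list[str]:
    notes = []
    success_rate = succeeded / total if total else 0.0
    if success_rate >= 0.99:
        notes.append("✓ Excellent reliability: Success rate is 99% or higher")
    elif success_rate > 0.95:
        notes.append("✓ Good reliability: Success rate is above 95%")
    variance_pct = (p95_ms - avg_ms) / avg_ms * 100
    if variance_pct > 40:  # assumed threshold for a variance warning
        notes.append(
            f"⚠ Moderate performance variance: P95 latency is {variance_pct:.0f}% higher than average"
        )
    return notes

# With the example numbers above: 142/147 successes, avg 1250ms, P95 1890ms
print(build_observations(total=147, succeeded=142, avg_ms=1250, p95_ms=1890))
```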
Error Summary
When failures occur, errors are grouped by message:
Display format:
[count] Error message
Example:
[3] Connection timeout after 30 seconds
[1] Invalid parameter: 'value' is required
[1] Tool execution failed: unexpected error
Use this to:
- Identify systematic issues (same error many times)
- Diagnose server problems
- Prioritize error fixes
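Under the hood, this grouping amounts to counting identical messages. A minimal sketch, using the hypothetical error strings from the example above:

```python
from collections import Counter

# Hypothetical error messages collected from failed executions.
errors = [
    "Connection timeout after 30 seconds",
    "Connection timeout after 30 seconds",
    "Connection timeout after 30 seconds",
    "Invalid parameter: 'value' is required",
    "Tool execution failed: unexpected error",
]

# Group identical messages and print in the [count] format shown above.
for message, count in Counter(errors).most_common():
    print(f"[{count}] {message}")
```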
Managing Load Tests
Load Test History
Recent load tests appear in the Load Test History section below the workflow viewer.
Each history entry shows:
- Status badge (✓ Completed, ⚠ Cancelled)
- Timestamp
- Execution counts (total, succeeded, failed)
- Delete button
📸 Screenshot placeholder:
Description: Show the Load Test History section with 3 entries: two completed tests and one cancelled test, with timestamps and execution counts
Viewing Past Results
- Click any entry in Load Test History
- Results viewer opens with full metrics
- Review timeline, observations, and errors
- Compare with other test runs
Exporting Results
Export load test results as JSON for further analysis:
- Open a load test result (current or from history)
- Click 📥 Export button
- JSON file downloads with descriptive filename
Filename format:
loadtest_{WorkflowName}_{YYYYMMDD_HHmmss}_{TestId}.json
Export includes:
- Complete configuration
- All execution records
- Calculated metrics
- Timeline data
- Observations
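Because exports are plain JSON, you can compare runs with a short script. A minimal sketch, assuming hypothetical field names ("metrics", "averageMs", "p95Ms", "throughput" are illustrative; open a real export to see the actual schema):

```python
import json

# Field names here are assumptions for illustration; check a real export
# file to see the actual schema before relying on them.
def load_metrics(path: str) -> dict:
    with open(path, encoding="utf-8") as f:
        return json.load(f).get("metrics", {})

baseline = load_metrics("loadtest_MyWorkflow_20240101_120000_abc123.json")
latest = load_metrics("loadtest_MyWorkflow_20240201_120000_def456.json")

for key in ("averageMs", "p95Ms", "throughput"):
    print(f"{key}: {baseline.get(key)} -> {latest.get(key)}")
```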
Deleting Load Tests
Remove load tests you no longer need:
- Find the test in Load Test History
- Click the 🗑 delete button
- Confirm deletion
warning: Permanent deletion: Load test results cannot be recovered after deletion.
Best Practices
🎯 Start Small
Begin with short duration (120s) and low parallelism (2-3) to establish baseline metrics.
📊 Test Incrementally
Gradually increase parallelism to find the breaking point of your server.
🔄 Run Multiple Tests
Execute several tests to account for variance and validate consistency.
📝 Document Results
Export and save results for comparison over time as you optimize.
⚠️ Monitor Server Resources
Watch server CPU, memory, and network during load tests to identify bottlenecks.
🧪 Test Different Scenarios
Create separate workflows for different use cases and test each independently.
📈 Compare P95 to Average
Large differences indicate inconsistent performance that may affect users.
🔍 Investigate Failures
Don’t ignore even small failure rates—they may indicate systemic issues.
Troubleshooting
Load Test Won’t Start
Problem: Start button disabled or test fails immediately
Solutions:
- Verify a connection is selected
- Ensure connection is valid and server is running
- Check all runtime parameters are filled
- Verify the workflow has at least 2 steps
- Test workflow manually first
All Executions Failing
Problem: 100% failure rate during load test
Solutions:
- Test workflow manually to verify it works
- Check server logs for errors
- Verify connection credentials are valid
- Ensure server can handle concurrent requests
- Check for rate limiting on the server
High P95/P99 Latency
Problem: Tail latencies much higher than average
Causes:
- Server resource contention under load
- Garbage collection pauses
- Network latency variance
- Database connection pool exhaustion or contention
Solutions:
- Monitor server resources during test
- Check for memory pressure
- Review database query performance
- Consider connection pooling configuration
Results Not Persisting
Problem: Load test history is empty after test completes
Solutions:
- Check application has write permissions
- Verify storage location exists:
  - Windows: %APPDATA%\McpExplorer\load_tests\
  - macOS/Linux: ~/.local/share/McpExplorer/load_tests/
- Review console for storage errors
- Ensure adequate disk space
Timeline Chart Empty
Problem: No data appears in timeline visualization
Solutions:
- Ensure test ran for more than a few seconds
- Check that executions actually completed
- Verify timeline data was captured (check export JSON)
- Refresh the results viewer
Technical Details
Storage Location
Load test results are saved as JSON files:
- Windows: %APPDATA%\McpExplorer\load_tests\
- macOS/Linux: ~/.local/share/McpExplorer/load_tests/
Ramp-Up/Down Algorithm
- First 60 seconds: parallelism = max(1, MaxParallel × (elapsed / 60))
- Middle period: parallelism = MaxParallel
- Last 60 seconds: parallelism = max(1, MaxParallel × (remaining / 60))
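In code, the schedule translates roughly to the sketch below. The int() truncation is an assumption; the app's exact rounding behavior is not documented here:

```python
# Direct translation of the formulas above; the app's rounding may differ.
def target_parallelism(elapsed_s: float, total_s: float, max_parallel: int) -> int:
    remaining_s = total_s - elapsed_s
    if elapsed_s < 60:                      # ramp-up window
        return max(1, int(max_parallel * elapsed_s / 60))
    if remaining_s < 60:                    # ramp-down window
        return max(1, int(max_parallel * remaining_s / 60))
    return max_parallel                     # steady state

# For a 300s test with max_parallel=5:
# t=30s -> 2, t=150s -> 5, t=280s -> 1
```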
Metrics Calculation
- Median: Middle value when durations are sorted
- P95: Value at 95th percentile position
- P99: Value at 99th percentile position
- Throughput: Total executions ÷ Total duration
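A minimal sketch of these definitions, using hypothetical duration samples. Percentile indexing conventions vary, so the app may use a slightly different method (for example, interpolation between neighboring values):

```python
# Sketch of the metric definitions above; nearest-rank percentile, no
# interpolation. The app's exact method may differ.
def percentile(sorted_ms: list[float], p: float) -> float:
    idx = min(len(sorted_ms) - 1, int(len(sorted_ms) * p / 100))
    return sorted_ms[idx]

durations_ms = sorted([850.0, 1180.0, 1250.0, 1890.0, 2340.0])  # hypothetical samples
median_ms = percentile(durations_ms, 50)   # 1250.0
p95_ms = percentile(durations_ms, 95)      # 2340.0
p99_ms = percentile(durations_ms, 99)      # 2340.0
throughput = len(durations_ms) / 300       # executions ÷ total duration (s)
```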
Related Pages
- Create Workflows First: load testing requires an existing workflow
- Configure Connections: tests use MCP connections
- Understand Tool Execution: learn about individual tool testing
Next Steps
Now that you understand load testing:
- Create a test workflow - Build a simple 2-3 step workflow
- Run baseline test - Start with 120s duration, 2 parallel
- Analyze results - Review metrics and observations
- Scale up testing - Gradually increase parallelism
- Export and compare - Track performance over time
Load testing gives you confidence your workflows will perform under real-world conditions! 📊