Lessons learned
- Track and document your progress and regressions with dates (and commits)
- When simulating traffic, simulate a lot of it (in our case we simulate 500M requests, and the full simulation takes over 1 hour!)
- A queuing system can be very helpful here
- Do not log everything - every IO operation has a cost!
- Do not put telemetry on everything (measure its cost)!
- The server generating traffic must be much faster than the expected speed of the server consuming traffic (in our case 250_000 req/s vs 40_000 req/s).
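
One way to act on the "measure cost" advice for logging and telemetry is to time the same request loop with and without a log call per request. The sketch below is only an illustration under stated assumptions: `handle_request`, `time_loop`, `measure_logging_cost`, and the logger name are hypothetical and not taken from our code base, and the toy handler is far cheaper than a real one, so only the per-request overhead figure is meaningful.

```python
from __future__ import annotations

import logging
import os
import tempfile
import time


def handle_request(payload: int) -> int:
    # Hypothetical stand-in for real request handling work.
    return payload * 2 + 1


def time_loop(n: int, logger: logging.Logger | None) -> float:
    """Return the seconds taken to handle n requests, optionally logging each one."""
    start = time.perf_counter()
    for i in range(n):
        result = handle_request(i)
        if logger is not None:
            logger.info("handled request %d -> %d", i, result)
    return time.perf_counter() - start


def measure_logging_cost(n: int = 50_000) -> dict[str, float]:
    """Compare a silent loop against one that writes a log line per request."""
    with tempfile.TemporaryDirectory() as tmp:
        logger = logging.getLogger("loadtest_sketch")  # assumed name
        logger.setLevel(logging.INFO)
        logger.propagate = False  # keep output out of the root logger
        handler = logging.FileHandler(os.path.join(tmp, "requests.log"))
        logger.addHandler(handler)
        try:
            baseline = time_loop(n, None)
            logged = time_loop(n, logger)
        finally:
            # Close the file before the temporary directory is removed.
            logger.removeHandler(handler)
            handler.close()
    return {
        "baseline_s": baseline,
        "logged_s": logged,
        "per_request_overhead_us": (logged - baseline) / n * 1e6,
    }


if __name__ == "__main__":
    print(measure_logging_cost())
```

Multiplying the reported per-request overhead by the target request rate gives a rough idea of how much of each second would go to logging alone. The numbers depend heavily on the machine, the disk, and the log format, so they should be measured on the actual server rather than assumed.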