Lessons learned

  • Track and document your progress and regressions with dates (and commits)
  • When simulating traffic, simulate a lot (in our case 500M requests - the full simulation lasts over 1 h!)
    • (a queuing system can be very helpful for this)
  • Do not log everything - every I/O operation has a cost!
  • Do not put telemetry on everything - measure what it costs first!
  • The server generating traffic must be much faster than the expected throughput of the server consuming it (in our case 250,000 req/s vs 40,000 req/s)
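
One way to act on the "measure cost" advice for logging is to time the same code path with and without a log call. The following is only a minimal sketch, not the setup used in our simulation: `handle_request`, `make_logger`, `measure` and the logger name `cost_probe` are hypothetical names chosen for illustration, and the log output goes to an in-memory buffer, so real disk or network I/O would most likely cost more than what this reports.

```python
import io
import logging
import timeit


def handle_request(payload: int) -> int:
    """Hypothetical stand-in for real request handling."""
    return payload * 2 + 1


def make_logger(stream: io.StringIO) -> logging.Logger:
    """Build an isolated logger that writes to an in-memory stream (assumption)."""
    logger = logging.getLogger("cost_probe")
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep output away from the root logger
    logger.addHandler(logging.StreamHandler(stream))
    return logger


def measure(n: int = 50_000) -> tuple[float, float]:
    """Return average seconds per call: (without logging, with logging)."""
    logger = make_logger(io.StringIO())

    def bare() -> None:
        handle_request(42)

    def logged() -> None:
        result = handle_request(42)
        logger.info("handled request, result=%d", result)

    bare_s = timeit.timeit(bare, number=n)
    logged_s = timeit.timeit(logged, number=n)
    return bare_s / n, logged_s / n


if __name__ == "__main__":
    bare_avg, logged_avg = measure()
    print(f"without logging: {bare_avg * 1e9:.0f} ns/request")
    print(f"with logging:    {logged_avg * 1e9:.0f} ns/request")
```

Multiplying the per-request difference by the target rate (for example 40,000 req/s) gives a rough idea of how much of each second would go to logging alone. The same pattern can be used to compare a code path with and without a telemetry call.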