How to use toxiproxy to verify your code can handle timeouts and unavailable endpoints

When you have code that calls an endpoint, you need to make sure it’s resilient and can handle error scenarios, such as timeouts.

One way to prove your code is resilient is by using toxiproxy to simulate bad behavior. Toxiproxy sits between your client code and the endpoint. It receives requests from your client, applies toxic behavior to simulate error scenarios, and then forwards the request to the real endpoint.


In this article I’ll explain how to install and use toxiproxy to simulate two error scenarios:

  1. The request taking too long and causing a client-side timeout.
  2. The request failing due to the endpoint being unavailable.

I’ll start with client code that has no error handling and show how it fails in the error scenarios, and then show how to handle the errors.

Note: In this article I’ll refer to C:\toxiproxy as the install location, but you can put toxiproxy anywhere you want.

1 – Download toxiproxy client and server

  1. Go here: https://github.com/Shopify/toxiproxy/releases.
  2. Download the appropriate client and server for whatever OS you’re using.
  3. Put them in C:/toxiproxy
  4. Rename them to server.exe and client.exe.

In my case I’m using Windows 64-bit, and at the time of writing this, the latest version of toxiproxy was 2.1.4. So I grabbed the following two executables:

  • toxiproxy-cli-windows-amd64.exe
  • toxiproxy-server-windows-amd64.exe

2 – Configure toxiproxy to proxy requests to the real endpoint

  • Create C:\toxiproxy\config.json
  • Configure toxiproxy to work with your upstream endpoint. Let’s say you’re calling GET on a weather API running on 127.0.0.1:12345. In config.json, you would add the following:
[
  {
    "name": "weather",
    "listen": "127.0.0.1:12001",
    "upstream": "127.0.0.1:12345"
  }
]

Explanation of these settings:

| Setting | Value | Explanation |
| --- | --- | --- |
| name | weather | How you’ll refer to this endpoint from the toxiproxy client. Use a short and simple name. |
| listen | 127.0.0.1:12001 | The endpoint that toxiproxy listens on for requests. Note: Make sure the port is not blocked by the firewall. |
| upstream | 127.0.0.1:12345 | The real endpoint. When toxiproxy receives requests on its listening endpoint, it forwards them to this upstream endpoint. |

3 – Run the toxiproxy server

From the command line, run server.exe and specify config.json.

./server -config config.json

Note: I’m using a bash terminal.

You should see the following output:

msg="Started proxy" name="weather" proxy="127.0.0.1:12001" upstream="127.0.0.1:12345"
msg="Populated proxies from file" config="config.json" proxies=1
msg="API HTTP server starting" host="localhost" port="8474" version="2.1.4"

Troubleshooting common toxiproxy server errors

| Error | Solution |
| --- | --- |
| An attempt was made to access a socket in a way forbidden by its access permissions. | Something else is already using the listener port specified in config.json. Find an available port, update the listener port in config.json, then restart server.exe. |
| listen tcp 127.0.0.1:8474 Only one usage of each socket address is normally permitted | Toxiproxy has a listener on port 8474 (to receive commands from the toxiproxy client). This error usually means another instance of the toxiproxy server is already running and using port 8474. Just shut down the other instance. Note: It’s also possible that another program is using 8474. In that case you don’t need to recompile anything: start the server on a different port with its -port flag, and point the toxiproxy client at that address with its -h (host) flag. |

If you’re seeing other strange behavior, such as traffic not getting through, make sure the firewall is not blocking you.
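Both errors in the table come down to a port that is already taken, so it can help to check the ports before starting the server. Here is a minimal sketch of such a check. IsLoopbackPortFree is a hypothetical helper name of my own, not something toxiproxy provides, and it only checks the loopback address used in this article.

```csharp
using System.Net;
using System.Net.Sockets;

// Hypothetical helper (not part of toxiproxy): returns true if this process
// can start listening on 127.0.0.1:<port>, meaning nothing else holds the port.
static bool IsLoopbackPortFree(int port)
{
    var listener = new TcpListener(IPAddress.Loopback, port);
    try
    {
        listener.Start();   // throws SocketException if the port can't be bound
        return true;
    }
    catch (SocketException)
    {
        return false;
    }
    finally
    {
        listener.Stop();    // always release the socket again
    }
}
```

For the setup in this article you would call IsLoopbackPortFree(12001) for the listener port and IsLoopbackPortFree(8474) for the toxiproxy command port. The result is only a snapshot, since another program can still grab the port between the check and starting the server.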

4 – Update the weather client to use the toxiproxy listener endpoint, then start the weather client

I have this very simple client code that polls the weather API every 5 seconds. I’ll be referring to this as the weather client (to distinguish it from the toxiproxy client). It has no error handling. Currently, it’s pointing to the real upstream endpoint at 127.0.0.1:12345.

I changed it to point to the toxiproxy listener endpoint at 127.0.0.1:12001.

HttpClient httpClient = new HttpClient() { Timeout = TimeSpan.FromSeconds(5) };

while (true)
{
    Log("Getting weather");

    /*
     * Pointing to the real upstream endpoint
    var response = await httpClient.GetAsync("http://127.0.0.1:12345/weather");
    */

    //Pointing to toxiproxy listener endpoint
    var response = await httpClient.GetAsync("http://127.0.0.1:12001/weather");
    var content = await response.Content.ReadAsStringAsync();

    Log($"StatusCode={response.StatusCode} Weather={content}");

    await Task.Delay(TimeSpan.FromSeconds(5));
}

After changing the weather client to point to toxiproxy’s listener endpoint, start running the weather client.

At this point the weather client is going through toxiproxy and behaving normally. It’s polling the weather API every 5 seconds and showing this output:

08:10:24.435 Getting weather
08:10:24.438 StatusCode=OK Weather={"temperatureF":58,"description":"Sunny"}
08:10:29.446 Getting weather
08:10:29.450 StatusCode=OK Weather={"temperatureF":57,"description":"Sunny"}

5 – Use the toxiproxy client to simulate the endpoint being unavailable

The following command turns off the toxiproxy weather listening endpoint:

./client toggle weather
Proxy weather is now disabled

When the weather client tries to connect, it gets the following exception:

System.Net.Http.HttpRequestException: ‘No connection could be made because the target machine actively refused it.’

Inner Exception

SocketException: No connection could be made because the target machine actively refused it.

This crashes the weather client, because it has no error handling at all. Let’s fix that in the next step.

6 – Update the weather client to handle the unavailable endpoint scenario

To handle the unavailable endpoint error, we need to catch HttpRequestException and check its inner exception. It should be a SocketException whose ErrorCode equals SocketError.ConnectionRefused (10061).

Next, we need to think of an error handling strategy. I’m going to use a simple failover strategy:

  1. When the endpoint is unavailable, try the failover URL.
  2. When the failover URL is unavailable, shut down the weather client.

Make sure to use whatever error handling strategy makes sense in your situation.

HttpClient httpClient = new HttpClient() { Timeout = TimeSpan.FromSeconds(5) };
bool failedOver = false;

//this is the toxiproxy url
string url = "http://127.0.0.1:12001/weather";
string failOverUrl = "http://127.0.0.1:12345/weather";

while (true)
{
    try
    {
        Log("Getting weather");
        var response = await httpClient.GetAsync(url);
        var content = await response.Content.ReadAsStringAsync();
        Log($"StatusCode={response.StatusCode} Weather={content}");
    }
    catch(HttpRequestException ex)
    when (ex?.InnerException is SocketException se
          && se.ErrorCode == (int)SocketError.ConnectionRefused)
    {
        if (!failedOver)
        {
            Log("Endpoint is unavailable. Switching to failover url");
            url = failOverUrl;
            failedOver = true;
        }
        else
        {
            Log("Failover Url is unavailable. Shutting down!");
            return;
        }
    }

    await Task.Delay(TimeSpan.FromSeconds(5));
}

Now run the weather client again and look at the output:

09:10:00.726 Getting weather
09:10:02.816 Endpoint is unavailable. Switching to failover url
09:10:07.816 Getting weather
09:10:07.842 StatusCode=OK Weather={"temperatureF":50,"description":"Sunny"}

The weather client now detects the unavailable endpoint and uses the failover URL to successfully get the weather.

This shows how convenient it is to use toxiproxy to simulate an endpoint unavailable scenario.

Note: This only handles one possible error code (10061 – Connection Refused). Make sure to think about other error codes that could happen and handle whichever ones make sense in your situation. For the full list of socket error codes you can run into, see the SocketError enum in the .NET documentation.
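If you decide to treat several socket errors as "endpoint unavailable", one option is to move the check into a small helper. The sketch below is only an illustration: IsEndpointUnavailable is a hypothetical name, and the set of error codes is my assumption about what might count as unavailable, so adjust it to your situation. It reads SocketException.SocketErrorCode, which exposes the error as a SocketError enum value instead of a raw integer.

```csharp
using System;
using System.Net.Http;
using System.Net.Sockets;

// Hypothetical helper: decides whether an exception thrown by HttpClient means
// the endpoint could not be reached, so the caller can fail over.
static bool IsEndpointUnavailable(Exception ex)
{
    if (ex is HttpRequestException && ex.InnerException is SocketException se)
    {
        switch (se.SocketErrorCode)
        {
            case SocketError.ConnectionRefused:   // nothing is listening (the disabled proxy case)
            case SocketError.HostUnreachable:     // no route to the host
            case SocketError.NetworkUnreachable:  // no route to the network
            case SocketError.HostNotFound:        // name lookup failed
            case SocketError.TimedOut:            // the connection attempt timed out
                return true;
        }
    }
    return false;
}
```

With this in place, the exception filter in the weather client could become catch (HttpRequestException ex) when (IsEndpointUnavailable(ex)), and the failover logic inside the catch block stays the same.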

7 – Re-enable the toxiproxy endpoint and restart the client

Before going to the next error scenario, re-enable the weather endpoint by executing the following command:

./client toggle weather

You should see the following output:

Proxy weather is now enabled

Now restart the weather client. It should be working normally again.

8 – Use the toxiproxy client to cause timeouts

In the weather client I have specified a 5 second timeout in the HttpClient constructor:

HttpClient httpClient = new HttpClient() { Timeout = TimeSpan.FromSeconds(5) };

This means the weather client will time out if the request takes longer than 5 seconds.

To simulate a request taking a long time, we can use the toxiproxy client to add latency with the following command:

./client toxic add weather -t latency -a latency=6000

This will output:

Added downstream latency toxic 'latency_downstream' on proxy 'weather'
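The toxiproxy client talks to the server over the HTTP API on port 8474, so the same toxic can also be added from code, for example from an automated test. The following is a rough sketch based on the HTTP API described in the toxiproxy README (POST to /proxies/{proxy}/toxics); verify the route and fields against the version you’re running. BuildLatencyToxicRequest is a hypothetical helper name, and the sketch assumes the server is using the default localhost:8474 address.

```csharp
using System;
using System.Net.Http;
using System.Text;

// Hypothetical helper: builds (but does not send) the request that asks the
// toxiproxy server to add a downstream latency toxic to the given proxy.
// Assumes the toxiproxy API is listening on its default address, localhost:8474.
static HttpRequestMessage BuildLatencyToxicRequest(string proxyName, int latencyMs)
{
    // Same settings as the CLI command above: type=latency, latency=<ms>, no jitter.
    string json =
        "{\"type\":\"latency\",\"stream\":\"downstream\",\"toxicity\":1.0," +
        "\"attributes\":{\"latency\":" + latencyMs + ",\"jitter\":0}}";

    return new HttpRequestMessage(HttpMethod.Post,
        $"http://localhost:8474/proxies/{proxyName}/toxics")
    {
        Content = new StringContent(json, Encoding.UTF8, "application/json")
    };
}
```

To apply it, send the request with an HttpClient, for example await httpClient.SendAsync(BuildLatencyToxicRequest("weather", 6000)). Building the request separately from sending it keeps the helper easy to check without a running toxiproxy server.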

Now make sure the weather client is running. When it makes a request, toxiproxy will make the request take 6 seconds, so it’ll time out on the client-side and get the following exception:

System.Threading.Tasks.TaskCanceledException: ‘The operation was canceled.’

Let’s update the weather client to handle this exception and deal with the timeout scenario.

9 – Update the weather client to handle the timeout scenario

To handle timeouts coming from HttpClient, we need to catch TaskCanceledException and handle it appropriately. One common approach is to retry the request with a longer timeout. Of course, you’ll need to use the error handling strategy that makes sense for your situation.

I’m going to use a simple retry strategy:

  1. Start with a 5 second timeout.
  2. If a timeout happens, increase the timeout to 10 seconds for future requests.

To change the timeout, you can’t just change the HttpClient.Timeout property. That results in the following exception:

InvalidOperationException: This instance has already started one or more requests and can only be modified before sending the first request

And because we should always reuse HttpClient objects (instead of creating new ones for each request), this means we’ll need to use a CancellationTokenSource with a specified timeout, then pass it in as a CancellationToken.

int timeout = 5000;
int extraTimeout = 10_000;
HttpClient httpClient = new HttpClient();
bool failedOver = false;

//this is the toxiproxy url
string url = "http://127.0.0.1:12001/weather";
string failOverUrl = "http://127.0.0.1:12345/weather";

while (true)
{
    try
    {
        Log("Getting weather");

        //Cancels the request after the current timeout. Disposed at the end of each attempt.
        using var cancelTokenSource = new CancellationTokenSource(timeout);
        var response = await httpClient.GetAsync(url, cancelTokenSource.Token);
        var content = await response.Content.ReadAsStringAsync();
        Log($"StatusCode={response.StatusCode} Weather={content}");
    }
    catch(HttpRequestException ex)
    when (ex?.InnerException is SocketException se
          && se.ErrorCode == (int)SocketError.ConnectionRefused)
    {
        if (!failedOver)
        {
            Log("Endpoint is unavailable. Switching to failover url");
            url = failOverUrl;
            failedOver = true;
        }
        else
        {
            Log("Failover Url is unavailable. Shutting down!");
            return;
        }
    }
    catch(TaskCanceledException)
    {
        Log($"Timed out. Will try again with a {extraTimeout} millisecond timeout");
        timeout = extraTimeout;
    }

    await Task.Delay(TimeSpan.FromSeconds(5));
}

Now run the weather client.

10:10:36.710 Getting weather
10:10:41.749 Timed out. Will try again with a 10000 millisecond timeout
10:10:46.750 Getting weather
10:10:52.765 StatusCode=OK Weather={"temperatureF":59,"description":"Sunny"}

As you can see, it got a timeout as expected. Then it increased the timeout to 10 seconds and the second request was successful. If you look at the timestamps, you’ll notice it took ~6 seconds to get the response.
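One thing to keep in mind: TaskCanceledException is thrown both when the timeout fires and when something else cancels the request, such as a shutdown token. If your client needs to tell these apart, a linked CancellationTokenSource is one way to do it. The sketch below is an illustration under that assumption. WithTimeoutAsync is a hypothetical helper name, and it treats any cancellation that the caller didn’t request as a timeout.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: runs an operation with its own timeout. A timeout is
// reported as TimeoutException, while a cancellation requested by the caller
// still surfaces as the usual OperationCanceledException.
static async Task<T> WithTimeoutAsync<T>(
    Func<CancellationToken, Task<T>> operation,
    TimeSpan timeout,
    CancellationToken callerToken = default)
{
    // The linked source is canceled when the caller cancels OR when the timeout elapses.
    using var cts = CancellationTokenSource.CreateLinkedTokenSource(callerToken);
    cts.CancelAfter(timeout);

    try
    {
        return await operation(cts.Token);
    }
    catch (OperationCanceledException) when (!callerToken.IsCancellationRequested)
    {
        // The caller did not cancel, so this is assumed to be the timeout.
        throw new TimeoutException($"No response within {timeout.TotalMilliseconds} ms.");
    }
}
```

In the weather client, the request could then be written as await WithTimeoutAsync(ct => httpClient.GetAsync(url, ct), TimeSpan.FromMilliseconds(timeout)), with a catch (TimeoutException) block doing what the catch (TaskCanceledException) block does now.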

10 – Use the toxiproxy client to remove the timeout behavior

First, inspect the weather proxy to see what the toxic is called.

./client inspect weather

This gives the following output:

latency_downstream type=latency stream=downstream toxicity=1.00 attributes=[ jitter=0 latency=6000 ]

This shows the toxic is referred to as “latency_downstream,” so to remove it, execute the following command:

./client toxic remove weather -n latency_downstream

You’ll see the following response:

Removed toxic 'latency_downstream' on proxy 'weather'

After removing this, you’ll notice that the weather client is back to normal and getting responses very quickly (a few milliseconds).
