When dealing with third-party web APIs, you often find that the API you want to use is rate limited. That is a reasonable measure by the service provider to keep users from overloading the system, but it is still a big inconvenience when you have a lot of requests to make and know you will exceed the limit. Most services work by limiting the number of requests allowed within a given time window.
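To make "a limited number of requests per time window" concrete, here is a minimal fixed-window limiter in plain Ruby. It illustrates the general idea only and is not how GitHub implements its limit. The class and method names are made up for this example.

```ruby
# Illustrative fixed-window rate limiter (hypothetical names): at most
# `limit` requests per window of `window_seconds`.
class FixedWindowLimiter
  def initialize(limit:, window_seconds:)
    @limit = limit
    @window_seconds = window_seconds
    @window_start = nil
    @count = 0
  end

  # Counts the request and returns true if it fits into the current window;
  # returns false once the window's limit is used up.
  def allow?(now = Time.now)
    if @window_start.nil? || now - @window_start >= @window_seconds
      @window_start = now # a new window starts, the count resets
      @count = 0
    end
    return false if @count >= @limit

    @count += 1
    true
  end

  # Time at which the current window ends (nil before the first request).
  def next_reset
    @window_start && @window_start + @window_seconds
  end
end

limiter = FixedWindowLimiter.new(limit: 3, window_seconds: 3600)
start = Time.at(0)
results = 4.times.map { |i| limiter.allow?(start + i) }
p results                      # => [true, true, true, false]
p limiter.allow?(start + 3600) # => true, a new window has started
```

A client that knows `next_reset` can wait until then instead of hammering the API. That is the piece of information the rest of this post revolves around.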
I am currently developing a Ruby Toolbox clone for the Swift programming language. To get the relevant library information, I have to work a lot with the GitHub API. First of all: the GitHub API is amazing. You can get most of the information you need with a single well-crafted GraphQL query. The GitHub API is rate limited to 1,000 requests per hour, which, to be honest, is enough for most use cases. With more than 40k projects, however, I repeatedly exceed that limit.
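A quick back-of-the-envelope calculation shows why, assuming just one request per project at the stated limit:

$$
\frac{40\,000\ \text{requests}}{1\,000\ \text{requests/hour}} = 40\ \text{hours}
$$

A single full update pass therefore takes at least 40 hours, far longer than one rate limit window.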
So how do you deal with that? You could simply let the request fail when GitHub reports that the rate limit has been exceeded, and let a job system like Sidekiq retry it. However, Sidekiq's default retry mechanism uses an exponential backoff that knows nothing about when GitHub's rate limit resets. The job would be retried on a schedule unrelated to the reset, either while the limit is still exhausted or long after it has been restored, and once its retries are used up Sidekiq gives up on it. Even worse, GitHub would probably get in touch after a while, asking whether I am aware of all those unnecessary requests. Simply put, that's not a solution.
The solution I came up with instead uses a Redis-backed system that schedules the jobs so that they respect the next API rate limit reset.
Every GraphQL query I submit also requests the current rate limit state, which includes how many requests I have left and when the next reset will occur. That's everything I need to know. I store this information in Redis after each request. To keep the code from being littered with Redis connections, I wrapped the rate limit state in a small Ruby class that handles reading it from and writing it to Redis.
Now that I have the current rate limit state, I need to handle the case where no requests are left gracefully. To do this, I wrote a small class that wraps the state and checks whether the request can be performed. If it can, the given block runs. Otherwise, the class takes the worker's class and arguments and schedules the job to run right after the next reset.
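```ruby
class GithubRateLimitedApiState
  # Runs the block if GitHub still allows requests; otherwise schedules
  # the Sidekiq worker again for just after the next rate limit reset.
  def self.perform_or_delay(worker_class:, args:)
    state = new # Redis-backed accessors (remaining_requests, next_reset) not shown

    if state.remaining_requests > 0
      yield state
    else
      # `1.second` comes from ActiveSupport (available in Rails apps)
      worker_class.perform_at(state.next_reset + 1.second, *args)
    end
  end
end
```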
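The post doesn't show the Redis-backed state object itself. Below is a minimal sketch, not the author's implementation. It assumes every query also selects GitHub's GraphQL `rateLimit { remaining resetAt }` fields. `RedisRateLimitState`, the key names and `update_from_response` are hypothetical. `InMemoryStore` stands in for a Redis connection with the same `get`/`set` shape as the redis-rb client, so the example runs without a Redis server.

```ruby
require "json"
require "time"

# Stand-in for a Redis connection so the sketch runs without a server.
# It mirrors redis-rb's Redis#get/Redis#set: string keys, string values.
class InMemoryStore
  def initialize
    @data = {}
  end

  def get(key)
    @data[key]
  end

  def set(key, value)
    @data[key] = value.to_s
    "OK"
  end
end

# Hypothetical Redis-backed rate limit state; the key names are assumptions.
class RedisRateLimitState
  REMAINING_KEY = "github:rate_limit:remaining"
  RESET_KEY = "github:rate_limit:reset_at"

  def initialize(redis)
    @redis = redis
  end

  # Nothing stored yet (first run): assume one request is left so the
  # first job can fetch the real state from GitHub.
  def remaining_requests
    value = @redis.get(REMAINING_KEY)
    value.nil? ? 1 : Integer(value)
  end

  def remaining_requests=(count)
    @redis.set(REMAINING_KEY, count)
  end

  # Returns nil until a reset time has been stored.
  def next_reset
    value = @redis.get(RESET_KEY)
    value && Time.iso8601(value)
  end

  def next_reset=(time)
    @redis.set(RESET_KEY, time.getutc.iso8601)
  end

  # Copies the `rateLimit { remaining resetAt }` part of a GitHub GraphQL
  # response body into the store.
  def update_from_response(body)
    rate_limit = JSON.parse(body).fetch("data").fetch("rateLimit")
    self.remaining_requests = rate_limit.fetch("remaining")
    self.next_reset = Time.iso8601(rate_limit.fetch("resetAt"))
  end
end

# Made-up response body for a query that also selects
# `rateLimit { remaining resetAt }` (repository fields omitted).
body = '{"data":{"rateLimit":{"remaining":0,"resetAt":"2018-05-01T13:00:00Z"}}}'

state = RedisRateLimitState.new(InMemoryStore.new)
state.update_from_response(body)
p state.remaining_requests    # => 0
puts state.next_reset.iso8601 # => 2018-05-01T13:00:00Z
```

Values go through Redis as strings, which is why the getters convert them back into an `Integer` and a `Time`.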
Putting it all together, I rewrote the FetchRepoInformationWorker to use this small method. In the block passed to it, I perform the request and then update the rate limit state so that the system knows the new state.
```ruby
class FetchRepoInformationWorker
  include Sidekiq::Worker

  def perform(project_url)
    GithubRateLimitedApiState.perform_or_delay(
      worker_class: self.class,
      args: [project_url]
    ) do |state|
      repo_info = GithubGraphqlClient.repository_information(project_url)
      # store information
      # ...

      # update the new state
      state.next_reset = repo_info.rate_limiting_state.next_reset
      state.remaining_requests = repo_info.rate_limiting_state.remaining_requests
    end
  end
end
```
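To see how the pieces interact, here is a plain-Ruby simulation that uses simplified stand-ins instead of Redis and Sidekiq. `State` replaces the Redis-backed state. `FakeWorker.perform_at` only records what Sidekiq would schedule. `stub_github_call` is a hypothetical stand-in for the API call that returns the rate limit GitHub would report afterwards. The dates are made up.

```ruby
# Simplified stand-in for the Redis-backed rate limit state.
State = Struct.new(:remaining_requests, :next_reset)

# Mimics only Sidekiq's `perform_at(time, *args)` by recording the call.
class FakeWorker
  def self.scheduled
    @scheduled ||= []
  end

  def self.perform_at(time, *args)
    scheduled << [time, args]
  end
end

# Same decision as GithubRateLimitedApiState: run the block or reschedule
# the job one second after the next reset (Time + 1 adds one second).
def perform_or_delay(state, worker_class:, args:)
  if state.remaining_requests > 0
    yield state
  else
    worker_class.perform_at(state.next_reset + 1, *args)
  end
end

RESET_AT = Time.utc(2018, 5, 1, 13, 0, 0)
state = State.new(2, RESET_AT) # two requests left in this window
github_remaining = 2
fetched = []

stub_github_call = lambda do |project_url|
  github_remaining -= 1
  fetched << project_url
  { remaining: github_remaining, reset_at: RESET_AT }
end

%w[repo-1 repo-2 repo-3].each do |url|
  perform_or_delay(state, worker_class: FakeWorker, args: [url]) do |s|
    rate_limit = stub_github_call.call(url)
    # update the new state, as the worker does after each request
    s.next_reset = rate_limit[:reset_at]
    s.remaining_requests = rate_limit[:remaining]
  end
end

p fetched # => ["repo-1", "repo-2"]
p FakeWorker.scheduled.map { |time, args| [time.strftime("%H:%M:%S"), args] }
# => [["13:00:01", ["repo-3"]]]
```

The third job never reaches the API. Because the first two jobs wrote the exhausted state back, the third one is scheduled for one second after the reset instead.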
And it works quite well. I put it into production, Sentry has stayed quiet so far, and the system updates the repositories on a regular basis. It remains to be seen whether there are error cases such as a race condition between jobs. For example, two jobs could both read the last remaining request before either of them updates the state. Even then, the retry system should handle the resulting failures.
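One possible way to close that gap, which this post does not implement, is to reserve a request atomically before calling GitHub rather than reading the count and acting on it later. Redis's `DECR` command decrements a key and returns the new value in one atomic step. The sketch below uses a hypothetical `AtomicCounter` guarded by a `Mutex` as a stand-in, so it runs without a Redis server.

```ruby
# Sketch of one possible mitigation: reserve a request atomically before
# calling the API. AtomicCounter is a hypothetical stand-in for a Redis
# key decremented with DECR, which returns the new value atomically.
class AtomicCounter
  def initialize(value)
    @value = value
    @mutex = Mutex.new
  end

  # Decrements by one and returns the new value, like Redis DECR.
  def decr
    @mutex.synchronize { @value -= 1 }
  end
end

remaining = AtomicCounter.new(3) # three requests left in this window
allowed = Queue.new              # thread-safe collection of admitted jobs

jobs = 10.times.map do |job_id|
  Thread.new do
    # Only a job whose reservation left a value >= 0 may call the API;
    # the others would be rescheduled for after the reset.
    allowed << job_id if remaining.decr >= 0
  end
end
jobs.each(&:join)

p allowed.size # => 3
```

No matter how the threads interleave, only as many jobs get through as there are remaining requests. A real implementation would still need to overwrite the counter with the value GitHub reports after each request, as the worker above does.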
I hope this blog post gave you some ideas on how to handle rate-limited APIs with Sidekiq and Redis.