GPTBot is terrible

A few weeks ago I got a message telling me that Leaf (my Halo Infinite stat site) was having some uptime issues. Each time I got the message, the site loaded for me and none of my alerting showed any issues, so I ignored it. However, at some point the downtime must have lasted long enough for my alerting to catch it.

Now, sadly for me, I haven't really touched Halo Infinite since November of 2025, so I hadn't spent a lot of time working on Leaf. As far as I was concerned, I was spending money for the greater Halo community and just making sure the site was still alive.

One of these alerts came in a day before I wrote this post, and this time I actually reproduced the site being unresponsive. I jumped onto the server and realized my php-fpm workers were exhausted and the site was simply overwhelmed with traffic. Once I started following the access logs in real time, I was amazed to see how many hits were coming in.

So I downloaded all of those access logs and asked goaccess to parse them.

I was kinda blown away seeing nearly 2 million hits a day, with some days reaching half a million unique visitors. I have no idea what occurred on May 27, 2026, when I almost hit 1 million unique visitors. This is a site running a database, a cache (Redis), PHP and NGINX all on one server for $40/month. It seemed pretty cool to be holding that level of traffic on a PHP-powered Laravel application.

Now I wanted to dig in and see what was going on, because surely this was not all humans. A quick check with AI to help me group these requests confirmed my suspicion.

| Bot/Type | Requests |
|---|---|
| Human | 14,047,995 |
| GPTBot | 4,605,881 |
| Applebot | 3,082,590 |
| MetaBot | 2,245,343 |
| Amazonbot | 2,240,132 |
| Bytespider/TikTok | 1,139,015 |
| ClaudeBot | 737,153 |
| Bingbot | 499,710 |
| Googlebot | 270,827 |
| DataForSeoBot | 260,621 |
| Baiduspider | 74,465 |
| Other Bot | 49,343 |
| PetalBot | 44,870 |
| AliyunSecBot | 25,206 |
| SleepBot | 25,051 |

For just the requests from May 17, 2026 to May 31, 2026, I tracked 29,327,202 total requests, with 52% of those (15,279,207) classified as non-human. I was mad I didn't have a longer history of analytics, but my log rotation was purging older stats as new data came in. I remembered a post I wrote 4 years ago about this same problem on this same site, but in that post I was complaining about ~50,000 requests a day. Now I was dealing with ~2,000,000 hits a day, stretching the limits of my equipment even more.
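
For anyone who wants to reproduce this kind of breakdown, here is a rough Python sketch of grouping access log lines by User-Agent. It is not the exact method used for the table above. It assumes NGINX's default "combined" log format, where the User-Agent is the last quoted field, and the marker list and the helper names (`classify`, `count_by_type`, `bot_share`) are made up for illustration.

```python
import re
from collections import Counter

# Assumed substrings to look for in the (lowercased) User-Agent header.
# This list is illustrative and incomplete; extend it to match your own logs.
BOT_MARKERS = [
    ("gptbot", "GPTBot"),
    ("applebot", "Applebot"),
    ("amazonbot", "Amazonbot"),
    ("bytespider", "Bytespider/TikTok"),
    ("claudebot", "ClaudeBot"),
    ("bingbot", "Bingbot"),
    ("googlebot", "Googlebot"),
]

# Fallback hints for crawlers that are not in the list above.
GENERIC_BOT_HINTS = ("bot", "spider", "crawl")

# In the combined log format the User-Agent is the last quoted field.
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')


def classify(user_agent: str) -> str:
    """Return a bot label, "Other Bot", or "Human" for one User-Agent."""
    ua = user_agent.lower()
    for marker, label in BOT_MARKERS:
        if marker in ua:
            return label
    if any(hint in ua for hint in GENERIC_BOT_HINTS):
        return "Other Bot"
    # "Human" really means "no known bot marker was found".
    return "Human"


def count_by_type(lines) -> Counter:
    """Count requests per label, skipping lines without a quoted field."""
    counts = Counter()
    for line in lines:
        match = USER_AGENT_RE.search(line)
        if match:
            counts[classify(match.group(1))] += 1
    return counts


def bot_share(counts: Counter) -> float:
    """Fraction of requests that were not classified as "Human"."""
    total = sum(counts.values())
    return 0.0 if total == 0 else (total - counts["Human"]) / total
```

Passing an open log file to `count_by_type` gives per-label counts, and `bot_share` gives the non-human fraction (15,279,207 / 29,327,202 is about 0.52 for the numbers above). Keep in mind that any bot sending a normal browser User-Agent lands in "Human" with this approach, so the real bot share is probably higher.
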

I was mad at GPTBot (OpenAI) for hitting my tiny little Halo Infinite site nearly 5 million times in 14 days. That scale of requests (roughly 329k/day) is absolutely ridiculous for a site that amounts to just analytics on Halo Infinite matches. Because they rotate through around 400 different IPs, it seems none of my "burst" rate limit detection works. I may need to research a smarter technique to target this kind of AI bot harvesting content at an insane rate.
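
One possible direction, sketched here in Python purely as an illustration and not as what Leaf actually runs: rate limit by crawler family instead of by IP, so that 400 rotating addresses all draw from the same budget. The class name, the marker list, and the limits below are all hypothetical, and in practice the same idea would sit in the web server or application middleware rather than in a standalone script.

```python
import time
from collections import defaultdict, deque

# Assumed User-Agent substrings that identify a crawler family (lowercased).
CRAWLER_MARKERS = ("gptbot", "applebot", "amazonbot", "bytespider", "claudebot")


def rate_limit_key(user_agent: str, client_ip: str) -> str:
    """Share one key per crawler family; fall back to per-IP for the rest."""
    ua = user_agent.lower()
    for marker in CRAWLER_MARKERS:
        if marker in ua:
            return "bot:" + marker
    return "ip:" + client_ip


class FamilyRateLimiter:
    """Sliding-window limiter: at most max_requests per key per window."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock
        self._hits = defaultdict(deque)  # key -> timestamps of allowed hits

    def allow(self, key: str) -> bool:
        now = self._clock()
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # caller would respond with HTTP 429
        hits.append(now)
        return True
```

A request handler would call `limiter.allow(rate_limit_key(user_agent, ip))` and return a 429 when it comes back `False`. This only helps against crawlers that identify themselves honestly in the User-Agent, and the in-memory state would need to move to something shared like Redis if more than one worker process handles requests.
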

I wondered why GPTBot was so obsessed with this, so I tried a simple query myself on my own gamertag.

Sure enough, this AI reached out to my server (again) and queried it to return the results. Perhaps this service is not caching any results, because I re-ran other tests and could watch the OpenAI hits come into my web server in real time. In the era before AI, this search would have landed the visitor on my site, keeping them in my website ecosystem. Now people can harvest information from my site without ever visiting it.

I'll head back to the drawing board, because bots consuming 52% of my daily traffic and producing hundreds of gigabytes of traffic is no longer okay for me.
