Pi-hole: 3 years later
In 2017 I installed a Pi-hole on a Raspberry Pi in my network and routed all my internet traffic through it. Today is April 16, 2021 and I've been running it the entire time.
I've since moved and now have gigabit Internet, and I've learned that an older Raspberry Pi cannot handle long-term storage of query results. This has unfortunately corrupted most of my long-term data. For context, I had roughly 30 million records, around 2.2 GB, on a Raspberry Pi.
For a new long-term data solution I've designed an extraction mechanism that sends logged queries over an API to a server that collects them. This basically works as follows:
- Bash script that queries the local SQLite file to extract queries in chunks of 500, including clients so they can be properly identified
- Bash script remembers its place by using the sqlite database id on a successful response from server
- Server collects information in a pretty optimized relational database
- Server has a help command, `php artisan stats:dump`, that helps write this post
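The chunk-and-checkpoint idea in the list above can be sketched in a few lines. This is a Python stand-in, not the author's Bash script: the `queries` table and its columns are assumed to match Pi-hole's long-term database, and `send` is a hypothetical uploader that returns `True` only when the server accepts a chunk.

```python
import sqlite3
from pathlib import Path
from typing import Callable

CHUNK_SIZE = 500  # chunk size mentioned in the post


def read_checkpoint(path: Path) -> int:
    """Return the last acknowledged row id, or 0 on the first run."""
    return int(path.read_text()) if path.exists() else 0


def next_chunk(conn: sqlite3.Connection, last_id: int, size: int = CHUNK_SIZE) -> list:
    """Fetch up to `size` rows whose id is greater than the checkpoint."""
    # Table and column names are assumptions about the Pi-hole schema.
    return conn.execute(
        "SELECT id, timestamp, status, domain, client FROM queries "
        "WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, size),
    ).fetchall()


def extract(conn: sqlite3.Connection, checkpoint: Path,
            send: Callable[[list], bool]) -> int:
    """Ship chunks until the table is drained or the server rejects one."""
    shipped = 0
    last_id = read_checkpoint(checkpoint)
    while chunk := next_chunk(conn, last_id):
        if not send(chunk):  # hypothetical uploader; False means "try again later"
            break
        last_id = chunk[-1][0]
        checkpoint.write_text(str(last_id))  # advance only after a successful response
        shipped += len(chunk)
    return shipped
```

Because the checkpoint only moves after a successful response, a failed upload simply gets retried on the next run, and keying on the row id means no row is sent twice.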
So now I can set my Pi-hole to purge data older than a month. Data is constantly extracted, so I'm losing no long-term data, and when the file grows to a few gigabytes I pull it locally to a powerful machine and vacuum it. This frees the space from the deleted rows and buys me another 10-14 months. Hopefully this will be moot when I upgrade my Raspberry Pi to a 32 GB storage device.
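To illustrate why the vacuum step matters: in SQLite, deleting rows only marks their pages as free, and the file does not shrink until `VACUUM` rewrites it. A minimal Python sketch, assuming a `queries` table with a Unix `timestamp` column and a 30-day retention (both are assumptions, and on a real Pi-hole the purge is done by Pi-hole itself):

```python
import os
import sqlite3
import time
from typing import Optional, Tuple

RETENTION_DAYS = 30  # "older than a month"; an assumed value


def purge_and_vacuum(db_path: str, now: Optional[float] = None) -> Tuple[int, int]:
    """Delete old rows, then VACUUM. Returns the file size (before, after) in bytes."""
    current = now if now is not None else time.time()
    cutoff = int(current - RETENTION_DAYS * 86400)
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("DELETE FROM queries WHERE timestamp < ?", (cutoff,))
        conn.commit()  # VACUUM cannot run inside an open transaction
        before = os.path.getsize(db_path)  # still large: freed pages stay in the file
        conn.execute("VACUUM")  # rebuilds the file without the free pages
        after = os.path.getsize(db_path)
    finally:
        conn.close()
    return before, after
```

`VACUUM` rebuilds the database into a temporary copy, so it needs spare disk space and CPU time, which is a good reason to run it on a more powerful machine rather than on the Pi.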
Though this blog isn't about that project - this is about Pi-hole, which is now on version 5.5! This is quite the jump from my last blog about Pi-hole at 4.3, so let's recap those releases:
- 5.0 - Major Release - Per Client Blocking, Groups, Deep CNAME, Local DNS records and more
- 5.1 - Dark Mode, Conditional Forwarding, Debug System
- 5.2 - New RegEx Engine, ECS Support, Identify clients via MAC
- 5.3 - Fixes on Fixes, Continued detailed query type expansion (SVCB/HTTPS)
- 5.4 - Security and Reliability
- 5.5 - AdList Management, New Dark Theme, Automated Blocking Mode
From the dashboard above you can see quite a few UI elements have changed and my daily network behavior with this remote work makes it quite obvious when I sleep.
So let's just jump into the analytics and see what is going on. Right out of the gate, the database is now 2.9 GB and holds 670 days of records, so there is quite a lot to work with. Let's start with the top 15 blocked and allowed domains.
Top 15 Blocked
Domain | Count |
---|---|
806c4c48-1715-4220-054f-909f83563938.local | 803,900 |
e7bf16b0-65ae-2f4e-0a6a-bcbe7b543c73.local | 638,461 |
1d95ffae-4388-9fbc-1646-b2b637cecb64.local | 432,009 |
ssl.google-analytics.com | 323,265 |
68c40e5d-4310-def5-a1c3-20640e1cd583.local | 247,893 |
watson.telemetry.microsoft.com | 189,136 |
app-measurement.com | 129,760 |
settings-win.data.microsoft.com | 71,150 |
googleads.g.doubleclick.net | 71,007 |
www.googleadservices.com | 44,161 |
mobile.pipe.aria.microsoft.com | 41,748 |
reports.crashlytics.com | 39,792 |
v10.events.data.microsoft.com | 31,839 |
vortex.data.microsoft.com | 31,554 |
sb.scorecardresearch.com | 28,467 |
Top 15 Allowed
Domain | Count |
---|---|
e7bf16b0-65ae-2f4e-0a6a-bcbe7b543c73.local | 5,631,936 |
68c40e5d-4310-def5-a1c3-20640e1cd583.local | 5,305,149 |
1d95ffae-4388-9fbc-1646-b2b637cecb64.local | 4,898,204 |
localhost | 1,659,895 |
806c4c48-1715-4220-054f-909f83563938.local | 1,342,386 |
b.canaryis.com | 680,590 |
clients4.google.com | 221,816 |
ssl.gstatic.com | 204,572 |
play.google.com | 202,694 |
cdn-0.nflximg.com | 181,530 |
api-global.netflix.com | 148,800 |
wpad.local | 133,864 |
api-0.core.keybaseapi.com | 129,141 |
nrdp.prod.ftl.netflix.com | 128,650 |
pistats.ibotpeaches.com | 127,896 |
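Rankings like the two tables above boil down to a grouped count. The sketch below is not the `stats:dump` command; it runs the equivalent query against a Pi-hole-style `queries` table from Python. The table layout and the status codes used to split blocked from allowed are assumptions and should be checked against your Pi-hole version.

```python
import sqlite3

# Assumed status codes; verify them before trusting the split.
ALLOWED = (2, 3)
BLOCKED = (1, 4, 5, 6, 7, 8, 9, 10, 11)


def top_domains(conn: sqlite3.Connection, statuses: tuple, limit: int = 15) -> list:
    """Return (domain, count) pairs for the given statuses, busiest first."""
    placeholders = ",".join("?" * len(statuses))
    sql = (
        "SELECT domain, COUNT(*) AS hits FROM queries "
        f"WHERE status IN ({placeholders}) "
        "GROUP BY domain ORDER BY hits DESC, domain LIMIT ?"
    )
    return conn.execute(sql, (*statuses, limit)).fetchall()
```

Calling `top_domains(conn, BLOCKED)` and `top_domains(conn, ALLOWED)` would give the two lists; grouping by `client` instead of `domain` gives a per-device ranking.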
The first question is obviously what is the point of all these UUID-like domains with a `.local` TLD. I only resolved this issue after I noticed the huge number of domain resolutions that were out of the norm.
Turns out my work laptop loves using Multicast DNS to identify devices on the network. For some reason it actively reaches out to any device that announces itself with a `.local` domain to establish communication. What truly upsets me is that these domains combined make up roughly 17 million requests. My database only has 30 million.
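As a rough check of that figure, the UUID-style `.local` rows from the Top 15 Allowed table can be picked out with a regular expression and summed. The counts below are copied from that table; the 30 million total is the rounded figure from the text, so the percentage is only approximate.

```python
import re

# Matches names like 806c4c48-1715-4220-054f-909f83563938.local
UUID_LOCAL = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\.local$"
)

# Subset of the Top 15 Allowed table, including two names that should NOT match.
allowed = {
    "e7bf16b0-65ae-2f4e-0a6a-bcbe7b543c73.local": 5_631_936,
    "68c40e5d-4310-def5-a1c3-20640e1cd583.local": 5_305_149,
    "1d95ffae-4388-9fbc-1646-b2b637cecb64.local": 4_898_204,
    "localhost": 1_659_895,
    "806c4c48-1715-4220-054f-909f83563938.local": 1_342_386,
    "wpad.local": 133_864,
}


def mdns_total(counts: dict) -> int:
    """Sum the counts of domains that look like UUID-style .local names."""
    return sum(n for domain, n in counts.items() if UUID_LOCAL.match(domain))


total = mdns_total(allowed)
share = total / 30_000_000  # rounded database size from the post
print(f"{total:,} requests, about {share:.0%} of the database")
```

The four matching rows add up to 17,177,675 requests, a bit over half of everything stored.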
Why wouldn't Apple just stop reaching out to a host if it recognized it hasn't responded in a known way in the last 10 requests or 100 requests or even a million requests? Turns out bitching about it won't solve the problem - the fix is simple.
Just head to the settings and click DHCP - you'll want to change the domain name (pictured above) off of the default local. Restart networking on any affected Mac device as well as the Pi-hole, and those requests go quiet.
So I took a quick trip to RawGraphs and tried to visualize my network requests using a beeswarm plot to see how much overhead all these Multicast requests added.
Those graphs show the huge influx of requests starting in November of 2020, which was the release of Big Sur in the macOS world. I should have noticed earlier, but this Pi-hole isn't logged or monitored like a production web server - I only noticed when my disk space warning went off.
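A spike like this is also easy to spot without a plot by counting `.local` queries per month. A minimal sketch, again assuming a Pi-hole-style `queries` table with a Unix `timestamp` column (an assumption about the schema, not the author's tooling):

```python
import sqlite3


def local_queries_per_month(conn: sqlite3.Connection) -> list:
    """Return (YYYY-MM, count) pairs for queries ending in .local, oldest first."""
    return conn.execute(
        "SELECT strftime('%Y-%m', timestamp, 'unixepoch') AS month, COUNT(*) "
        "FROM queries WHERE domain LIKE '%.local' "
        "GROUP BY month ORDER BY month"
    ).fetchall()
```

A sudden jump from one month to the next in this list would be the same signal the beeswarm plot shows, and it is cheap enough to run from a scheduled job as a basic alert.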
Though what about those other blocked requests? Most are obvious, being either Microsoft or Google services, but what is the last one - `scorecardresearch.com`?
A quick Google search says it is once again another analytics tracker, heavily used on websites, TV apps and more.
Taking a look at the allowed requests, nothing is out of the norm:
- Google maps - Ingress, Nest and more
- Canary - Security system
- Netflix - Pandemic times - lots of Netflix
- Keybase - Encrypted communication
- wpad.local - Talked about in last Pi-hole blog - tis okay.
- PiStats - The project that collects long-term data for me
So then to end this post I wanted to take a look at my top 5 devices and their query counts:
- MacBook Pro - 19,229,727 requests
- Xbox One - 3,809,801 requests
- Router - 828,274 requests
- Raspberry Pi - 778,291 requests
- Sony Smart TV - 763,334 requests
So I get another chance to say it's insane that my MacBook, with its busted Multicast DNS behavior, spams more requests than all my other devices combined!
I'm now debating purging these 17 million spam requests from my system so I can have less overhead when running analytics. We will see on the next Pi-hole blog what I decide to do. Till then, pass on a donation to the Pi-hole team if you appreciate what they are doing.