The Internet Never Forgets
While bored one day, I decided to Google myself in quotes, like "Connor Tumbleson", and click through every link that I did not understand. Page after page, I kept finding results I was not happy to see indexed for anyone else to find.
I spend my fair share of time on the Internet releasing more content under my name that gets indexed, whether that is blog posts like this one or public content on some social network. However, I'm more interested in the content that I have next to no control over. So in this post we will talk about those random links I discovered.
Track and field stats from High School, digitally recorded.
- Pro: My track history forever recorded
- Con: The exact year I was in each high school grade, recorded
- Con: I've been out of school for 10+ years. Why is this still online?
The classic white pages digitized instead of the bulky phone book.
- Con: My full name, phone, address for the world to see
- Con: My home history from around age 13 to now (27)
- Con: Relatives & Family connections
Team website from an old competitive soccer team
- Pro: At the time, it was cool to have
- Con: Full name, birthday and address
- Con: Lots of knowledge-based information (the kind used for security questions)
Public information aggregation in a creepy manner
- Con: Full name, age, address
- Con: Contact information
- Con: Location history
- Con: Social profiles
- Con: Court/government records
What I found interesting is that some of the discovered content was nothing more than a PDF document, uploaded and stored on some service and probably not intended to be indexed. However, the web has evolved, both in the robots that automatically scan it and in the technology that can parse text from a PDF.
I even found a tweet from someone I've never met that said, "Connor Tumbleson is on my plane." Creepy, but indexed.
I searched a bit using my history of online aliases and the problem was just as bad. Thankfully nothing online easily connects my long history of usernames, but someone given enough time and energy could probably unearth them. That would bring up some interesting IRC channel conversations that for some reason were logged and indexed on the web.
I then switched to social networks and began digging, this time behind their authentication walls. Thankfully "yfrog" no longer exists, so the media (hosted by them) in 50 of my tweets from 2011 is no longer available. I'm sure some part of the Internet archived it, but I appreciate broken content in this case.
Facebook was full of photos (both in my control and not) that caught me in a less-than-preferred moment, frozen forever. My only option was to remove the tag on myself, but the photo still exists in the account that holds it. It is harder to find, but with the improvements in automatic face recognition, it won't be long until scrapers can automatically find photos of me.
This was 30 minutes of research into the indexed content of a nobody. What else exists in the non-indexable section of the web? I'm not sure, but as technology improves and databases get larger, the indexed content of individuals will continue to grow.