Apktool

The human behind the keyboard

8 years ago 14 min read

A random Slack message from a co-worker hit my inbox, which led me to write this blog post. It was a simple link to http://git-awards.com/users/search?login=iBotPeaches with the comment "40th in the US for Java repos. Nice"

GitHub stars are hardly an indication of personal achievement, as projects are a community effort, but it was an interesting block of stats to consume regardless.

As this screenshot shows, projects I maintain under my namespace (/iBotPeaches) put me in the top 250 Java developers (in terms of GitHub stars) in the world, or number 1 in Tampa, Florida.

So for complete transparency, this is because of 1 project - Apktool. In absolutely vague terms, it reverse engineers Android applications. I cannot call it completely mine, as the original commit and vision came from brut.alll.

For a small history lesson, I was just another user of this tool during the Android Froyo/Gingerbread days. As the rumored next version of Android was approaching, new applications began breaking and I hit my first of multiple bugs in the tool.

This was November 2011 and I reported a few bugs alongside other random users of the tool. In short, I was an ass: my replies did nothing but generate notifications.

"Any further word on these issues? we still cannot properly decompile and recompile any ICS apks."

To brut.alll - I was just an anxious guy trying to get something that was free to work. I had no idea how it worked or the time spent on it. I knew it worked in the past and now it didn't - I craved success without the effort.

Days became weeks and weeks became months and there was no official fix for the reported issues besides a mess of unofficial ways to work around the problems. This project was open source though, so I decided to take it into my own hands after it was abandoned and attempt to fix some bugs. Here I am 6 years later :)

The Mistakes

The journey to now wasn't easy though. Open source was relatively new back then and I made plenty of mistakes.

I've committed to master with a horribly undescriptive message, making these changes difficult to search for and locate. I've paid the price for this as it's etched into history forever.
I've force pushed over master, causing trouble for those who depended on the repository. Thankfully, GitHub and other services can prevent this now.
I've released a version after forgetting to run ProGuard on the final jar file, creating a bloated file. Now the build process does this automatically on a release build.
I've released a version without doing a sanity check on Windows, requiring a day0 patch release. I've now written an internal but public release guide that I follow on every release.
I've snapped and lost my cool in the bug tracker, so I stay away from the project when I'm feeling upset or angry.
I've merged code that simply fixed a problem without investigating the potential side effects it could cause.
I've built native portions of the code base on Windows, Mac and Unix, only to forget months later how I did it. Once again, internal notes resolved this.

The thing is, I'm well aware I've made these mistakes, despite negative claims to the contrary on the internet. Part of being an open source maintainer is identifying where you can improve to make your project(s) better.

One thing that will never stop is the bug reports or, as I categorize them, support requests. When a project is young, usually as the maintainer, you are reaching into every corner of the web spreading the news of your project - helping out in forums, StackOverflow and more as needed. As your project grows, the time needed for support soon outweighs the time available for actual development of the project.

The community steps in and helps you (hopefully). Users exchange feedback with others and solve problems on their own. It may just be configuration or a misunderstanding of a feature so you as a maintainer are grateful that other people are helping you.

As time goes on, you take advantage of blips of time in your life. Sitting in the airport with a tablet? Perfect, go check up on the feedback for the last release of your project. I was surprised when I counted how many feedback channels actually existed:

An XDA-Developers project thread
A retired Google Groups (mailing list)
An IRC channel (Freenode #apktool)
A Gitter channel (Integrated GitHub chat)
StackOverflow tagged questions
Tweets
Private communication (Email, instant messenger)

Sometimes I triage easy tickets that don't require my computer to investigate. This keeps things in a maintainable state so when I return to my development environment I can grab tickets that I know require some intervention.

Say, for example, you arrive home from work. It's already dark outside and you are exhausted, but your significant other is working late so you have some free time. You are tempted to grab that Xbox controller or flip on Netflix and relax on the couch. However, you know there are people depending on you, so you sit down at the computer and decide to work on your hobby project(s).

Where do you start? You have a digital line of things waiting to be done. You have confirmed problems awaiting fixes, you have confirmed problems with no idea of the cause and finally you have bug reports you haven't even looked at yet. What gets priority? Do you ignore all tickets and solely deal with the tickets in your milestone?

The Bug Tracker

You decide to sit down and triage the new tickets. You aren't in the mood to get knee deep in the internals of Android applications and just want to clean up the bug tracker.

You start with the first ticket. The title is vague and doesn't make much sense. You open the ticket and the issue template you wrote to collect common questions is ignored. Instead it's replaced with some form of "it doesn't work". By ignoring the requested questions, this report leaves you without enough information. You don't have time for this, nor the time to investigate what this person is asking about. You are fairly sure it's a configuration problem from the lack of information provided, so you close it, linking to StackOverflow or the forums.

The next ticket is an error you recognize. It was caused by an old version, an out-of-date dependency or something similar. You know there is a ticket somewhere with the initial report of this issue. You search your own bug tracker quickly and find the ticket. You wrote about the cause and solution - perfect! You close the new ticket as a duplicate, linking to the previously solved issue and asking the submitter to upgrade.

Next you find the same issue that's been reported almost 20 times. Someone didn't run the upgrade notes from version x to y and the project isn't smart enough to automatically execute these steps. At this point you are tired of hunting for the original duplicate ticket and just close it citing "please run the command - foo bar from the release notes".

You are on a roll closing tickets, but the next ticket you see catches you off guard. It's perfect in every sense. The issue template is filled in, code blocks are used to retain formatting and the example application is attached. You run through the reproduction steps and sure enough reproduce the problem. You are excited someone has met you halfway and mark the bug as confirmed. However, you feel bad in the back of your head, as someone did just that a month ago and their report is still confirmed and unfixed.

You press on to the next ticket trying to get that unconfirmed issue count to zero. This ticket is about 70% on the completion scale. They left the example error, but remaking example test applications to reproduce problems you see in a log is next to impossible. You need example applications to make any progress. You make a quick response asking for more details and tag as "Waiting for reporter".

You are down to two unconfirmed tickets and excited to be nearing the end. The 2nd to last ticket has such a strange error. Somehow the tool (your project) ran absolutely fine without errors, but the host device has some cryptic error about failing to load the application. You start investigating and get nervous. Nothing matches what you expect; this is some intentionally written anti-tampering code. You decide the scope of your tool isn't to live patch around these enhancements, so you acknowledge the problem but the resolution is "no fix". These changes are frequently built by businesses and you as one individual have no time to even begin thinking about how to solve it.

Your brain is a bit fried after spending time researching a ticket solely to triage it, but here you are at the last unconfirmed ticket. You open it up and it's disgusting to read. Someone is insulting the free work you do, demanding a fix for an issue they encountered while using it. You are aware of the problem, but their tone is so full of rage you fire back with a similar message and close the ticket.

You think you are done and head back to the bug tracker index page to grab a confirmed bug to work on. Unbeknownst to you, in the hour it took you to triage a few tickets there have already been a few responses. Users around the world got email notifications of your response(s) and are quickly replying.

These responses come in a variety of tones. Some users are very happy you took the time to respond and provide the requested information or asset. A few users have a language disconnect and fire back responses requesting the exact commands to run to fix the problems you closed earlier. A couple of people circumvent the public bug tracker and fire responses directly to you, whether it's in chat, email or elsewhere. You haven't even looked at your debugger or IDE yet. It's time to ignore these and work on the project.

The Development

You sort by confirmed bugs and a startling 50 issues are waiting for you. Some of the reports you recognize and others are a black box. You decide to sort by stars/responses and find a bug with 35 comments. Obviously people want this report fixed, but you hardly remember it. You sit down, read through the comments and understand the problem at hand. You start debugging and experimenting with fixes. Remember, we are building a reverse engineering tool here; sometimes just understanding the problem is half the battle. You make some code changes and have some work in progress, but 3 hours have passed since you sat down at the computer and your significant other just arrived at home. You decide to stash those changes and write a little note on the ticket to remind you when you have time.

3 hours passed on a work day and all you did was basically clean up the bug tracker and start investigating 1 ticket. You feel like the next time you have a chance you are just going to jump straight to that ticket and ignore everything else. You feel close to fixing that last issue, so you squeeze in some time in the morning before work. Your changes are stable enough that tests are passing and the problem is fixed. You push it online for yourself to review. There is no one else on the project team, but following the same procedure you set for others keeps things organized.

A few brave souls take that code you pushed online and build it themselves. They are anxious to have a fixed problem and report back to you that the problem is solved. You won't have time tonight because you have another engagement, but someone else confirmed the ticket so you just merge it into master without writing a regression test. This will probably come back to bite you, but the community wants this fix and fast.

The Release

You look at the milestone on GitHub and you are 8 days late. Somehow the 3 month sprint you set up has already passed. It's amazing how fast 3 months sneak up on you. You try and promise a release every 3 months, so you set aside time on the weekend to slice a new release. This turns out to take most of the day. Automation handles most of it, but release notes need to be written and posts need to be made while binaries are tested again for sanity on all platforms. After this is all done and you've spread the news over social media, you breathe a sigh of relief.

However, things are about to get busy. No matter the release, regardless of the release notes, users will attempt to try things that are still broken. They will let you know in the comments, related bug reports, forums, etc. that version x.x.y has not fixed issue ###. You know this, but the community is only reminding you. Others are a bit more aggressive, asking "when will x get fixed?" or "Why did y get fixed before z?". You decide to take a break for a bit, but you haven't checked the bug tracker since you've been ignoring it to actually work on code changes.

The unconfirmed bugs count is alarming. Barely 2 weeks have passed and you have 20+ tickets waiting for you. The best course of action is to never stop working on them. The bug tracker is out of control and you try and triage a few issues a day. Slowly but surely you will catch back up and be ready to work on the next ticket.

The Behind the Scenes

Behind every bug report, forum post or question there is an additional subset of private messages, whether from email or another medium. Some users believe the best approach to solving their problem is a direct one on one email to ask you about x, y or z. Some of these users carry an interesting @domain email, so you wonder about the use case of the project in their industry. You direct these users to the public counterparts of their problem, because knowledge in private does not help the community. You think about charging for private support to make the emails worth it, but the legality of such a thing confuses you.

Unrelated to the project itself, you notice your hard drive making some interesting sounds and your computer refuses to boot. You spend days on it and come to the conclusion that the drive is broken. Your code is in revision control, so nothing is lost. However, the /Downloads/Apktool folder you have organized by ticket IDs is gone. You didn't back that up online because its size at this point is 500 GB. You thought you could live without that folder, but links in tickets are dead and some applications are not possible to obtain again. You've learned from your mistake and begin tar-snapping those tickets into online backups.

You spot another email in your inbox, but this time it's a pull request instead of an issue report! Some brave soul has jumped into the source code of your project and proposed a solution. These are absolutely amazing regardless of whether they will be able to be merged. You thank the user and look into the change. Sometimes these can be merged, sometimes not. You try and thank the user at all costs because you appreciate the time they spent in your project.

The Ugly

You have another interesting email in your inbox in the morning from a competing revision control service. They have automatically created a repository for you and imported your repository metadata from GitHub to "hold the name" for you. They are asking you to switch to their service with a plethora of reasons why. Your mind is absolutely blown that someone would do such a thing of partially signing you up already. You immediately decline and ask the company why they would do such a thing and demand your project be removed.

You don't think things can get much worse, but you have a new bug report saying that the final .jar file doesn't open. The user got a development build from some automatic service that parsed GitHub repositories looking for Java Gradle projects. Somehow this service has over a hundred downloads for your project. People must be anxious to get development versions and they are willing to use a service that produces an invalid jar. This service has now cost you a small amount of free time that you would have usually dedicated for the project. You politely email them explaining the situation and ask them to block your project from appearing on it again.

Another interesting problem hits your inbox. Your host service for binaries, Bitbucket, has somehow broken and is serving 0 KB files for a few of your binaries. You could re-upload them, but that would break the "uploaded date" order and make it confusing to new users. You email them hoping they can resolve this, but things don't look good.

We recognize the trust you place in us with your data and take that responsibility seriously. We have measures in place to protect the integrity of your data, but those failed us in this instance. We sincerely apologize, and are improving our operational procedures in order to prevent something like this from ever happening again.

You are upset, but life goes on. You re-upload 3 binaries and watch their download counts of 150k each be reset to 0. This teaches you to mirror releases onto your own private mirror and back to GitHub, now that they support it. You could effectively drop Bitbucket now, but change confuses people. They have been your source of downloads since Google Code was abandoned, so you decide to stick with them.

You've been struck with some serious bad luck with issues outside your control, but then you get hit again with a strange problem. There is a new file being hosted by your Bitbucket project that you didn't upload. You panic and double check approved users, but it's just you. You reset your password despite having 2FA, but it turns out your repository was the target of a vulnerability against Bitbucket, internally known as BBCDEV-4320 (SEC-1136). You begin downloading every release and checking the hash, scared that something more is afoot. Everything else seems to be fine, but after two strikes you begin to really doubt hosting binaries at Bitbucket.
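That kind of check can be scripted. Below is a minimal sketch in Python, not the procedure actually used for Apktool. It assumes you kept your own list of SHA-256 hashes from the time you uploaded each file. The function names (sha256_of, find_mismatches) and the layout of the expected-hash table are hypothetical and exist only for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, read in chunks to keep memory flat."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(directory: Path, expected: dict[str, str]) -> list[str]:
    """Compare downloaded files against a hypothetical name -> hash table.

    Reports files that are missing, files whose hash differs, and files
    that are present but were never in the table (the "I didn't upload
    this" case).
    """
    problems = []
    for name, want in sorted(expected.items()):
        candidate = directory / name
        if not candidate.is_file():
            problems.append(f"{name}: missing")
        elif sha256_of(candidate) != want.lower():
            problems.append(f"{name}: hash mismatch")
    extras = sorted(
        p.name for p in directory.iterdir()
        if p.is_file() and p.name not in expected
    )
    for name in extras:
        problems.append(f"{name}: unexpected file")
    return problems
```

An empty list from find_mismatches means every downloaded file matched your records and nothing extra was found. The sketch only helps if the expected hashes were stored somewhere other than the host you are verifying.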

The Assholes

The problems outside your control keep plaguing you, and the people won't let up either. There is still a subset of users out there intentionally trying to take what you have. Password reset emails become all too common and "failed" login attempt emails plague related services. You are hosting a free open source project, so you don't understand the hate targeted towards you. Some individuals even register social media names and domains related to your project and hold them hostage, demanding money for them. You decide to hold your ground and stay hosted on GitHub Pages with a github.io domain.

There is another type of asshole who creates a public facing website for your tool. They blatantly copy & paste the contents and host copies of your downloads. You can't confirm the integrity of the binaries, so you naturally dislike the site. Their website is plastered with 5+ ads per page, which means the owner is only trying to profit from your open source project. It's frankly disgusting and insulting that this page is getting any traffic at all. You don't remember how you stumbled upon it, but you send a Google Webmaster website report and move on.

The Good

Despite all the time you spend, the issues you encounter and the stress you are caused, there is some good that comes out of it. You are recognized in communities, invited to speak at conferences and people recognize you in the field. It's those simple and fun reminders that working in open source has its benefits.

Doors in your life could be opened by working or helping in the open source space, but as your project(s) grow - the hobby you once loved becomes more and more of a daunting task as some people forget a human is sitting behind the screen of every project.