gitGraber: A tool to monitor GitHub in real-time to find sensitive data

October 8, 2019

As technology moves forward, so do the threats to the tools we use every day. GitHub is one such tool, enabling software developers to collaborate within and across organisations. One way of keeping tabs on GitHub is gitGraber, which detects sensitive data exposed on the platform.

Authors: ADRIEN JEANNEAU and RÉMY MAROT

Company security strategies have evolved alongside emerging technologies such as the cloud and DevOps culture, moving from perimeter security for private infrastructure to a model focused on data and identity management. With the use of cloud-hosted machines and SaaS applications, it has become critical to ensure that the data hosted and managed by the company or its contractors is appropriately secured and monitored.

Among the services companies use, GitHub is one of the best known: it lets developers host their projects and gives teams a quick and effective way to work together on software.

The drawback of this success is that companies do not necessarily have the right tools to check that they are not exposed. Exposure can come from an employee who pushes code containing secrets by mistake, an employee who is about to leave and seeks revenge by leaking data, or a contractor with little concern for security who pushes code to GitHub for convenience, as a kind of backup.

gitGraber: Origins

We originally developed gitGraber in the context of a Bug Bounty. However, with time and improvements to the code, gitGraber has become a real assistant for monitoring GitHub based on specific keywords (e.g. a company brand name). Moreover, gitGraber recognises the common patterns of sensitive data such as API keys, which helps identify code that contains them and helps SOC teams react and remediate quickly when an issue is detected.

The story behind the tool began in 2019. A friend (and bug hunter) received an invitation to participate in a private and not-so-typical Bug Bounty program. The goal was to find specific, valid API keys for the target online service. For each key that had not previously been reported, the hunter would receive a bounty.

Every hunter knows that GitHub is a gold mine for finding sensitive data. Starting from this idea, we tried to work out the best way to find these API keys and how to be the first to submit them to the program. Many excellent tools already exist to search GitHub, but they are designed to search specific items (e.g. ‘repository,’ ‘user’) or the commit history. That did not meet our need to be fast and accurate, and this is how we decided to start this project.

Security-enhancing features from the start

From the outset, we had a roadmap of essential features to develop:

  • Define the best patterns that allow the matching of sensitive data (like PRIVATE_KEY or SECRET_KEY);
  • Create a regular expression to find these keys in the different public repositories;
  • Automate the whole process, from finding the keys to reporting them to the programme;
  • Check each set of keys to see if they have been revoked;
  • Avoid sending duplicate reports when our tool had already scanned the repository and reported the keys.
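To make the roadmap concrete, here is a minimal Python sketch of the last three items. It is an illustration under stated assumptions, not gitGraber's actual code: the function names, the `is_live` callback (standing in for a provider-specific revocation check) and the in-memory `already_reported` set are all hypothetical. The only pattern used is the widely documented shape of an AWS access key ID ("AKIA" followed by 16 uppercase letters or digits).

```python
import hashlib
import re
from typing import Callable, List, Set, Tuple

# Well-known shape of an AWS access key ID: "AKIA" + 16 uppercase letters/digits.
AWS_ACCESS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def find_candidates(text: str) -> List[str]:
    """Return every substring of `text` that looks like an AWS access key ID."""
    return AWS_ACCESS_KEY_ID.findall(text)


def fingerprint(repo_url: str, key: str) -> str:
    """Stable identifier for a (repository, key) pair, so the raw key need not be stored."""
    return hashlib.sha256(f"{repo_url}\n{key}".encode("utf-8")).hexdigest()


def new_findings(
    repo_url: str,
    text: str,
    is_live: Callable[[str], bool],   # hypothetical: True if the key has not been revoked
    already_reported: Set[str],       # hypothetical store of fingerprints already reported
) -> List[Tuple[str, str]]:
    """Return (repo_url, key) pairs that are live and have not been reported before."""
    findings = []
    for key in find_candidates(text):
        mark = fingerprint(repo_url, key)
        if mark in already_reported:
            continue                  # reported earlier: do not send it again
        if not is_live(key):
            continue                  # revoked or invalid: not worth a report
        already_reported.add(mark)
        findings.append((repo_url, key))
    return findings
```

Passing the revocation check in as a callback keeps the sketch independent of any one provider; in a real tool the set of fingerprints would need to be persisted between runs.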

The starting point was to understand the GitHub API and how we could optimise our use of it. From the documentation, we found that the GitHub API makes it possible to query the most recently indexed files directly, which was exactly what our project needed.
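As an illustration of that kind of query, the sketch below builds a request to GitHub's REST code search endpoint sorted by index time, newest first. It is a sketch under assumptions, not necessarily the call gitGraber makes: the function names and the keyword are made up for the example, `token` is a placeholder for a personal access token (code search requires authentication), and the `sort=indexed` parameter should be checked against the current GitHub documentation before relying on it.

```python
import json
import urllib.parse
import urllib.request
from typing import List

SEARCH_CODE_URL = "https://api.github.com/search/code"


def build_search_request(keyword: str, token: str) -> urllib.request.Request:
    """Build a code search request that returns the most recently indexed files first."""
    query = urllib.parse.urlencode(
        {"q": keyword, "sort": "indexed", "order": "desc", "per_page": 30}
    )
    return urllib.request.Request(
        f"{SEARCH_CODE_URL}?{query}",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",  # placeholder personal access token
        },
    )


def latest_file_urls(keyword: str, token: str) -> List[str]:
    """Run the search and return the web URL of each matching file (needs network access)."""
    with urllib.request.urlopen(build_search_request(keyword, token), timeout=10) as resp:
        payload = json.load(resp)
    return [item["html_url"] for item in payload.get("items", [])]
```

Polling such a query at a regular interval, while respecting the API rate limits, is one way to see new matches shortly after they are indexed.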

We spent many hours developing the tool and making sure it worked correctly. In the end we succeeded, and the automation flow worked as expected: our tool was able to find the pairs of keys, validate them and create a report with the URL of the GitHub repository concerned.

A warm welcome by the infosec community

It was awe-inspiring. Using gitGraber allowed some of our reports to be submitted on the Bug Bounty platform only 30 seconds after GitHub indexed the commit. It was an excellent experience (shared with Nico and Gwen, thanks guys).

Seven months later, I decided to go back to my tool and extend it to the sensitive API keys of some of the most widely used providers (AWS, Google, Slack, etc.). Until then, the tool had only worked with one specific Bug Bounty programme, so this was the moment to turn it into a real standalone tool: gitGraber was born.

Remy (alias reptou) is a friend and hunter with whom I have collaborated many times on hunts. Collaboration is always fun in Bug Bounty, and you can be sure that you will always learn something from others. After discussing recognition with Remy, I told him about gitGraber. We agreed to work together to improve and enhance gitGraber so as to have the best tool for monitoring GitHub in real time (with specific features for bug hunters).

We tested all the services with our own accounts to ensure that all the regular expressions were valid, avoiding false positives as far as possible and achieving a high level of accuracy. This effort was difficult because GitHub hosts and indexes millions of repositories, and the tool would have been useless if it had sent a notification each time a sensitive word was detected.
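The difference between matching a sensitive word and matching an actual secret can be shown with a small Python comparison. The sample lines, the function names and the strict pattern below are invented for this example and are not taken from gitGraber; the strict pattern simply assumes a secret is a quoted literal of at least 20 characters assigned to SECRET_KEY.

```python
import re
from typing import List

# Naive trigger: the sensitive word appears anywhere on the line.
KEYWORD = re.compile(r"SECRET_KEY")

# Stricter trigger (illustrative assumption): the word is assigned a quoted
# literal of at least 20 characters drawn from a key-like alphabet.
ASSIGNED_LITERAL = re.compile(
    r"""SECRET_KEY\s*[=:]\s*['"]([A-Za-z0-9/+=_\-]{20,})['"]"""
)

SAMPLE_LINES = [
    "# remember to set SECRET_KEY in your environment",
    'SECRET_KEY = os.environ["SECRET_KEY"]',
    'SECRET_KEY = "k3J9sLq0PzX7vB2nM5tR8wY1"',
]


def keyword_hits(lines: List[str]) -> List[str]:
    """Lines that would raise a notification on the bare keyword."""
    return [line for line in lines if KEYWORD.search(line)]


def strict_hits(lines: List[str]) -> List[str]:
    """Lines that contain a hard-coded value in the expected shape."""
    return [line for line in lines if ASSIGNED_LITERAL.search(line)]
```

On these samples the bare keyword flags all three lines, while the stricter pattern flags only the last one, which is the only line that actually embeds a value.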

Join us: https://github.com/hisxo/gitGraber/

Today, gitGraber can find more than 31 types of keys, and we have used it to make responsible disclosures of sensitive data leaks to major companies around the world (we run continuous monitoring on our side). The project has been open-sourced, and all contributions are welcome, whether you have encountered a bug or would like to add features that make it even more accurate and robust.