gitGraber:
A tool to monitor GitHub in real time to find sensitive data

As technology moves forward, so do the threats to the tools we use every day. GitHub is one such tool, enabling software developers to collaborate within and across organisations. One way of keeping tabs on GitHub is gitGraber, which detects sensitive data exposed on the platform.

Authors:

  

Company security strategies have evolved with emerging technologies such as the cloud and with the DevOps culture, moving from perimeter security of private infrastructure to a model focused on data and identity management.
With the use of cloud-hosted machines and SaaS applications, it has become critical to ensure that the data hosted and managed by the company or its contractors is appropriately secured and monitored.

Among the services used by companies, GitHub is one of the best known: it lets developers host their projects and gives teams a quick and effective way to work together on software.


The drawback of this success is that companies do not necessarily have the technical tools to check whether they are exposed by an employee who mistakenly pushes code containing secrets, by a departing employee seeking revenge by leaking data, or by a contractor with no security concerns who pushes code to GitHub out of convenience or as a kind of backup.


gitGraber: origins

We originally developed gitGraber in the context of a Bug Bounty. Over time, and with improvements to the code, gitGraber has become a real assistant for GitHub monitoring based on specific keywords (e.g. a company's brand name). Moreover, gitGraber recognises common patterns of sensitive data, which helps identify code containing such data and assists SOC teams in reacting and remediating quickly when an issue is detected.

The story behind the tool began in 2019. A friend (and bug hunter) received an invitation to participate in a private and not-so-typical Bug Bounty program. The goal was to find specific, valid API keys for the target online service. For each key that had not been reported before, the hunter received a bounty.

Every hunter knows that GitHub is a gold mine for sensitive data. Starting from this idea, we tried to work out the best way to find these API keys and to be the first to submit them to the program. Many excellent tools already exist for searching GitHub; however, they are designed to search for specific items (e.g. a ‘repository’ or a ‘user’) or through the history. That did not meet our need to be fast and accurate, and this is how we decided to start this project.

Security-enhancing features from the start

From the start, we had a roadmap of essential features to develop:

  • Define the best patterns for matching sensitive data (like PRIVATE_KEY or SECRET_KEY);
  • Create regular expressions to find these keys in public repositories;
  • Automate the whole process, from finding the keys to reporting them to the program;
  • Check each set of keys to see whether it has been revoked;
  • Avoid sending duplicate reports when our tool had already scanned the repository and reported its keys.
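The pattern-matching and de-duplication steps on this roadmap can be sketched in a few lines of Python. This is a minimal illustration, not gitGraber's actual code: the pattern names, the two regular expressions and the in-memory set of already-reported (repository, key) pairs are all assumptions made for the example. The AWS pattern relies on the common "AKIA" prefix of access key IDs.

```python
import re

# Hypothetical pattern table: names and regexes are illustrative assumptions,
# not gitGraber's real rules.
PATTERNS = {
    # AWS access key IDs commonly start with "AKIA" + 16 upper-case alphanumerics.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Slack tokens start with a prefix such as "xoxb-" or "xoxp-".
    "slack_token": re.compile(r"\bxox[baprs]-[0-9A-Za-z-]{10,48}\b"),
}

# Hypothetical in-memory store of (repository, secret) pairs already reported.
already_reported = set()


def find_new_secrets(repo, content):
    """Return (pattern_name, secret) tuples not yet reported for this repo."""
    findings = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(content):
            key = (repo, match.group(0))
            if key in already_reported:
                continue  # already scanned and reported: skip the duplicate
            already_reported.add(key)
            findings.append((name, match.group(0)))
    return findings
```

Calling `find_new_secrets("example/repo", "k = 'AKIAABCDEFGHIJKLMNOP'")` twice returns the finding the first time and an empty list the second time. A real deployment would need persistent storage rather than a process-local set, so that restarts do not trigger duplicate reports.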

The starting point was to understand the GitHub API and how we could optimise its usage. From the documentation, we found that the GitHub API makes it possible to query the most recently indexed files from recent commits directly, which was perfect for our project.
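As a rough sketch of such a query, the snippet below builds a request to GitHub's REST code-search endpoint, ordered so that the most recently indexed files come first. It assumes the `sort=indexed` parameter of `GET /search/code`, which GitHub's documentation has since flagged as deprecated, so check the current REST reference before relying on it. The function names and the token value are hypothetical, and the code only builds the request; it does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

SEARCH_CODE_URL = "https://api.github.com/search/code"


def build_search_url(keyword, per_page=30):
    """Build a code-search URL listing the most recently indexed files first."""
    params = {
        "q": keyword,        # e.g. a company brand name or "SECRET_KEY"
        "sort": "indexed",   # order by when GitHub indexed the file (deprecated)
        "order": "desc",     # newest first
        "per_page": per_page,
    }
    return SEARCH_CODE_URL + "?" + urlencode(params)


def build_request(keyword, token):
    """Wrap the URL in an authenticated request; code search requires a token."""
    return Request(
        build_search_url(keyword),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": "Bearer " + token,  # hypothetical personal token
        },
    )
```

Polling such a query at a short interval and comparing results against previously seen files is one way to approach the "real-time" behaviour described here.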

We spent many hours developing the tool and making sure it worked correctly, but in the end we succeeded and the automation flow worked as expected: our tool was able to find pairs of keys, validate them and create a report with the URL of the GitHub repository.

A warm welcome by the infosec community

It was awe-inspiring. Thanks to our tool, some of our reports were submitted to the Bug Bounty platform only 30 seconds after the commit was indexed by GitHub and detected by our tool. It was an excellent experience (shared with Nico and Gwen, thanks guys).

Seven months later, I decided to go back to my tool and extend it with other sensitive API keys from some of the most widely used providers (such as AWS, Google and Slack). gitGraber had initially been designed for that specific Bug Bounty program, so it was the moment to turn it into a real standalone tool, and gitGraber was born.

Remy (alias reptou) is a friend and hunter with whom I have collaborated many times. Collaboration is always fun in Bug Bounty, and you can be sure you will always learn something from others.

After a discussion with Remy about recognition, I told him about gitGraber, and we agreed to work together to make it the best tool for monitoring GitHub in real time (with specific features for bug hunters).

We tested all the services with our own accounts to make sure that all the regular expressions were valid, avoiding false positives as much as possible and achieving a high level of accuracy. This was a difficult job, because GitHub hosts and indexes millions of repositories, and the tool would have been useless if it sent a notification every time a sensitive word was detected.
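To illustrate the difference between alerting on a keyword and alerting on a validated pattern, here is a minimal sketch that only notifies when a complete key shape is present and the value does not look like a placeholder. The placeholder markers are a hypothetical heuristic chosen for the example, not gitGraber's logic; "AKIAIOSFODNN7EXAMPLE" is the sample access key ID used in AWS documentation, which is exactly the kind of match worth filtering out.

```python
import re

# Hypothetical strict pattern for an AWS access key ID (illustrative only).
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")

# Hypothetical markers of placeholder values seen in docs and sample code.
PLACEHOLDER_MARKERS = ("EXAMPLE", "XXXX")


def should_notify(line):
    """Notify only for a full key shape, never for a bare sensitive keyword."""
    match = AWS_KEY_ID.search(line)
    if match is None:
        return False  # e.g. "aws_access_key_id =" with no actual value
    value = match.group(0)
    # Skip values that look like documentation placeholders.
    return not any(marker in value for marker in PLACEHOLDER_MARKERS)
```

With this gate, a line that merely mentions `aws_access_key_id` stays silent, while a line containing a well-formed, non-placeholder key ID triggers a notification.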

Link: https://github.com/hisxo/gitGraber/


Today, gitGraber can find more than 31 types of keys, and we have used it to responsibly disclose sensitive data leaks to some of the largest companies in the world (we do continuous monitoring on our side). The project is open source, and all contributions are welcome, whether you have encountered a bug or would like to add exciting features to make it even more accurate and robust.