How I gained access to Amazon EC2 servers via GitHub Search (adapted)
GitHub Search supports advanced filters that make it easy to search for committed private keys (link). The query looks for:
- files with a .pem extension
- the text “BEGIN RSA PRIVATE KEY”, which marks the beginning of a private key
- results sorted by most recently indexed
Although a fair number of the results are dummy keys, many are real .pem files. In addition, some people notice that they have accidentally pushed a private key and then push a new commit that deletes it. This does not stop anyone even slightly determined from finding the key: the original is still in the repository's git history, which is just as publicly accessible as the latest commit.
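You can check whether your own repository's history still contains a private key, including one that a later commit deleted. One way is to scan every commit's patch for PEM private-key headers.

Below is a minimal Python sketch. It assumes git is installed and on your PATH. The function names and the header regex are my own choices, not part of git. Dedicated secret scanners cover many more credential formats than this does.

```python
import re
import subprocess

# PEM header lines of common private-key formats: PKCS#1 (RSA, DSA, EC),
# OpenSSH, and PKCS#8 (plain or encrypted). Public keys do not match.
KEY_HEADER = re.compile(
    r"-----BEGIN (?:RSA |DSA |EC |OPENSSH |ENCRYPTED )?PRIVATE KEY-----"
)


def find_key_headers(log_text):
    """Return (commit, header) pairs for every added line in `git log -p`
    output that contains a private-key header.

    Only lines starting with "+" are checked: a key that was committed and
    later deleted still shows up as an addition in the commit that added it.
    """
    hits = []
    commit = None
    for line in log_text.splitlines():
        if line.startswith("commit "):
            commit = line.split()[1]  # SHA of the commit whose patch follows
        elif line.startswith("+"):
            match = KEY_HEADER.search(line)
            if match:
                hits.append((commit, match.group(0)))
    return hits


def scan_history(repo="."):
    """Scan the patches of every commit reachable from any ref in `repo`.

    Assumes git is on PATH. The whole log is read into memory, which is
    fine for a sketch but slow for very large repositories.
    """
    result = subprocess.run(
        ["git", "-C", repo, "log", "-p", "--all", "--no-color"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",  # history may contain non-UTF-8 blobs
        check=True,
    )
    return find_key_headers(result.stdout)
```

For example, run print(scan_history(".")) from inside a clone of your repository. It lists the commit and header of every private key ever added on any branch. An empty list only means that none of these particular headers appear in the history.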
As a proof of concept, and to warn people about the dangers of accidentally publishing sensitive information such as private keys in their GitHub repos, I attempted to gain access to Amazon EC2 servers. I used a couple of Bash/Python scripts that scraped GitHub and queried GitHub’s API to obtain the necessary credentials.
I did all of the following in about two hours using very simple techniques. A determined attacker with more time and more sophisticated techniques could certainly find many more credentials and much more exploitable information.
To gain access to an EC2 server, I need to ssh into it with ssh -i key.pem user@host. Here, user is a name such as ubuntu or ec2-user, and host has the form ec2-xx-xxx-xxx-xx.some-suffix.amazonaws.com. If the owner uses an Elastic IP, host is just that IP address instead. My technique focused on hosts of the form …amazonaws.com, since they are easier to search for.
My first step was to find potential EC2 hosts. I searched GitHub Code for “.amazonaws.com pem”, ordered by most recently indexed, since recent files are more likely to point to servers that are still running. I then scraped the first 1000 results with a simple shell script:
for i in `seq 1 100`; do curl -sL “https://github.com/search?o=desc&p=”$i”&q=.amazonaws.com+pem&ref=simplesearch&s=indexed&type=Code” | grep “title” >>git_output_ips.txt sleep 5
The script sleeps 5 seconds between requests to avoid being rate limited. Its output, git_output_ips.txt, contains raw HTML that grep has pruned down to mostly the repository and file names of results that may contain user@host names of EC2 instances. These 1000 most recent results span March 6 to July 14, which is relatively recent, so there is a good chance that many of these instances are still running.
My second step was to search these 1000 repositories for private keys. A simple Python script queried the GitHub API endpoint “https://api.github.com/search/code” with the query “q=BEGIN%20RSA%20PRIVATE%20KEY+extension:pem+repo:” + repo, which finds all pem files in a given repo. An authenticated user can make 20 requests per minute, so the script took around 50 minutes. It produced a list of potentially vulnerable repositories that contained an EC2 user@host string and exactly one pem file. For simplicity, I excluded repositories with multiple pem files, although a determined hacker would try every combination of pem file and host.
At this point, I had pairs of “potential credential” URLs pointing to files on GitHub: one file contains the EC2 user@host name, and the other is a private key. In step 2, searching the 1000 repositories containing EC2 user@host names turned up 31 repositories with exactly one pem file. In step 3, I downloaded the raw contents of these files. A file's raw URL is obtained by replacing “github.com” with “raw.githubusercontent.com” and removing “blob/” from the original URL. For example, here’s a raw js file from the jquery repo: link
To find the EC2 user@host to ssh into, I tokenized each raw file using spaces, newlines, quotes, and equals signs as delimiters. I then kept the tokens containing “amazonaws”, which were possible user@host strings. I downloaded the pem files, changed their permissions to 400, and tried to ssh in with each of the 31 potential credentials. Because I was using the macOS Terminal, I automated this with osascript, which opened a new Terminal tab for each attempt and ran the ssh command in it. Here’s an example of the osascript/ssh command I ran for each of the 31 potential credentials:
osascript -e ‘tell application “Terminal” to activate’ -e ‘tell application “System Events” to tell process “Terminal” to keystroke “t” using command down’ -e ‘tell application “Terminal” to do script “ssh -o \”StrictHostKeyChecking no\” -i pem29.pem firstname.lastname@example.org” in tab 2 of window 1′
I used the option -o “StrictHostKeyChecking no” so that I wouldn’t have to answer “yes” to the host-key prompt ssh shows the first time you connect to an unknown host. I then manually checked the 31 tabs to see how many EC2 servers I had logged into, and I immediately exited any successful ssh sessions.

I gained access to 5 Amazon EC2 servers. That is quite a lot, since access to a server means I can do anything on it that the user has permission to do. The default user on a new EC2 instance typically has passwordless sudo, so most servers you can ssh into effectively give you root access. A malicious hacker could then take over the servers to mine bitcoin, for example, leaving the owner with heavy AWS charges until they notice.

My techniques for finding “potential credentials” were very basic. They produced many false positives and certainly missed real credentials. A determined hacker would be more thorough:

- They would try each host against every pem file found in a repo.
- They would search for user@host names in step 1 more carefully. For example, searching for “compute-1.amazonaws.com” filters out more false positives, and hosts that use Elastic IPs are also worth looking for.
- They would handle files where the user and host names are stored in separate variables, which I noticed on manual inspection was common. Techniques that reassemble a separated user and host would surely find more credentials.
- They would search git history. As mentioned earlier, people who realize they have committed a private key often add a new commit deleting it, but the key is still in their history and can be found relatively easily.
If you’ve already accidentally committed sensitive information to a repository and want to remove it from your history, GitHub has a helpful article that shows how to do so using git filter-branch or the BFG Repo-Cleaner.
Purging the offending file from your history is not enough on its own, because anyone may already have copied the data before you removed it. You should also invalidate the compromised data. For example, if you accidentally committed a private key to a public GitHub repo, generate a new ssh key pair and remove the old public key from your server.

Better yet, don't commit sensitive information to public repositories in the first place. For pem files, adding *.pem to your .gitignore file makes git ignore untracked .pem files. A private key is then much less likely to be committed and pushed for everyone to see. Note that .gitignore has no effect on files git is already tracking.
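A .gitignore entry can still be bypassed, for example with git add -f, or by a key pasted into a file that isn't ignored. A pre-commit hook adds a second line of defense.

The following is a minimal Python sketch of such a hook. It assumes the file is saved as .git/hooks/pre-commit and made executable. The function names, the header regex, and the rule of rejecting every .pem file are my own choices, not something git provides.

```python
#!/usr/bin/env python3
# Hypothetical pre-commit hook: save as .git/hooks/pre-commit and chmod +x it.
import os
import re
import subprocess
import sys

# PEM header lines of common private-key formats (PKCS#1, OpenSSH, PKCS#8).
KEY_HEADER = re.compile(
    r"-----BEGIN (?:RSA |DSA |EC |OPENSSH |ENCRYPTED )?PRIVATE KEY-----"
)


def is_sensitive(path, content):
    """Flag any .pem file (mirroring a *.pem .gitignore rule, so this also
    rejects public certificates) or any file containing a private-key header."""
    return path.lower().endswith(".pem") or KEY_HEADER.search(content) is not None


def staged_paths():
    """Files added, copied, or modified in the index. NUL-separated output
    (-z) avoids problems with unusual file names."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "-z", "--diff-filter=ACM"],
        capture_output=True, encoding="utf-8", errors="replace", check=True,
    ).stdout
    return [p for p in out.split("\0") if p]


def staged_content(path):
    """The staged version of `path`, i.e. exactly what the commit would record."""
    return subprocess.run(
        ["git", "show", f":{path}"],
        capture_output=True, encoding="utf-8", errors="replace", check=True,
    ).stdout


def main():
    flagged = [p for p in staged_paths() if is_sensitive(p, staged_content(p))]
    for path in flagged:
        print(f"pre-commit: refusing to commit possible private key: {path}",
              file=sys.stderr)
    return 1 if flagged else 0


# Only run when git invokes this file as the pre-commit hook, so the
# functions above can also be imported and tested on their own.
if __name__ == "__main__" and os.path.basename(sys.argv[0]) == "pre-commit":
    sys.exit(main())
```

Git aborts the commit whenever a pre-commit hook exits with a non-zero status, so a staged key is rejected before it ever lands in history. Since git commit --no-verify skips the hook, treat it as a safety net rather than a guarantee.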