Whenever you want to look for something on the Internet, you probably use Google. The search giant indexes a huge share of the web and has made billions of pages easy to find, which gives you a good chance of finding whatever you are looking for.
However, within this large sea of indexed web content and public data, pieces of sensitive information can sometimes end up in search results. And frequently, this happens without their owners realizing it.
A malicious hacker, by performing a technique called Google Dorking (or Google Hacking), can get their hands on this supposedly hidden content.
If you are not familiar with Google Dorks, don’t worry. In this post, I will explain what they are and how you can use them. I will also provide you with examples of how hackers employ them to access sensitive content. And finally, I will share a few best practices for protecting yourself against them.
Before we go any further, I would like to remind you that accessing information without authorization is illegal in many jurisdictions. The primary focus of this article is to help you identify and clean up any information you might be leaking. It also aims to assist you through the reconnaissance phase of your pen-testing projects. I do not encourage any malicious use.
What are Google Dorks?
Google Dorks are search queries specially crafted by hackers to retrieve sensitive information that is not readily available to the average user. The technique of searching using these search strings is called Google Dorking, or Google Hacking.
The Google search box can act much like a command line or an interpreter when provided with the right queries. In other words, there are certain keywords and operators that have special meaning to Google.
Users can employ these operators to help them find relevant results to their search queries in a short amount of time.
On the other hand, hackers can also take advantage of these operators to retrieve files containing passwords, lists of emails, log files, and much more.
The following example is a Google dork query that returns log files containing passwords with email addresses:
filetype:log intext:password intext:(@gmail.com | @yahoo.com | @hotmail.com)
By the end of this article, you will be able to write similar queries.
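To see how a query like the one above breaks down into operators and values, here is a minimal Python sketch. The regular expression and the `split_operators` helper are my own illustration, not anything Google provides, and they only handle the simple `operator:value` forms used in this post.

```python
import re

# Matches operator:value where the value is a parenthesised group,
# a quoted phrase, or a single bare word (an assumption that covers
# only the simple forms used in this article).
OPERATOR_RE = re.compile(r'(\w+):(\([^)]*\)|"[^"]*"|\S+)')


def split_operators(query: str) -> list:
    """Return (operator, value) pairs found in a search query."""
    return OPERATOR_RE.findall(query)


dork = "filetype:log intext:password intext:(@gmail.com | @yahoo.com | @hotmail.com)"
for operator, value in split_operators(dork):
    print(f"{operator:<9} {value}")
```

Each printed row pairs one operator with the value it applies to, which is the same mental model you will use when writing your own queries.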
Commands and Operators
Operators are the building blocks of Google dorks, so we will cover them first before writing full dork queries.
Here is a list of the most common operators that you need to know:
If you use the operator OR (or |) between two keywords or more, then the search results will return pages that contain matches to at least one of the keywords.
google OR bing OR duckduckgo
Matching All Keywords
Using the operator AND between two keywords or more forces the search engine to return results relevant to all provided keywords.
Samsung AND Apple
An Exact Match
Enclosing the search terms in double-quotes (“search string”) returns only webpages that contain an exact match of the string.
For example, if you search for the following:
"Google Dorks Explained"
Only pages that contain that same string will be returned. And so, pages that contain “Explained Google Dorks”, or “Google Hacking using dorks explained” will not be matched.
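If you want to script such searches, the exact-match query has to be URL-encoded before it goes into a search link. The sketch below assumes the usual `https://www.google.com/search?q=...` URL form; the `exact` and `search_url` helpers are hypothetical names used only for illustration.

```python
from urllib.parse import urlencode

# Assumed base URL for a Google web search.
SEARCH_URL = "https://www.google.com/search"


def exact(phrase: str) -> str:
    """Wrap a phrase in double quotes to request an exact match."""
    return f'"{phrase}"'


def search_url(query: str) -> str:
    """Build a search URL with the query percent-encoded."""
    return f"{SEARCH_URL}?{urlencode({'q': query})}"


print(search_url(exact("Google Dorks Explained")))
# prints https://www.google.com/search?q=%22Google+Dorks+Explained%22
```

The quotes become `%22` and the spaces become `+`, so the phrase survives the trip through the URL intact.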
Searching on a Specific Site
The operator “site:” limits the search to the specified website.
linux site:wikipedia.org
This query will only return web pages from Wikipedia that are relevant to the keyword Linux.
Excluding a Keyword
If you use the operator “-” (a minus sign) directly in front of a keyword, then that keyword is excluded from the results.
If we apply this operator to the previous example, then we will get the opposite results.
linux -site:wikipedia.org
The above query will exclude the Wikipedia site from the results.
The asterisk operator “*” is used as a wildcard and can match any word or group of words. This operator can be very useful when combined with the double quotes operator.
"username * password"
This example returns pages that contain the word username, followed by a group of words, which are then followed by the word password.
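When auditing your own files before a search engine gets to them, you can approximate the same pattern locally with a regular expression. This is only a rough sketch: the pattern below is my own approximation of the wildcard phrase, not a description of how Google matches it, and `matches_wildcard_phrase` is a hypothetical helper.

```python
import re

# Rough local equivalent of the quoted query "username * password":
# the word "username", one or more intervening words, then "password".
PATTERN = re.compile(r"\busername\b(?:\W+\w+)+?\W+password\b", re.IGNORECASE)


def matches_wildcard_phrase(text: str) -> bool:
    """Return True if the text contains username ... password in that order."""
    return PATTERN.search(text) is not None


for sample in (
    "username: admin password: hunter2",
    "password first, then username",
):
    print(matches_wildcard_phrase(sample), "-", sample)
```

Running this over exported logs or configuration files gives you a quick hint of what a wildcard dork might surface if those files were ever indexed.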
The real power of Google operators comes from combining them to form complex queries. In such cases, you need parentheses to group terms and control which operator is applied first.
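As a rough sketch of how this grouping works, the Python helpers below wrap each set of alternatives in parentheses before joining the groups together. The names `any_of` and `all_of`, and the sample keywords, are hypothetical and chosen only for illustration.

```python
def any_of(*terms: str) -> str:
    """Group alternatives in parentheses so OR applies inside the group."""
    return "(" + " OR ".join(terms) + ")"


def all_of(*parts: str) -> str:
    """Require every part by joining them with AND."""
    return " AND ".join(parts)


query = all_of(
    any_of("login", "signin"),
    any_of("admin", "portal"),
)
print(query)
# prints (login OR signin) AND (admin OR portal)
```

Without the parentheses, it would be unclear whether AND or OR binds first, which is exactly the ambiguity that grouping removes.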
If you remember some basics from your math class, then you won’t have a problem understanding the following example:
google (dorks OR dorking OR hacking) AND (explained OR tutorial OR guide)
Keywords in URLs
If you want Google to show only pages containing the search terms in their URL, then you can use the operator “inurl:”.
For instance, the following query will return any page that contains the word admin in its URL:
inurl:admin
Although this query on its own might return millions of pages, most of which are irrelevant, you can narrow the results by adding more operators. For instance, if you limit the search to your own website, you can check whether you have an exposed admin folder that you should worry about.
Keywords on the page
The command “intext:” returns pages containing the search term in their content.
Keywords in the title
The command “intitle:” returns pages that contain the terms of the search in their title, not their content.
When using the command “filetype:”, you force Google to return only files with a given extension.
In the example below, Google will return only PDF files that contain the words “budget report”.
"Budget report" filetype:pdf
Search in Cache
Google stores a copy of many of the pages it visits. These copies can sometimes come in handy, especially if the original web page is no longer available or is too slow to respond.
If you want to search in Google’s cache for a previous version of a page, you can use the command “cache:”. Note that Google began retiring its cached-page feature in 2024, so this operator may no longer return results.
Examples of Google Dorks
If you’ve made it this far, then you should now have all the building blocks you need to create complex queries.
To use Google dorks, all you need are the operators and commands we’ve seen so far and creative thinking to combine them in new ways.
But most of the time, you won’t even have to do that. You can simply take advantage of the Google Hacking Database (GHDB).
GHDB is an open-source project that provides a large index of publicly known dorks. The project started in 2002 and is currently maintained by Exploit-DB.
You can use these freely available dorks when testing the security of your website or for pen-testing purposes.
In order to give you an idea of what you can access using dorks, I have compiled below some examples taken from the GHDB.
The following query reveals live feeds from AXIS cameras.
intitle:"Live View / - AXIS" | inurl:/mjpg/video.mjpg?timestamp
The next query returns email lists contained in Excel files.
As we’ve seen earlier in this post, this query returns log files containing passwords and their corresponding emails.
filetype:log intext:password intext:(@gmail.com | @yahoo.com | @hotmail.com)
Open FTP Servers
This search string reveals open FTP servers that can contain sensitive information.
intext:"index of" inurl:ftp
This query surfaces pages that may be vulnerable to SQL injection attacks, because they take an id parameter in the URL and show SQL error messages in their content.
inurl:".php?id=" intext:(error AND sql)
The following query returns scanning reports that reveal vulnerabilities in the scanned systems.
intitle:report (nessus | qualys) filetype:pdf
Nessus and Qualys are common vulnerability scanners, and their names are often included in the scan reports they produce.
These reports should be kept confidential because anyone who reads them learns exactly which vulnerabilities to exploit. This is critical, so make sure that none of your reports are accessible in the search results.
In this last example, the following query reveals the contents of exposed databases, including usernames and passwords.
intitle:"index of" "dump.sql"
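When you test your own website, you will want to restrict queries like the ones above to your domain with the “site:” operator. The following Python sketch does that for a few of the dorks from this post; `AUDIT_DORKS`, `scoped_queries`, and the domain `example.com` are placeholders of my own, so swap in a domain you are authorized to test.

```python
# Dork fragments taken from the examples in this article.
AUDIT_DORKS = [
    'intitle:"index of" "dump.sql"',
    'intext:"index of" inurl:ftp',
    "filetype:log intext:password",
    "intitle:report (nessus | qualys) filetype:pdf",
]


def scoped_queries(domain: str, dorks=AUDIT_DORKS) -> list:
    """Prefix each dork with site:<domain> to limit it to one website."""
    return [f"site:{domain} {dork}" for dork in dorks]


# example.com is a placeholder; use a domain you own or may test.
for query in scoped_queries("example.com"):
    print(query)
```

Pasting each printed line into the search box shows only hits on that domain, which keeps the audit focused and avoids browsing other people’s exposed data.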
Defend Yourself Against Google Dorks
Now that you know how dangerous Google dorks can be, you’re probably wondering how you can protect yourself, or your website, against them.
First of all, you should put yourself in the position of an attacker and try using Google dorks against yourself. If you find something in the search results that shouldn’t be there, then you can fix this problem by following these good practices:
- You can create a file called “robots.txt” in the root directory of your website, and tell search engine robots which directories or files they should not crawl.
- For sensitive pages, you should include a robots meta tag in the head of your HTML with the values noindex and nofollow.
- You should always password-protect your sensitive directories.
- Never store a password in plaintext. Instead, use salted hashes.
- SiteDigger is a tool that you can use to help you find vulnerabilities and sensitive data from your site that are exposed through Google results.
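Two of the practices above can be sketched with the Python standard library. The robots.txt rules, the paths, and the helper names below are made up for illustration, and PBKDF2 with 600,000 iterations is just one possible salted scheme that I picked as an assumption. Keep in mind that robots.txt is publicly readable and only advises well-behaved crawlers, so it does not replace password protection.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking all crawlers to skip two directories.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /backups/
"""

# Assumed work factor; tune it for your own hardware.
ITERATIONS = 600_000


def is_crawl_allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Check whether the given rules let a crawler fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); a fresh random salt is used when none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


print(is_crawl_allowed(ROBOTS_TXT, "https://example.com/admin/users"))
print(is_crawl_allowed(ROBOTS_TXT, "https://example.com/index.html"))

salt, stored = hash_password("correct horse")
print(verify_password("correct horse", salt, stored))
print(verify_password("wrong guess", salt, stored))
```

Only the salt and the digest need to be stored, so a leaked file or database dump that ends up in search results does not reveal the passwords themselves.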
Even if you do not have a web server connected to the Internet, you might not be as safe from Google Dorking as you think.
Your personal information may still be readily accessible through Google Search.
I invite you to apply what we’ve learned in this post to check whether any of your personal information has leaked. And if you find any, you should notify the site owner or other responsible party so that they can take the necessary steps to remove it.