This module equips learners with essential web reconnaissance skills, crucial for ethical hacking and penetration testing. It explores both active and passive techniques, including DNS enumeration, web crawling, analysis of web archives and HTTP headers, and fingerprinting web technologies.
Introduction

- The primary goals of web recon:
  - Identifying assets
  - Discovering hidden information
  - Analyzing the attack surface
  - Gathering intelligence
Types of Reconnaissance
Active Recon
- In active recon, the attacker directly interacts with the target system to gather info using the following methods:
  - Port scanning
  - Vulnerability scanning
  - Network mapping
  - Banner grabbing
  - OS fingerprinting
  - Service enumeration
  - Web spidering
Passive Recon
- In passive recon, the attacker gathers information about the target without directly interacting with it
- This relies on publicly available information and resources, such as:
  - Search engine queries
  - WHOIS lookups
  - DNS
  - Web archive analysis
  - Social media analysis
  - Code repositories
WHOIS
- WHOIS is a protocol used to query databases that store information about registered internet resources like domain names, IP address blocks and autonomous systems
- A WHOIS record contains the following information:
  - Domain name
  - Registrar
  - Registrant contact
  - Administrative contact
  - Technical contact
  - Creation and expiration dates
  - Name servers
- Using WHOIS:

```shell
whois inlanefreight.com
```
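WHOIS output is largely `Key: Value` text, so the key fields can be pulled out with a few lines of Python. A minimal sketch; the sample record and field names below are illustrative, and real records vary by registrar:

```python
# Parse "Key: Value" lines from raw WHOIS output into a dict of lists
# (fields like "Name Server" can appear multiple times).
def parse_whois(raw: str) -> dict:
    record = {}
    for line in raw.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            record.setdefault(key.strip(), []).append(value.strip())
    return record

sample = """\
Domain Name: EXAMPLE.COM
Registrar: Example Registrar, LLC
Creation Date: 1995-08-14T04:00:00Z
Name Server: A.IANA-SERVERS.NET
Name Server: B.IANA-SERVERS.NET
"""
print(parse_whois(sample)["Name Server"])
# → ['A.IANA-SERVERS.NET', 'B.IANA-SERVERS.NET']
```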
DNS & Subdomains
DNS
- DNS translates domain names into numerical IP addresses
- The hosts file maps hostnames to IP addresses and is located at `C:\Windows\System32\drivers\etc\hosts` on Windows and `/etc/hosts` on Unix
- In DNS, a zone is a distinct part of the domain namespace; for example, `example.com`, `mail.example.com` and `blog.example.com` all belong to the same DNS zone
- A zone file is a text file that resides on the DNS server and defines the resource records within the zone (NS records, MX records, A records, etc.)
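To make the record types concrete, an illustrative zone file for a hypothetical `example.com` zone (all values made up):

```
$TTL 3600
@       IN  SOA  ns1.example.com. admin.example.com. ( 2024010101 7200 3600 1209600 3600 )
@       IN  NS   ns1.example.com.
@       IN  MX   10 mail.example.com.
@       IN  A    93.184.216.34
mail    IN  A    93.184.216.35
blog    IN  A    93.184.216.36
```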
Digging DNS
- Most popular DNS recon tools:
  - dig
  - nslookup
  - host
  - dnsenum
  - fierce
  - dnsrecon
  - theHarvester
  - Online DNS lookup services
- dig:
```shell
dig google.com

# IPv4 address
dig domain.com A

# use a specific name server for the query
dig @1.1.1.1 domain.com

# show the full path of resolution
dig +trace domain.com

# reverse lookup
dig -x 192.168.1.1

# short answer to the query
dig +short domain.com

# show the answer section only
dig +noall +answer domain.com

# all available DNS records
dig domain.com ANY
```
Subdomain Bruteforcing
- There are several tools that excel at bruteforce enumeration:
- dnsenum:
```shell
# brute-force subdomains
# -r enables recursive brute-forcing: it will also enumerate subdomains of any subdomain it finds
dnsenum --enum inlanefreight.com -f /usr/share/seclists/Discovery/DNS/subdomains-top1million-20000.txt -r
```
DNS Zone Transfers
- A DNS zone transfer is an alternative, less invasive method for uncovering subdomains
- It is a wholesale copy of all DNS records within a zone from one name server to another
```shell
dig axfr @nsztm1.digi.ninja zonetransfer.me
```
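If a transfer succeeds, the output is plain zone-file text, so subdomains can be extracted with a small helper. A sketch that assumes the usual whitespace-separated `name TTL class type data` layout; the sample output below is abbreviated:

```python
# Extract unique hostnames belonging to a zone from `dig axfr` output.
def extract_hosts(axfr_output: str, zone: str) -> set:
    hosts = set()
    for line in axfr_output.splitlines():
        fields = line.split()
        # skip blank lines and dig's ";"-prefixed comment lines
        if not fields or fields[0].startswith(";"):
            continue
        name = fields[0].rstrip(".")
        if name.endswith(zone):
            hosts.add(name)
    return hosts

sample = """\
; <<>> DiG 9.18 <<>> axfr @nsztm1.digi.ninja zonetransfer.me
zonetransfer.me.        7200  IN  SOA  nsztm1.digi.ninja. robin.digi.ninja. 2019100801 172800 900 1209600 3600
www.zonetransfer.me.    7200  IN  A    5.196.105.14
office.zonetransfer.me. 7200  IN  A    4.23.39.254
"""
print(sorted(extract_hosts(sample, "zonetransfer.me")))
# → ['office.zonetransfer.me', 'www.zonetransfer.me', 'zonetransfer.me']
```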
Virtual Hosts
- Virtual hosting allows web servers to differentiate between domains, subdomains or separate websites with distinct content
- It allows multiple websites or applications to be hosted on a single server
- If a vhost doesn’t have a DNS record, it can still be accessed by modifying our `hosts` file to map the domain to an IP address
- Virtual host discovery tools:
- gobuster:
```shell
# --append-domain is required to append the base domain to each word
gobuster vhost -u http://<target_IP_address> -w <wordlist_file> --append-domain

# Section solution:
# search all files for lines starting with "web" (^ = start of line)
# sort -u (-u means unique) removes duplicates
grep -h ^web /usr/share/wordlists/seclists/Discovery/DNS/* | sort -u > web.txt
gobuster vhost -u http://inlanefreight.htb:30804 -w web.txt --append-domain
```
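The Host-header technique gobuster automates can be sketched in Python. The `fetch` callable is injected so the probing logic runs without a live target; hostnames and the length-comparison heuristic are illustrative (real tools also compare status codes and fuzzy-match bodies):

```python
# Probe candidate vhosts by varying the Host header and flagging responses
# that differ from a known-bad baseline (here: by response length).
def find_vhosts(candidates, baseline_len, fetch):
    """fetch(hostname) -> response body for a request sent with Host: hostname."""
    found = []
    for host in candidates:
        body = fetch(host)
        if len(body) != baseline_len:  # differs from the catch-all response
            found.append(host)
    return found

# fake fetch simulating a server where only admin.example.htb is a real vhost
responses = {"admin.example.htb": "admin panel login page"}
fake_fetch = lambda h: responses.get(h, "default site")
baseline = len(fake_fetch("nonexistent.example.htb"))
print(find_vhosts(["www.example.htb", "admin.example.htb"], baseline, fake_fetch))
# → ['admin.example.htb']
```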
Certificate Transparency Logs
- There are two popular options for searching CT logs:
  - crt.sh
  - Censys
- crt.sh also offers an API for automated searches:

```shell
curl -s "https://crt.sh/?q=facebook.com&output=json" | jq -r '.[] | select(.name_value | contains("dev")) | .name_value' | sort -u
```
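The same filtering can be done in Python on the JSON crt.sh returns; each entry's `name_value` field may hold several newline-separated names. The sample data below is made up:

```python
import json

# Collect unique certificate names containing a keyword from crt.sh JSON output.
def filter_ct_names(ct_json: str, keyword: str) -> list:
    names = set()
    for entry in json.loads(ct_json):
        # name_value can contain multiple newline-separated SANs
        for name in entry["name_value"].split("\n"):
            if keyword in name:
                names.add(name)
    return sorted(names)

sample = json.dumps([
    {"name_value": "dev.facebook.com"},
    {"name_value": "www.facebook.com\ndev.facebook.com"},
    {"name_value": "newdev.facebook.com"},
])
print(filter_ct_names(sample, "dev"))
# → ['dev.facebook.com', 'newdev.facebook.com']
```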
Fingerprinting
- Techniques used for web server and technology fingerprinting:
  - Banner grabbing: often reveals server software, version numbers and other details
  - Analyzing HTTP headers: they typically disclose the web server software; the `X-Powered-By` header also reveals additional info like scripting languages or frameworks
  - Probing for specific responses: sending specially crafted requests can elicit unique responses that reveal info
  - Analyzing page content: page content can reveal clues about the technologies used
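The header-analysis technique can be sketched as a simple lookup over response headers. The header/technology pairs below are a tiny illustrative subset:

```python
# Guess backend technologies from HTTP response headers.
def fingerprint_headers(headers: dict) -> list:
    clues = []
    server = headers.get("Server", "")
    if server:
        clues.append(f"server software: {server}")
    powered = headers.get("X-Powered-By", "")
    if powered:
        clues.append(f"powered by: {powered}")
    if "X-AspNet-Version" in headers:
        clues.append("framework: ASP.NET")
    return clues

print(fingerprint_headers({"Server": "nginx/1.18.0", "X-Powered-By": "PHP/7.4.3"}))
# → ['server software: nginx/1.18.0', 'powered by: PHP/7.4.3']
```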
- tools that automate the fingerprinting process:
  - Wappalyzer
  - BuiltWith
  - WhatWeb
  - Nmap
  - Netcraft
  - wafw00f
- Web Application Firewalls (WAFs) are security solutions designed to protect web applications from various attacks; wafw00f detects them:

```shell
wafw00f inlanefreight.com
```
Crawling
- Crawlers can be used to extract valuable information like internal and external links, comments, metadata and sensitive files
robots.txt
- robots.txt is a text file in the root directory of a website that contains a set of rules for crawlers
- It can help us uncover hidden directories, map the website’s structure and detect crawler traps
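Pulling the Disallow entries out of robots.txt is straightforward. A minimal sketch (the sample file below is made up):

```python
# Extract paths that robots.txt asks crawlers not to visit --
# often the most interesting starting points for manual review.
def disallowed_paths(robots_txt: str) -> list:
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#")[0].strip()  # drop trailing comments
        if line.lower().startswith("disallow:"):
            path = line.split(":", 1)[1].strip()
            if path:
                paths.append(path)
    return paths

sample = """\
User-agent: *
Disallow: /admin/
Disallow: /backup/  # old site dumps
Allow: /public/
"""
print(disallowed_paths(sample))
# → ['/admin/', '/backup/']
```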
.Well-Known URIs
- `.well-known`, typically accessible via the `/.well-known/` path on a web server, centralizes a website’s critical metadata, including configuration files and information related to its services, protocols, and security mechanisms
- `.well-known` can be used to discover endpoints and configuration details
- It enables us to comprehensively map out a website’s security landscape
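A quick way to enumerate a few registered `.well-known` endpoints is to build the candidate URLs and fetch them with curl or a crawler. The list below is a small sample of real registered URI suffixes (`security.txt` from RFC 9116, `openid-configuration` from OpenID Connect discovery); the helper name is illustrative:

```python
# Build candidate .well-known URLs for a target; fetching them is left
# to the caller so this stays offline-testable.
COMMON_WELL_KNOWN = [
    "security.txt",          # security contact info (RFC 9116)
    "openid-configuration",  # OpenID Connect discovery document
    "change-password",       # password-change URL hint
]

def well_known_urls(base: str) -> list:
    return [f"{base.rstrip('/')}/.well-known/{suffix}" for suffix in COMMON_WELL_KNOWN]

print(well_known_urls("https://example.com/"))
```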
Creepy Crawlies
- Popular web crawlers:
  - Burp Suite Spider
  - OWASP ZAP (Zed Attack Proxy)
  - Scrapy (Python framework)
  - Apache Nutch (scalable crawler)
- Using Scrapy (ReconSpider):
```shell
pip3 install scrapy
wget -O ReconSpider.zip https://academy.hackthebox.com/storage/modules/144/ReconSpider.v1.2.zip
unzip ReconSpider.zip
python3 ReconSpider.py http://inlanefreight.com

# using pipx
pipx install scrapy
pipx run --spec scrapy python ReconSpider.py http://inlanefreight.com
```
Search Engine Discovery
- Search operators can be used to pinpoint specific types of information
- Google Dorking is a technique that leverages search operators to uncover sensitive information, security vulnerabilities or hidden content on websites
```shell
# Finding login pages:
site:example.com inurl:login
site:example.com (inurl:login OR inurl:admin)

# Identifying exposed files:
site:example.com filetype:pdf
site:example.com (filetype:xls OR filetype:docx)

# Uncovering configuration files:
site:example.com inurl:config.php
site:example.com (ext:conf OR ext:cnf)  # extensions commonly used for configuration files

# Locating database backups:
site:example.com inurl:backup
site:example.com filetype:sql
```
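Dork queries follow a simple `operator:value` grammar, so a target-specific list can be generated programmatically. A sketch using only the operators shown above:

```python
# Generate Google dork strings for a target domain from operator templates.
def build_dorks(domain: str) -> list:
    templates = [
        "site:{d} inurl:login",
        "site:{d} (filetype:xls OR filetype:docx)",
        "site:{d} (ext:conf OR ext:cnf)",
        "site:{d} inurl:backup",
    ]
    return [t.format(d=domain) for t in templates]

for dork in build_dorks("example.com"):
    print(dork)
```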
Web Archives
- Internet Archive’s Wayback Machine can be used to revisit snapshots of websites as they appeared at various points in their history.
- It allows us to discover old web pages, directories, files or subdomains that are no longer accessible on the current website
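Beyond the web UI, the Wayback Machine exposes a CDX API that can return captured URLs as JSON, where the first row holds the field names and each following row is one capture. Parsing that shape is a few lines; the sample data below is made up:

```python
import json

# Parse Wayback CDX API JSON (output=json): row 0 holds field names,
# remaining rows hold one capture each.
def cdx_rows(cdx_json: str) -> list:
    rows = json.loads(cdx_json)
    if not rows:
        return []
    fields = rows[0]
    return [dict(zip(fields, row)) for row in rows[1:]]

sample = json.dumps([
    ["original", "timestamp"],
    ["http://inlanefreight.com/old-admin/", "20150317093102"],
    ["http://inlanefreight.com/backup.zip", "20160102030405"],
])
print([r["original"] for r in cdx_rows(sample)])
# → ['http://inlanefreight.com/old-admin/', 'http://inlanefreight.com/backup.zip']
```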
Automating Recon
- frameworks that provide a complete suite of tools for web recon:
- FinalRecon can be used for tasks like SSL certificate checking, WHOIS information gathering, header analysis, crawling, and DNS, subdomain and directory enumeration:
```shell
# installation
git clone https://github.com/thewhiteh4t/FinalRecon.git
cd FinalRecon
pip3 install -r requirements.txt
chmod +x ./finalrecon.py
./finalrecon.py --help

# gather header info and perform a whois lookup
./finalrecon.py --headers --whois --url http://inlanefreight.com
```