Scraping (JS/Source code)
Source Code Recon
JavaScript files are used by modern web applications to provide dynamic content and contain various functions & events. Every website ships JS files, and they are a great resource for finding internal subdomains used by the organization.
Tools:
1) Gospider
Author: Jaeles
Language: Go
Gospider is a fast web spidering tool capable of crawling a whole website in a short amount of time. This means gospider will visit/scrape each and every URL mentioned in the JS files and source code. Since source code & JS files make up a website, they may contain links to other subdomains too.
Installation:
This is a long process, so brace yourself!!!
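The exact commands depend on your setup, but assuming Go is already installed and $GOPATH/bin is on your PATH, a sketch of the installation could look like this:

```bash
# A sketch of the installation, assuming Go is already installed
# and $GOPATH/bin (or $HOME/go/bin) is on your PATH
GO111MODULE=on go get -u github.com/jaeles-project/gospider

# httpx is needed for the web-probing step below
GO111MODULE=on go get -u github.com/projectdiscovery/httpx/cmd/httpx

# Verify both binaries are available
gospider --help
httpx -version
```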
Running:
This process is divided into 3 steps:
1) Web probing subdomains
Since we are crawling websites, gospider expects us to provide URLs, which means in the form of
http://
https://
So first, we need to web probe all the subdomains we have gathered till now. For this purpose, we will use httpx.
So, let's first web probe the subdomains:
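A minimal probing command could look like this (subdomains.txt and probed.txt are placeholder filenames):

```bash
# Keep only the subdomains that actually respond over HTTP/HTTPS
# (filenames are placeholders)
cat subdomains.txt | httpx -silent > probed.txt
```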
Now that we have web-probed URLs, we can send them to gospider for crawling.
Caution: This generates huge traffic on your target
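Using the flags explained below, a run could look roughly like this (probed.txt, gospider_output.txt and the thread/depth values are placeholders):

```bash
# Crawl every probed URL, following links found in JS files, sitemap.xml and robots.txt
# (filenames and the -t/-d values are placeholders; tune them for your target)
gospider -S probed.txt --js -t 20 -d 3 --sitemap --robots > gospider_output.txt
```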
Flags:
-S - Input file
--js - Find links in JavaScript files
-t - Number of threads (run sites in parallel) (default 1)
-d - Depth (a depth of 3 means scraping links from second-level JS files)
--sitemap - Try to crawl sitemap.xml
--robots - Try to crawl robots.txt
2) Cleaning the output
The path portion of a URL shouldn't have more than 2,048 characters. Since the raw gospider output contains full URLs along with a lot of other noise, we need to clean it up before going further.
The point to note here is that so far we have collected URLs from JS files and source code, but we are only concerned with subdomains. Hence, we just need to extract the subdomains from the gospider output.
This can be done using Tomnomnom's unfurl tool. It takes a list of URLs as input and extracts the subdomain/domain part from them.
You can install unfurl using this command: `go get -u github.com/tomnomnom/unfurl`
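Putting it all together, a cleaning pipeline along these lines is one way to pull out only your target's subdomains (gospider_output.txt, example.com and scraped_subs.txt are placeholders); the command is broken down right after:

```bash
# Extract URLs from the gospider output, strip the trailing "]", reduce them to
# domains, keep only our target's subdomains and remove duplicates
# (example.com and the filenames are placeholders)
cat gospider_output.txt \
  | grep -Eo 'https?://[^ ]+' \
  | sed 's/]$//' \
  | unfurl -u domains \
  | grep '\.example\.com$' \
  | sort -u > scraped_subs.txt
```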
Breakdown of the command:
a) grep - Extract the links that start with http/https
b) sed - Remove the " ] " at the end of the line
c) unfurl - Extract the domain/subdomain from the URLs
d) grep - Only select subdomains of our target
e) sort - Avoid duplicates
3) Resolving our target subdomains
Now that we have all the subdomains of our target, it's time to DNS-resolve them and check which ones are valid.
(Hopefully you have seen the previous techniques and know how to run puredns.)
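Assuming you already have a resolvers list from the earlier sections, the resolution step could look like this (scraped_subs.txt, resolvers.txt and resolved.txt are placeholder filenames):

```bash
# DNS-resolve the scraped subdomains against a trusted resolver list
# (filenames are placeholders)
puredns resolve scraped_subs.txt -r resolvers.txt --write resolved.txt
```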
I love this technique, as it also finds hidden Amazon S3 buckets used by the organization. If such buckets are open and expose sensitive data, then it's a win-win situation for us. Also, the output of this can be sent to the SecretFinder tool, which can find hidden secrets, exposed API tokens, etc.
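For instance, a single discovered JS file could be fed to SecretFinder roughly like this (assuming the m4ll0k/SecretFinder script; the URL is a placeholder):

```bash
# Scan one JavaScript file for hard-coded secrets and exposed API tokens
# (assumes m4ll0k/SecretFinder is cloned and its Python requirements installed)
python3 SecretFinder.py -i https://sub.example.com/static/app.js -o cli
```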