**HIGHLY EXPERIMENTAL, BUGS**

I'm sharing parts of what I have so far: a full HTTP-only scanner with an included webapp for organizing your finds.

scanner.py is the most important piece. It runs single-threaded, can pull from several floodfills, distributes scraping over several proxies, and uses a batching system. This part is well tested and has run for months at a time. It produces online.json.

recheck.py goes back over existing finds and gathers some additional info. It still needs a lot of work; I was in the process of migrating things, so it's unfinished, but it should work as is. After scanner.py finds things, run recheck.py to produce rechecked.json, which viewer.py uses for browsing.

viewer.py is a simple webapp for viewing cached HTML and saving sites to categories. It automatically places some things under tabs, like zzzot, torrent clients, and default webservers. While running it also shows the status of scanner.py, but both can be run independently.
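The templates/ and static/ directories suggest a small Flask-style app. As a rough illustration of the data flow only (this is not the actual viewer.py, and the field names and JSON layout below are assumptions), a minimal viewer over rechecked.json might look like:

```python
# Hypothetical sketch of a viewer over rechecked.json -- not the real viewer.py.
# The dict layout and the "category" field are assumptions for illustration.
import json
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/")
def index():
    with open("rechecked.json") as f:
        entries = json.load(f)
    # Group entries by an assumed "category" field so they could be shown under tabs.
    by_category = {}
    for b32, info in entries.items():
        by_category.setdefault(info.get("category", "uncategorized"), []).append(b32)
    return jsonify(by_category)

if __name__ == "__main__":
    app.run(port=5000)
```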

# B32Scanner

Looks for unregistered b32's by pulling them from the netdb and requesting HTTP headers from them. It can pull b32's from several routers and distribute the request load among many HTTP proxy tunnels using aiohttp.
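As a sketch of the core request step (not the exact code in scanner.py), aiohttp can fetch headers for a .b32.i2p host through one of the local HTTP proxy tunnels roughly like this; the helper name and the example b32 are placeholders:

```python
# Minimal sketch: fetch HTTP headers for a b32 destination through an I2P HTTP proxy tunnel.
import asyncio
import aiohttp

async def fetch_headers(b32: str, proxy: str, timeout_s: int = 35):
    url = f"http://{b32}.b32.i2p/"
    timeout = aiohttp.ClientTimeout(total=timeout_s)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        # aiohttp routes the request through the given HTTP proxy (an I2P client tunnel).
        async with session.get(url, proxy=f"http://{proxy}", allow_redirects=False) as resp:
            return resp.status, dict(resp.headers)

# Example (placeholder b32 hash, default I2P HTTP proxy port):
# status, headers = asyncio.run(fetch_headers("a" * 52, "127.0.0.1:4444"))
```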

Only tried with i2p+. It currently won't get leasesets from i2pd. Untested on vanilla i2p.

## Install

```sh
git clone --config http.proxy=127.0.0.1:4444 http://git.simp.i2p/simp/b32scanner.git && cd b32scanner && python3 -m venv ./ && cd bin && source activate && cd .. && pip3 install -r requirements.txt
```

## Run

Activate the venv and do

```sh
python3 scanner.py
```

This is meant to be run in the background over a long period of time, periodically checking the netdb for new entries and sending HTTP requests. What it's doing is shown in the log.

## What it does

1. Checks the netdb from your router(s) and adds any newly published b32's to "data.json", recording the signature type and whether each entry is ECIES or ElGamal (a rough polling sketch follows this list).
2. Using the i2p HTTP tunnels you supply, it round-robins requests for b32's it hasn't tried yet through them. If a request times out it retries; the retry behavior is adjusted in script_settings.ini.
3. Returns ping, headers, last seen, and response code for each b32. Online destinations are put in "online.json".
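A rough sketch of step 1, assuming the router console exposes its leaseset/netdb listing on an HTTP page (the exact path differs between router builds, so the URL below is a placeholder):

```python
# Sketch of polling a router console page for newly published b32's.
# The netdb URL path is a placeholder; point it at whatever page lists leasesets on your router.
import re
import urllib.request

B32_RE = re.compile(r"[a-z2-7]{52}\.b32\.i2p")

def poll_router(console_addr: str, known: set) -> set:
    url = f"http://{console_addr}/netdb"  # placeholder path
    with urllib.request.urlopen(url, timeout=30) as resp:
        page = resp.read().decode("utf-8", errors="replace")
    found = set(B32_RE.findall(page))
    # Only the b32's we haven't seen before need to be queued for crawling.
    return found - known

# New entries would then be merged into data.json alongside their signature/encryption info.
```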

## Dealing with large request loads

"crawl_timer" adjusts the time to send out requests, could probably be spaced out even longer and set "Reduce tunnel quantity when idle" and "Close tunnel after specified idle period" so tunnels close when not in use.

aiohttp/asyncio is used, so it's capable of handling a large load of requests if they're distributed over several tunnels and/or routers; most of the time is spent waiting on i2p. After running for a week or so things start to level off, and you can often get by with fewer tunnels.

Batching is used to reduce the likelihood of Java routers giving proxy errors, which the scanner treats as timeouts. It could request everything at once, but you would time out and miss things. If you keep it to a reasonable number (like a batch size of 30-50 per router used for scraping) it shouldn't be timing out.
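A minimal sketch of that batching idea, assuming a `fetch_headers`-style coroutine like the one sketched earlier; the batch size and the round-robin over proxies mirror the group_size and i2p_proxies settings:

```python
# Sketch: send requests in batches, round-robining across proxy tunnels,
# so no single router is hit with everything at once.
import asyncio
import itertools

async def crawl_in_batches(pending, proxies, group_size=50, timeout_s=35):
    """pending: list of b32 hashes not yet tried; proxies: list of host:port strings."""
    proxy_cycle = itertools.cycle(proxies)
    results = {}
    for i in range(0, len(pending), group_size):
        batch = pending[i:i + group_size]
        tasks = [fetch_headers(b32, next(proxy_cycle), timeout_s) for b32 in batch]
        # return_exceptions=True keeps one timeout from cancelling the whole batch.
        outcomes = await asyncio.gather(*tasks, return_exceptions=True)
        for b32, outcome in zip(batch, outcomes):
            results[b32] = outcome
    return results
```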

## Settings

```ini
[settings]
routers = 127.0.0.1:7661, 127.0.0.1:7662, 127.0.0.1:7663, 127.0.0.1:7664 #i2p routers to use (only tested with i2p+)
i2p_proxies = 127.0.0.1:4441, 127.0.0.1:4442, 127.0.0.1:4443, 127.0.0.1:4444 #i2p http tunnels to use, list
accepted_responses = 200, 202, 501, 429, 400, 401, 402, 403, 404 #http response codes to accept
check_leasesets_timer = 600 #time in seconds to check for new leasesets
crawl_timer = 1200 #time in seconds to request all new b32's
retry_attempts = 3 #number of times to retry if there's a timeout
timeout_base = 35 #base number of seconds to timeout, adjusts for number of sites requested at once
group_size = 70 #number of sites to request at once
```
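For reference, settings in this shape parse cleanly with the standard library's configparser as long as inline `#` comments are allowed; a sketch of how such a file might be read (the variable names are just illustrative, not necessarily what scanner.py does internally):

```python
# Sketch: reading script_settings.ini with configparser.
# inline_comment_prefixes strips the trailing "#..." comments from each value.
import configparser

cfg = configparser.ConfigParser(inline_comment_prefixes=("#",))
cfg.read("script_settings.ini")
s = cfg["settings"]

routers = [r.strip() for r in s["routers"].split(",")]
i2p_proxies = [p.strip() for p in s["i2p_proxies"].split(",")]
accepted_responses = {int(c) for c in s["accepted_responses"].split(",")}
crawl_timer = s.getint("crawl_timer")
group_size = s.getint("group_size")
```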