
(SECURITY) XML External Entity (XXE) vulnerability in CrawlJob XML ...
Feb 20, 2026 · I'll preface this by saying, yes, I've read the security.md, and this seems to fall under the elephant in the room but I thought it was still worth a bug report. XXE to LFI in Heritrix-3.14.0: Impa...
GitHub - internetarchive/heritrix3: Heritrix is the Internet Archive's ...
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. - internetarchive/heritrix3
Use checkbox hack for the hamburger menu #4405 - GitHub
Hey @jdlrobson, I tried using the checkbox hack, but it doesn't seem to solve the problem at the moment (I think using JavaScript to achieve this would be a viable solution).
Big spikes in /books/* traffic causing performance issues #8319
Sep 21, 2023 · There have been noticeable spikes in /books/ traffic twice a day, every two days since ~September 16th, causing performance issues throughout the site during the spike, and …
Security Overview · internetarchive/heritrix3 · GitHub
Heritrix's user interface and configuration format allow authenticated users to run arbitrary code, edit local files and so forth. Therefore all Heritrix operators must necessarily be considered fully trusted and …
Don't raise alerts for DNS records that point to 0.0.0.0 #428
When scanning previously-known domains, Heritrix really does not like DNS records that point to 0.0.0.0 and raises alerts like this: SEVERE: Problem java.lang.IllegalStateException: got suspicious ...
Editing an edition with-no-work no longer creates a work on save
Jan 18, 2018 · Which looks like it is trying to do the same thing. Both are old code, and neither is working now. There should only be one version, it should work, and let's not label it a hack in the …
heritrix3/README.md at master - GitHub
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. - heritrix3/README.md at master · internetarchive/heritrix3
RateLimitGuard.authenticate() authentication failure · Issue #474 ...
Apr 2, 2022 · I'm currently setting up a Heritrix/WCT/OWA stack and have come quite far (first crawls running, able to see them in the WCT interface and do quality review). However, in the Heritrix logs I am …
Make FetchHistoryProcessor 304 handler more robust
Feb 3, 2019 · The FetchHistoryProcessor has special logic to handle HTTP 304 responses...