  1. (SECURITY) XML External Entity (XXE) vulnerability in CrawlJob XML ...

    Feb 20, 2026 · I'll preface this by saying, yes, I've read the security.md, and this seems to fall under the elephant in the room but I thought it was still worth a bug report. XXE to LFI in Heritrix-3.14.0: Impa...

  2. GitHub - internetarchive/heritrix3: Heritrix is the Internet Archive's ...

    Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. - internetarchive/heritrix3

  3. Use checkbox hack for the hamburger menu #4405 - GitHub

    Hey @jdlrobson, I tried using the checkbox hack, but that doesn't seem to be able to solve the problem at the moment (I think using JavaScript to achieve this would be a viable solution).

  4. Big spikes in /books/* traffic causing performance issues #8319

    Sep 21, 2023 · There have been noticeable spikes in /books/ traffic twice a day, every two days since ~September 16th, causing performance issues throughout the site during the spike, and …

  5. Security Overview · internetarchive/heritrix3 · GitHub

    Heritrix's user interface and configuration format allow authenticated users to run arbitrary code, edit local files and so forth. Therefore all Heritrix operators must necessarily be considered fully trusted and …

  6. Don't raise alerts for DNS records that point to 0.0.0.0 #428

    When scanning previously-known domains, Heritrix really does not like DNS records that point to 0.0.0.0 and raises alerts like this: SEVERE: Problem java.lang.IllegalStateException: got suspicious ...

  7. Editing an edition with-no-work no longer creates a work on save

    Jan 18, 2018 · Which looks like it is trying to do the same thing. Both are old code, and neither is working now. There should only be one version, it should work, and let's not label it a hack in the …

  8. heritrix3/README.md at master - GitHub

    Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. - heritrix3/README.md at master · internetarchive/heritrix3

  9. RateLimitGuard.authenticate() authentication failure · Issue #474 ...

    Apr 2, 2022 · I'm currently setting up a Heritrix/WCT/OWA stack and have come quite far (first crawls running, able to see them in WCT interface and do quality review). However, in the Heritrix logs I am …

  10. Make FetchHistoryProcessor 304 handler more robust

    Feb 3, 2019 · The FetchHistoryProcessor has special logic to handle HTTP 304 responses...