Heritrix

Results: 85



#Item
1. World Wide Web / Computing / Internet / Web archiving / Country code top-level domains / Internet search engines / Identifiers / Web crawler / Robots exclusion standard / Heritrix / .re / Association française pour le nommage Internet en coopération

Legal deposit of the French Web: harvesting strategies for a national domain France Lasfargues, Clément Oury, and Bert Wendland Bibliothèque nationale de France Quai François Mauriac Paris Cedex 13

Source URL: iwaw.europarchive.org

Language: English - Date: 2008-08-28 09:09:00
2. Computing / Web archiving / World Wide Web / Digital preservation / PhantomJS / Heritrix / Web crawler / Web ARChive / Headless browser / Uniform Resource Identifier / International Internet Preservation Consortium / Wayback Machine

Adapting the Hypercube Model to Archive Deferred Representations and Their Descendants Justin F. Brunelle, Michele C. Weigle, and Michael L. Nelson Old Dominion University Department of Computer Science Norfolk, Virginia

Source URL: arxiv.org

Language: English - Date: 2016-01-20 22:01:21
3. Web archiving / Webarchiv / International Internet Preservation Consortium / Internet Memory Foundation / Wayback Machine / Internet Archive / Heritrix / Open access / Archive / Web ARChive / Digital library / Memento Project

Proceedings Template - WORD

Source URL: www.websci11.org

Language: English - Date: 2016-03-15 09:54:44
4. Web archiving / PhantomJS / Heritrix / Web crawler / World Wide Web / Web ARChive / Uniform Resource Identifier / Headless browser / International Internet Preservation Consortium / Wayback Machine / Archive.is / Crawl

Adapting the Hypercube Model to Archive Deferred Representations and Their Descendants Justin F. Brunelle, Michele C. Weigle, and Michael L. Nelson Old Dominion University Department of Computer Science Norfolk, Virginia

Source URL: www.hanzoarchives.com

Language: English - Date: 2016-03-17 13:30:54
5. World Wide Web / Web crawler / Heritrix / Focused crawler / Uniform Resource Identifier / Crawler / Web resource / Robots exclusion standard / HTML / Hypertext Transfer Protocol / Internet Archive / Crawling

Incremental crawling with Heritrix Kristinn Sigurðsson National and University Library of Iceland Arngrímsgötu Reykjavík Iceland

Source URL: iwaw.europarchive.org

Language: English - Date: 2007-05-30 18:00:00
6. Web archiving / International Internet Preservation Consortium / Webarchiv / QA / Heritrix / Internet Archive / Digital library / Robots exclusion standard / Quality assurance / Web scraping

WebArchiving@UNT Current Quality Assurance Practices in Web Archiving Prepared By Brenda Reyes Ayala

Source URL: digital.library.unt.edu

Language: English - Date: 2016-06-18 12:54:52
7. Digital libraries / Data quality / Archival science / Web archiving / Data management / Robots exclusion standard / Link rot / Web ARChive / Heritrix / Computing / Information / World Wide Web

CLEAR: a credible method to evaluate website archivability Vangelis Banos† Yunhyong Kim‡ Seamus Ross‡ Yannis Manolopoulos† †Aristotle University of Thessaloniki, Greece ‡

Source URL: purl.pt

Language: English
8. Internet Archive / Semantic Web / Web archiving / Wayback Machine / World Wide Web / Heritrix / Archive / Uniform resource identifier / Internet / Humanities / Digital media / Technology

Combining Social Media Storytelling With Web Archives Michael L. Nelson, Michele C. Weigle, Kristine Hanna {mln, mweigle}@cs.odu.edu, In this project, Old Dominion University is the lead applicant an

Source URL: www.imls.gov

Language: English - Date: 2015-04-23 12:50:05
9. Information science / Semantic Web / URI schemes / Heritrix / Web archiving / International Internet Preservation Consortium / Internet Archive / Robots exclusion standard / Uniform resource identifier / World Wide Web / Computing / Web crawlers

An Introduction to Heritrix An open source archival quality web crawler Gordon Mohr, Michael Stack, Igor Ranitovic, Dan Avery and Michele Kimpton Internet Archive Web Team {gordon,stack,igor,dan,michele}@archive.org

Source URL: crawler.archive.org

Language: English - Date: 2011-06-09 19:53:47
10. Information science / Web crawler / Sitemaps / Archive / Website / Web content / Link rot / Heritrix / World Wide Web / Web archiving / Computing

The UK Government Web Archive Guidance for digital and records management teams © Crown copyright 2015 You may re-use this information (excluding logos) free of charge in any format or medium, under

Source URL: nationalarchives.gov.uk

Language: English - Date: 2015-01-29 07:28:53