URL detection is the first part of any WP2Static Job, as per this workflow illustration:
URL Detection <--- here
|
Crawling
|
Post Processing
|
Deployment
How URL detection works
WP2Static detects your WordPress site’s URLs by querying the database for many kinds of content, including:
- posts
- pages
- archives
- categories
- and many more…
It also scans the files within your WordPress installation to find URLs that can’t be identified from the database alone, such as theme files, cache dirs, robots.txt and more.
But I don’t want to include all my old media gallery files!
That’s fine: we accommodate selectively filtering or adding URLs in your Crawl Queue (your list of detected URLs). The challenge is that we don’t know what each user wants, so we default to detecting ALL URLs we can identify from your site, then let you adjust this list via the plugin’s options, custom filters or add-ons.
There are a bunch of URLs included that I don’t think I need
It is usually fine to leave them in. Thanks to WP2Static’s caching mechanisms, after your initial Job of detecting, crawling, post-processing and deploying, only files changed since the previous deploy need to be crawled, processed and deployed again.
Verifying which URLs have been detected
- via WP2Static’s UI: Caches > Crawl Queue (Detected URLs) > Show URLs
- via WP-CLI:
wp wp2static crawl_queue list
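To check for one specific path rather than scanning the whole list, the WP-CLI output can be piped through an exact-line match. This is a minimal sketch that assumes the command prints one URL path per line; `queue_contains` is a hypothetical helper name, not part of WP2Static.

```shell
# Hypothetical helper (not part of WP2Static): succeeds if the exact path
# given as $1 appears as a whole line on stdin.
queue_contains() {
  # -F: fixed string, -x: match the whole line, -q: quiet (exit status only)
  # -e guards against patterns that begin with a dash
  grep -Fxq -e "$1"
}

# Intended usage (assumes one path per line in the listing):
#   wp wp2static crawl_queue list | queue_contains /robots.txt && echo "detected"
```

Matching the whole line avoids false positives such as /robots.txt.bak when looking for /robots.txt.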
Ways to trigger URL detection
- via enqueuing a URL Detection Job
- via the WP-CLI command
wp wp2static detect
- via calling the static method
URLDetector::detectURLs()
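The WP-CLI route lends itself to scripting, for example running detection and then reporting how many URLs were queued. The following is a minimal sketch, assuming `wp wp2static crawl_queue list` prints one URL per line; `detect_and_count` and the `WP` variable are hypothetical names introduced here, not part of WP2Static.

```shell
# Command used to invoke WP-CLI; overridable so the function can be pointed
# at a wrapper (hypothetical variable, not a WP2Static setting).
WP="${WP:-wp}"

detect_and_count() {
  # Run URL detection; stop if the command fails.
  "$WP" wp2static detect > /dev/null || return 1
  # Count the non-empty lines in the resulting Crawl Queue listing.
  "$WP" wp2static crawl_queue list | grep -c .
}
```

A sudden drop in this count between runs can be a quick hint that something the site used to expose is no longer being detected.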
Ways to modify the Crawl Queue (add/remove/transform URLs)
- via WP2Static’s UI: Options > Detection Options
- via the filter
wp2static_modify_initial_crawl_list
- via the database table named
{wp_prefix}_wp2static_urls
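Whichever route is used, it can help to preview which paths a removal rule would drop before applying it. The sketch below filters a one-path-per-line listing by prefix. It only previews the result and does not change the Crawl Queue; `preview_exclusion` is a hypothetical helper, and the one-path-per-line output format is an assumption.

```shell
# Hypothetical helper (not part of WP2Static): reads paths on stdin and
# prints only those that do NOT start with the prefix given as $1.
preview_exclusion() {
  # index() returns 1 when the line begins with the prefix; a plain string
  # comparison avoids treating characters in the path as a regex.
  awk -v prefix="$1" 'index($0, prefix) != 1'
}

# Intended usage, e.g. to see the queue without the default media directory:
#   wp wp2static crawl_queue list | preview_exclusion /wp-content/uploads/
```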
Ways to clear the Crawl Queue
You generally don’t need to clear the Crawl Queue manually, as WP2Static clears it before each new detection Job. Detection is also a part of the overall detect, crawl, post-process, deploy Job that takes very little time: only seconds, even on very large sites.
- via WP2Static’s UI: Caches > Crawl Queue (Detected URLs) > Delete Crawl Queue
- via WP-CLI:
wp wp2static crawl_queue delete
Example Crawl Queue (Detected URLs)
Below is a small sample of the URLs that may be detected in a WordPress site; the actual list depends on your theme, posts, pages, permalinks, etc. The Crawl Queue is sorted alphabetically, as are other listings of paths/URLs in the site, to make it easier to compare lists and troubleshoot a missing file.
/
/category/uncategorized/
/category/uncategorized/page/1/
/favicon.ico
/hello-world/
/page/1/
/pages/page/1/
/robots.txt
/sample-page/
/sitemap.xml
/wp-content/plugins/wp-search-with-algolia/css/algolia-autocomplete.css
/wp-content/plugins/wp-search-with-algolia/css/algolia-instantsearch.css
/wp-content/plugins/wp-search-with-algolia/css/algolia-logo.svg
/wp-content/plugins/wp-search-with-algolia/includes/admin/css/algolia-admin.css
/wp-content/plugins/wp-search-with-algolia/includes/admin/fonts/algolia.eot
/wp-content/plugins/wp-search-with-algolia/includes/admin/fonts/algolia.svg
/wp-content/plugins/wp-search-with-algolia/includes/admin/fonts/algolia.ttf
/wp-content/plugins/wp-search-with-algolia/includes/admin/fonts/algolia.woff
...
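When troubleshooting a missing file, comparing a list of paths you expect against the detected list can be automated. This is a minimal sketch using the standard `sort` and `comm` utilities; `missing_from_queue` and both input files are hypothetical, and the Crawl Queue file is assumed to hold one path per line (for example, saved output of the WP-CLI list command).

```shell
# Hypothetical helper (not part of WP2Static): prints paths listed in the
# file $1 (expected) that are absent from the file $2 (detected queue).
missing_from_queue() {
  expected_sorted="$(mktemp)"
  queue_sorted="$(mktemp)"
  # comm requires both inputs sorted in the same collation order, so both
  # are re-sorted under the C locale rather than trusting the input order.
  LC_ALL=C sort -u "$1" > "$expected_sorted"
  LC_ALL=C sort -u "$2" > "$queue_sorted"
  # -2 drops lines unique to the queue, -3 drops lines common to both.
  LC_ALL=C comm -23 "$expected_sorted" "$queue_sorted"
  rm -f "$expected_sorted" "$queue_sorted"
}

# Intended usage:
#   wp wp2static crawl_queue list > queue.txt
#   missing_from_queue expected.txt queue.txt
```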
What happens to the detected URLs?
The Crawl Queue (detected URLs) is consumed by the Crawling phase of a WP2Static Job.