[BUGFIX] Links on external pages don't get indexed
author Mario Rimann <mario.rimann@typo3.org>
Fri, 5 Apr 2013 18:48:24 +0000 (20:48 +0200)
committer Georg Ringer <georg.ringer@gmail.com>
Tue, 16 Jul 2013 11:44:35 +0000 (13:44 +0200)
commit 819b5be0ac81004371fee2f0e6386cc32233a59b
tree 88be42bfd09224b6a42cc75b35ad1dee9009e197
parent 485c07f01a4d49dbb963e46692652d0fcd896f73
[BUGFIX] Links on external pages don't get indexed

Allows the crawler to start indexing at a specific file, such as
www.domain.tld/foobar.html, instead of only at www.domain.tld/.

This change only affects the comparison against the base URL. It
enables the crawler to start crawling at, e.g., a file that contains
a manually generated list of links to follow. Before this change,
even links to targets on the same domain were rejected by the
checkUrl() method whenever the base URL pointed to a file instead
of "/", because the base URL was then not part of the target URL.
The path is now stripped from the base URL for this comparison, so
crawling can also start from a file.
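
The effect of the comparison described above can be illustrated with a
minimal sketch. It is not the code from CrawlerHook.php: the function
names stripPathFromUrl() and isUrlBelowBase() are hypothetical, and the
sketch assumes the check is a simple prefix match against the base URL.

```php
<?php
// Hypothetical helper names; this is not the actual CrawlerHook code.

/**
 * Reduce a URL to scheme://host[:port]/ so that a base URL pointing
 * to a file can still be compared against targets on the same host.
 */
function stripPathFromUrl(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return $url; // leave unparsable input untouched
    }
    $port = isset($parts['port']) ? ':' . $parts['port'] : '';
    return $parts['scheme'] . '://' . $parts['host'] . $port . '/';
}

/**
 * Accept a link only if it starts with the path-less base URL
 * (assumed prefix match).
 */
function isUrlBelowBase(string $url, string $baseUrl): bool
{
    return strpos($url, stripPathFromUrl($baseUrl)) === 0;
}

$baseUrl = 'http://www.domain.tld/foobar.html';
$target  = 'http://www.domain.tld/some/page.html';

// Old behaviour: the full base URL is not a prefix of the target.
var_dump(strpos($target, $baseUrl) === 0);                     // bool(false)
// New behaviour: only scheme and host are compared.
var_dump(isUrlBelowBase($target, $baseUrl));                   // bool(true)
// Links to other hosts are still rejected.
var_dump(isUrlBelowBase('http://other.tld/x.html', $baseUrl)); // bool(false)
```

With the path stripped, a start file such as foobar.html and the links
it lists share the same comparison base as long as they are on the same
host.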

Change-Id: I2727a9a447754b88d2c279c24b32b5c3a2df26c0
Resolves: #16534
Releases: 6.2, 6.1, 6.0, 4.7, 4.5
Reviewed-on: https://review.typo3.org/6990
Reviewed-by: Michael Stucki
Tested-by: Michael Stucki
Reviewed-by: Georg Ringer
Tested-by: Georg Ringer
typo3/sysext/indexed_search/Classes/Hook/CrawlerHook.php
typo3/sysext/indexed_search/Tests/Unit/Hook/CrawlerHookTest.php [new file with mode: 0644]