YaCy 1.6 was released yesterday. This release contains mostly enhancements and bugfixes, a special "greedy learning mode" and, most importantly, a feature that many of you requested via anonymous messages to http://sayat.me/YaCy, a link that is shown on YaCy's goodbye screen after a shutdown. Many people used this link to report that they would run YaCy permanently if YaCy kept its CPU load and IO low. That is therefore the main feature of version 1.6: if YaCy is running in the background and is used for searching, it tries to keep its IO and CPU load low.
Here is what we did in detail:
We examined the IO problem and found that Solr needs regular optimization. Without it, IO is very high during DHT transmission (the peer-to-peer sharing of search indexes). With an optimized Solr index, this process is much more efficient.
We integrated a CPU load sensor that suspends DHT transmission while the CPU load is too high (this affects both sending and receiving of indexes).
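The load gate can be pictured roughly as follows. This is a minimal Python sketch, not YaCy's actual implementation: it assumes the sensor is the Unix 1-minute load average divided by the number of cores, and the limit `MAX_LOAD_PER_CORE` as well as all helper names are hypothetical.

```python
import os

# Hypothetical limit: postpone DHT transfers while the 1-minute load average
# per CPU core is above this value. YaCy's real sensor and threshold may differ.
MAX_LOAD_PER_CORE = 1.0


def cpu_load_per_core() -> float:
    """Return the 1-minute load average divided by the core count (Unix only)."""
    one_minute, _, _ = os.getloadavg()
    return one_minute / (os.cpu_count() or 1)


def dht_transfer_allowed(load_per_core: float, limit: float = MAX_LOAD_PER_CORE) -> bool:
    """Single gate used for both directions: sending and receiving index chunks."""
    return load_per_core <= limit


def maybe_send_index_chunk(chunk, send, load_per_core: float) -> bool:
    """Send a chunk of index entries only if the CPU is not overloaded.

    Returns True if the chunk was sent, False if it was postponed.
    """
    if not dht_transfer_allowed(load_per_core):
        return False  # postpone; the caller keeps the chunk queued for later
    send(chunk)
    return True


def accept_incoming_chunk(load_per_core: float) -> bool:
    """Receiving side: refuse incoming transfers under high load."""
    return dht_transfer_allowed(load_per_core)
```

In practice the gate would be called with `cpu_load_per_core()` before each transfer; the load value is passed in explicitly here so the decision logic can be checked without depending on the host machine.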
The new "greedy learning" mode makes YaCy load documents linked from the first search results until a total of 1000 documents are in the local search index. This target is usually reached after the first three searches at the latest; from then on, YaCy can use these documents in its search.
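A sketch of the greedy learning idea in Python, under stated assumptions: the 1000-document target comes from the text above, while `fetch_links` (returning the links found in a result page) and the use of a plain set of URLs as the "local index" are hypothetical simplifications. Only documents directly linked from the search results are loaded; nothing is crawled recursively.

```python
from typing import Callable, Iterable, List, Set

# Target size from the release notes; everything else here is illustrative.
GREEDY_TARGET = 1000


def greedy_learn(
    result_urls: Iterable[str],
    fetch_links: Callable[[str], List[str]],
    index: Set[str],
    target: int = GREEDY_TARGET,
) -> int:
    """Load documents linked from search results until the index holds `target` documents.

    `index.add(url)` stands in for "load and index the document".
    Returns the number of newly added documents.
    """
    added = 0
    for result in result_urls:
        for link in fetch_links(result):
            if len(index) >= target:
                return added  # greedy mode stops once the target is reached
            if link not in index:
                index.add(link)
                added += 1
    return added
```

Once the index has reached the target, further calls return immediately, which matches the idea that greedy learning only bootstraps a fresh peer.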
When you do a peer-to-peer search, the requesting peer must wait for the remote peers to submit their search results. To show the progress of remote searches, the search interface had a progress bar. It is still there, but when we showed the new YaCy Release 1.4 at the Linuxtage fair in Chemnitz last weekend, people asked: "Is there a bug? The progress bar does not show." No, it's not a bug: in many cases the bar flashes by so fast that you cannot see it anymore.
Furthermore, search result quality has improved. This is the result of the advancing deep Solr integration, not only in local search but also in remote search. The integration is not yet finished, but it already shows a new level of flexibility, speed and relevancy of search results.
Here are some more details about the main changes in YaCy 1.4:
This release mainly includes a deeper Solr integration: many more Solr fields are filled, Solr now has multi-core capabilities, and a second core with a webgraph was added (but deactivated for further testing).
The opensearch result writer of the integrated Solr now has all the features that YaCy's original opensearch result servlet had, and the file search interface "yacyinteractive" now uses this new result writer instead of the old one. Search in that interface is now much faster.
The default search process has undergone a full redesign, and a lot of testing was done to fix problems with the migration to Solr. The normal (local) search is now very fast, especially in portal mode, and even in p2p mode.
The ranking was strongly enhanced: there is now support for flexible field boosts, boost functions and boost queries (see servlet /RankingSolr_p.html). All these ranking functions are now editable, and there is a new configuration servlet for this. Furthermore, several ranking schemas are predefined: one for default internet search, one for sort-by-date and one for intranet search requests, which is triggered automatically if a site operator is used. Intranet search ranking rates deep links higher than shallow ones, which returns more specific document types. Remote searches are done using the local ranking profile, not the remote profile.
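To illustrate how field boosts, boost functions and boost queries combine with automatic profile selection, here is a hypothetical Python sketch that maps a query to Solr request parameters. `defType=edismax`, `qf`, `bf`, `bq`, `sort` and `fq` are standard Solr parameters and `field()` is a standard Solr function; the profile contents, field names such as `path_depth_i`, and the translation of `site:` into a filter query are assumptions for illustration, not YaCy's actual schema or code.

```python
from typing import Dict, List

# Illustrative ranking profiles. qf = field boosts, bf = boost functions,
# bq = boost queries, sort = result order. All field names and weights
# are hypothetical and do not reflect YaCy's real schema.
PROFILES: Dict[str, Dict[str, str]] = {
    "internet": {
        "qf": "title^5.0 description^2.0 text_t^1.0",
        "bq": "language_s:en^0.5",
    },
    "date": {
        "qf": "title^5.0 text_t^1.0",
        "sort": "last_modified desc",
    },
    "intranet": {
        "qf": "title^3.0 text_t^1.0",
        # rate deep links higher than shallow ones via a path-depth boost
        "bf": "field(path_depth_i)",
    },
}


def select_profile(query: str, sort_by_date: bool = False) -> str:
    """Pick a profile; a site: operator switches to intranet ranking automatically."""
    if any(term.startswith("site:") for term in query.split()):
        return "intranet"
    return "date" if sort_by_date else "internet"


def build_solr_params(query: str, sort_by_date: bool = False) -> Dict[str, str]:
    """Translate a user query into Solr request parameters for the chosen profile."""
    terms: List[str] = query.split()
    sites = [t[len("site:"):] for t in terms if t.startswith("site:")]
    words = [t for t in terms if not t.startswith("site:")]
    params = {"defType": "edismax", "q": " ".join(words)}
    params.update(PROFILES[select_profile(query, sort_by_date)])
    if sites:
        # restrict results to the requested host (hypothetical field name)
        params["fq"] = "host_s:" + sites[0]
    return params
```

Because remote searches use the local ranking profile, a peer following this sketch would reuse the parameters it built locally for its remote requests instead of relying on the remote peers' profiles.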
The selection of target peers has been enhanced: all robinson peers that have a Solr interface are now searched using that interface rather than the old YaCy interface.
There should also be an improvement in indexing speed, as fewer requests are sent to Solr during indexing, index updates are bundled together, and forced commits have been reduced using the soft-commit feature of Solr 4.1.
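Bundling updates and replacing forced hard commits with occasional soft commits can be sketched as a small buffering writer. This Python class is hypothetical: `post` stands for one bulk update request to Solr and `soft_commit` for a soft commit (for example an update request using Solr's `softCommit` option); the batch sizes are arbitrary example values.

```python
from typing import Callable, Dict, List

Document = Dict[str, object]


class BundledIndexWriter:
    """Collect documents and write them to the index in bundles (hypothetical helper)."""

    def __init__(
        self,
        post: Callable[[List[Document]], None],
        soft_commit: Callable[[], None],
        batch_size: int = 100,
        commit_every: int = 10,
    ) -> None:
        self.post = post                  # sends one bundle in a single request
        self.soft_commit = soft_commit    # makes pending documents searchable
        self.batch_size = batch_size
        self.commit_every = commit_every  # soft-commit after this many bundles
        self.buffer: List[Document] = []
        self.bundles_since_commit = 0

    def add(self, doc: Document) -> None:
        """Queue a document; flush automatically when the bundle is full."""
        self.buffer.append(doc)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Send the current bundle and soft-commit every `commit_every` bundles."""
        if not self.buffer:
            return
        self.post(self.buffer)
        self.buffer = []  # start a fresh list; the posted bundle is left untouched
        self.bundles_since_commit += 1
        if self.bundles_since_commit >= self.commit_every:
            self.soft_commit()
            self.bundles_since_commit = 0
```

Sending one request per bundle cuts the number of requests to Solr, and soft commits make new documents searchable without the cost of a hard commit on every update.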
There have also been fixes for some memory leaks, and overall memory usage should be lower.
Today we released YaCy 1.2. This is a major change in YaCy's architecture, since search functions now primarily use Solr as the indexing engine instead of our peer-to-peer optimized distributed indexing algorithms. This step also means that YaCy gives up the attempt to create its own indexing technology in favor of the much more advanced Solr/Lucene library.
Embedded Solr 4.0.0 with standard Solr XML search interface integrated
Enhanced crawling with live link structure visualization
A Host Browser to explore the file structure of crawled hosts: this shows loaded pages, pages with errors and pending files in the same way as a file browser would show the content of a host.
We believe that the new features are also valuable for web administration and search engine optimization (e.g. to find dead links). Please have a look at the screenshots in the YaCy ApacheCon 2012 talk slides.
YaCy is available as Windows, Mac and Debian packages and also as a tarball. To download YaCy, visit the YaCy Home Page.
Use Solr, but don't home-brew your own code around it if you do web, file or intranet search: it's all inside YaCy. And don't buy a commercial appliance; this is free and better!
In response to the great feedback on our release, we had to check whether our network was able to scale to the 4000 peers it had grown to in the meantime. We found that the biggest problem was just the display of the network graphic, which is now fixed. During the period of strongest growth of the search network, the distributed hash table failed because it takes time to distribute the index elements to the new peers. Existing peers became invisible because the new peers were placed "in front" of the old ones, which caused bad search results. Another frequently discussed problem was search time and ranking quality. Search time is connected to fraud detection, which we covered in a short talk on that topic, and we worked on that problem further in the 1.01 release. On public demand, we also added indexing to the integrated bookmark system (see: German article about YaCy and the integrated bookmarking).
Please try the new release, since it should also be a bit more stable. If you have not discovered the update function yet, please give it a try (it's very easy to update YaCy!): click on Administration -> System Update (in the top right) -> select the latest release from the drop-down box -> Download -> Install Release.
This project can only live if people help, if you help! Please join in!