According to the German Wikipedia, YaCy turned 10 this year. On December 15th, 2003, Michael Christen first publicly mentioned his idea of a peer-to-peer based search engine in the comments of a news article published at heise.de, a German IT news website.
When I learned about YaCy a few months later, it was still in its very humble beginnings. I think there already was a website, but there was no version control for the source code, no forum, and only a small community of 2 or 3 developers. The crawler was not yet finished and indexing was done via the proxy, which is still included today. I still remember how excited I was when my index contained the first 1000 documents and how disappointed I was when I lost them because Michael changed the database format once again (Solr was still far, far away).
During the last 10 years I have been laughed at (this has not happened for a long time) and yelled at (only once at 26c3), but most of the time I had lots of fun learning and getting to know quite a few funny, interesting, and inspiring people. I visited places and events I would not have attended if it were not for YaCy, and I got a nice certificate.
Even though I have never been the most active contributor, I hope that YaCy will stay a part of my life for at least another 10 years.
On October 2nd YaCy was featured on FLOSS Weekly, a video podcast about Free Libre Open Source Software hosted by Randal Schwartz. The co-host of this episode was Aaron Newcomb. YaCy was represented by Michael Christen.
YaCy 1.6 has been out since yesterday. This release contains mostly enhancements, bug fixes, a special ‘greedy learning mode’ and, most importantly, a feature that many of you requested through anonymous messages to http://sayat.me/YaCy, a link that is shown on YaCy’s goodbye screen after a shutdown. Many people used this link to report that they would run YaCy permanently if it kept its CPU load and IO low. That is therefore the main feature of version 1.6: if YaCy is running in the background and used for searching, it will try to keep its IO and CPU load low.
Here is what we did in detail:
We examined the IO problem and found out that Solr needs regular optimization runs. Without them, the IO is very high during DHT transmission (the peer-to-peer sharing of search indexes). With an optimized Solr, this process runs much more efficiently.
We integrated a CPU load sensor that suppresses DHT transmission while the CPU load is too high (this affects both sending and receiving of indexes).
In the new ‘greedy learning’ mode, YaCy loads the documents linked from the first search results until a total of 1000 documents are in the local search index. This is usually reached after the first three searches at the latest, and from then on YaCy can benefit from these documents in its search.
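As a rough illustration of the load sensor described above, here is a minimal Python sketch (not YaCy's actual Java code). It assumes that the 1-minute system load average divided by the number of cores is a usable signal. The names `load_per_core`, `may_transfer` and the threshold `MAX_LOAD_PER_CORE` are made up for this example.

```python
import os

# Hypothetical threshold: skip DHT transfers above this load per core.
MAX_LOAD_PER_CORE = 0.7


def load_per_core(load_average=None, cores=None):
    """Return the 1-minute load average divided by the number of cores."""
    if load_average is None:
        load_average = os.getloadavg()[0]  # only available on Unix-like systems
    if cores is None:
        cores = os.cpu_count() or 1
    return load_average / cores


def may_transfer(load_average=None, cores=None, limit=MAX_LOAD_PER_CORE):
    """True if a DHT transfer (sending or receiving) may start right now."""
    return load_per_core(load_average, cores) <= limit
```

A transfer loop would call `may_transfer()` before each sending or receiving step and simply try again later if it returns `False`.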
LinuxTag 2013 has now been over for a few days. Once again it was a lot of fun to staff the booth and meet other developers. Many visitors already knew us from previous events and wanted to catch up on the current development of the project. It was also nice that some visitors were sent to our booth by friends because they found YaCy cool/important/interesting.
Unfortunately, YaCy did not win the Zedler Prize, but first, there is no shame in “losing” to Wheelmap.org, and second, it was a nice event anyway. After the participants had been briefed on who had to be where and when, and what to do, there was a sparkling wine reception before the actual ceremony, which offered a chance to talk with the other nominees and the Wikipedians. The event was of course very exciting for us and went by very quickly. Since “our” category was the last one in which the prize was awarded, the suspense lasted the whole time. There was no prize for YaCy, but at least each of us could take home a certificate confirming that YaCy was nominated for the Zedler Prize. Unfortunately, I had to skip the buffet after the event because I still had the journey home ahead of me.
The award ceremony takes place on May 25, 2013, starting at 7 p.m. in the Palais of the Kulturbrauerei in Berlin. Admission is free. Registration is requested, either by mail to email@example.com or via a Google form.
When you do a peer-to-peer search, the requesting peer must wait for the remote peers to submit their remote search results. To show the progress of remote searches, the search interface has a progress bar. It’s still there, but when we showed the new YaCy Release 1.4 at the Linuxtage Fair in Chemnitz last weekend, people said: “Is there a bug? The progress bar does not show.” No, it’s not a bug; in many cases the bar flashes by so fast that you cannot see it any more.
Furthermore, the search result quality has increased. This is the result of the advancing deep Solr integration not only in local search, but also in remote search. The integration is not yet fully finished, but it now shows a new quality of integration flexibility, speed and relevancy of search results.
Here are some more details about the main changes in YaCy 1.4:
This release mainly brings a deeper Solr integration: many more Solr fields are filled, Solr now has multi-core capabilities, and a second core with a webgraph was added (but deactivated for further testing).
The opensearch result writer of the integrated Solr now has all the features that the original opensearch result servlet of YaCy had, and the file search interface “yacyinteractive” now uses this new result writer instead of the old one. Searching through that interface is now much faster.
The default search process has undergone a full redesign, and a lot of testing was done to fix problems with the migration to Solr. The normal (local) search is now very fast, especially in portal mode and even in p2p mode.
The ranking was strongly enhanced: there is now support for flexible field boosts, boost functions and boost queries (see servlet /RankingSolr_p.html). All these ranking functions have been made editable, and there is a new configuration servlet for this. Furthermore, several ranking schemas are predefined: one for default internet search, one for sort-by-date and one for intranet search requests, which is triggered automatically if a site-operator is used. Intranet search ranking rates deep links higher than shallow ones, which returns more specific document types. Remote searches are done using the local ranking profile, not the remote profile.
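To give an idea of what field boosts, boost functions and boost queries look like on the Solr side, here is a small Python sketch that builds request parameters for Solr's edismax query parser (`qf`, `bf` and `bq` are standard Solr parameters). That YaCy's ranking settings map to exactly these parameters is an assumption, and the helper `ranking_params` and the field names `title`, `text` and `last_modified` are only illustrative.

```python
from urllib.parse import urlencode


def ranking_params(query, field_boosts, boost_query=None, boost_function=None):
    """Build a URL query string with edismax ranking parameters."""
    params = [("q", query), ("defType", "edismax")]
    # Field boosts: "title^5 text^1" makes a title match count five times as much.
    qf = " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())
    params.append(("qf", qf))
    if boost_query:
        params.append(("bq", boost_query))  # extra query whose matches are boosted
    if boost_function:
        params.append(("bf", boost_function))  # function added to the score
    return urlencode(params)


# Example: prefer title matches and favor recently modified documents.
example = ranking_params(
    "yacy",
    {"title": 5, "text": 1},
    boost_function="recip(ms(NOW,last_modified),3.16e-11,1,1)",
)
```

Separate profiles, such as the internet and intranet schemas mentioned above, would then simply be different sets of these values.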
The selection of target peers has been enhanced: all robinson peers that have a Solr interface are now searched using that interface rather than the old YaCy interface.
There should also be an improvement in indexing speed, as there are fewer requests to Solr for indexing, index updates are bundled together, and forced commits have been reduced using the new Solr 4.1 soft-commit feature.
There have also been fixes for some memory leaks, and the overall memory usage should be lower.
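The bundling of index updates mentioned above can be pictured with the following Python sketch. It is not how YaCy implements it internally: the class `UpdateBundler` and the callback `send_batch` are hypothetical. The only Solr-specific part is the `softCommit=true` update parameter, which asks Solr to make documents searchable without forcing a full commit to disk.

```python
class UpdateBundler:
    """Collect documents and hand them to Solr in batches instead of one by one."""

    def __init__(self, send_batch, batch_size=100):
        self.send_batch = send_batch  # callable(docs, params), e.g. an HTTP POST
        self.batch_size = batch_size
        self.pending = []

    def add(self, doc):
        """Queue one document; flush automatically when the batch is full."""
        self.pending.append(doc)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send all queued documents in a single request with a soft commit."""
        if self.pending:
            batch, self.pending = self.pending, []
            self.send_batch(batch, {"softCommit": "true"})
```

With a batch size of 100, indexing 1000 documents results in 10 update requests instead of 1000.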
Today we released YaCy 1.2. This is a major change in the architecture of YaCy, since search functions now primarily use Solr as the indexing engine instead of our peer-to-peer optimized distributed indexing algorithms. This step also means that YaCy gives up the attempt to create its own indexing technology in favor of the much more advanced Solr/Lucene library.
Embedded Solr 4.0.0 with standard Solr XML search interface integrated
Enhanced crawling with live link structure visualization
A Host Browser to explore the file structure of crawled hosts: this shows loaded pages, pages with errors and pending files in the same way as a file browser would show the content of a host.
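Because the search interface listed above speaks standard Solr XML, any small client can read the results. The following Python sketch builds a select URL and parses a Solr XML response into dictionaries. The base URL `http://localhost:8090/solr/select` is an assumption about a local peer (check the address and path of your own installation), and the helpers `select_url` and `parse_docs` are made up for this example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed address of a local peer's Solr select handler; adjust as needed.
BASE_URL = "http://localhost:8090/solr/select"


def select_url(base, query, rows=10):
    """Build a Solr select URL that asks for an XML response."""
    return f"{base}?{urlencode({'q': query, 'rows': rows, 'wt': 'xml'})}"


def parse_docs(xml_text):
    """Turn each <doc> of a Solr XML response into a dict of field name to value."""
    root = ET.fromstring(xml_text)
    docs = []
    for doc in root.iter("doc"):
        fields = {}
        for child in doc:
            name = child.get("name")
            if child.tag == "arr":  # multi-valued field
                fields[name] = [item.text for item in child]
            else:
                fields[name] = child.text
        docs.append(fields)
    return docs
```

Fetching the URL, for example with `urllib.request.urlopen`, and passing the decoded body to `parse_docs` yields one dictionary per search hit.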
We believe that the new features are also valuable for Web Administration and Search Engine Optimization (e.g. to find dead links). Please have a look at the screenshots in the YaCy ApacheCon 2012 Talk Slides.
YaCy is available as a Windows, Mac and Debian package and also as a tarball. To download YaCy, visit the YaCy Home Page.
Use Solr, but don’t home-brew your own code around it if you do web, file or intranet search: it’s all inside YaCy. And don’t buy a commercial appliance; this is free and better!