Jul 29

YaCy 1.6: we listen to your anonymous messages!

YaCy 1.6 was released yesterday. This release contains mostly enhancements and bug fixes, a special ‚greedy learning mode‘ and, most importantly, a feature that many of you requested in anonymous messages sent to http://sayat.me/YaCy, a link that is shown on YaCy’s goodbye screen after a shutdown. Many people used this link to report that they would run YaCy permanently if it kept its CPU load and IO low. That is therefore the main feature of version 1.6: if YaCy is running in the background and used for searching, it will try to keep its IO and CPU load low.

Here is what we did in detail:

  • We examined the IO problem and found that Solr needs regular optimization runs. Without them, IO is very high during DHT transmission (the peer-to-peer sharing of search indexes). With an optimized Solr, this process runs much more efficiently.
  • We integrated a CPU load sensor which prevents any DHT transmission while the CPU load is too high (this affects both sending and receiving of indexes).
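The load-sensor idea from the list above can be sketched roughly as follows. This is only an illustration, not YaCy's actual code: the function names and the threshold of 0.75 per core are assumptions, since the post does not say which value YaCy uses or how it measures load.

```python
import os

# Hypothetical threshold; the post does not state the value YaCy uses.
MAX_LOAD_PER_CORE = 0.75


def dht_transfer_allowed(load_average: float, cores: int,
                         max_load_per_core: float = MAX_LOAD_PER_CORE) -> bool:
    """Return True when the per-core load is low enough for a DHT transfer."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return (load_average / cores) <= max_load_per_core


def current_load_average() -> float:
    """Read the 1-minute load average; fall back to 0.0 where unsupported."""
    try:
        return os.getloadavg()[0]
    except (AttributeError, OSError):
        return 0.0  # e.g. on Windows, where os.getloadavg is unavailable


if __name__ == "__main__":
    load = current_load_average()
    cores = os.cpu_count() or 1
    if dht_transfer_allowed(load, cores):
        print("load is low: index chunks may be sent or received")
    else:
        print("load is high: skipping this DHT transmission cycle")
```

The same check would guard both directions, so a busy peer neither pushes index chunks nor accepts them until the load drops again.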

The new ‚greedy learning‘ mode makes YaCy load linked documents from the first search results until a total of 1000 documents are in the local search index. This is usually reached after the first three searches at the latest, and from then on YaCy can benefit from these documents in its own search.
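A minimal sketch of that behaviour, under stated assumptions: `greedy_learn`, `extract_links` and the set-based index are hypothetical names for illustration, and reading "linked documents" as the documents that the result pages link to is an interpretation of the post, not a confirmed detail of YaCy.

```python
from typing import Callable, Iterable, List, Set

INDEX_TARGET = 1000  # document count named in the post


def greedy_learn(result_urls: Iterable[str],
                 extract_links: Callable[[str], List[str]],
                 local_index: Set[str],
                 target: int = INDEX_TARGET) -> int:
    """Add documents linked from search results until the index holds `target` documents.

    Returns the number of documents added in this pass.
    """
    added = 0
    for url in result_urls:
        for link in extract_links(url):
            if len(local_index) >= target:
                return added  # index is full, greedy learning stops
            if link not in local_index:
                local_index.add(link)  # stands in for "load and index the document"
                added += 1
    return added
```

Once the index has reached the target size, further searches add nothing, which matches the description of the mode as something that only matters during the first few searches.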

We urgently need your help: please become a YaCy developer! If you have any ideas or suggestions for enhancing YaCy, then please clone our git repository, watch a video on how to start developing and send us a merge request!

Mar 18

Release 1.4: „The Search Progress Bar Has Disappeared“

We published the new main release, 1.4.

Speed

When you do a peer-to-peer search, the requesting peer must wait for the remote peers to submit their search results. To show the progress of remote searches, the search interface had a progress bar. It’s still there, but when we showed the new YaCy Release 1.4 at the Linuxtage Fair in Chemnitz last weekend, people said: „Is there a bug? The progress bar does not show.“ No, it’s not a bug; in many cases the bar flashes by so fast that you cannot see it any more.

Quality

Furthermore, the search result quality has increased. This is the result of the advancing deep Solr integration not only in local search, but also in remote search. The integration is not yet fully finished, but it now shows a new quality of integration flexibility, speed and relevancy of search results.

Here are some more details about the main changes in YaCy 1.4:

  • This release mainly brings a deeper Solr integration: many more Solr fields are filled, Solr now has multi-core capabilities and a second core with a webgraph was added (but deactivated for further testing).
  • The opensearch result writer of the integrated Solr now has all the features of YaCy’s original opensearch result servlet, and the file search interface „yacyinteractive“ now uses this new result writer instead of the old one. Searching through that interface is now much faster.
  • The default search process has undergone a full redesign, and a lot of testing was done to fix problems with the migration to Solr. The normal (local) search is now very fast, especially in portal mode and even in p2p mode.
  • The ranking was strongly enhanced: there is now support for flexible field boosts, boost functions and boost queries (see servlet /RankingSolr_p.html). All these ranking functions have been made editable and there is a new configuration servlet for this. Furthermore, several ranking schemas are predefined: one for default internet search, one for sort-by-date and one for intranet search requests, which is triggered automatically if a site-operator is used. Intranet search ranking rates deep links higher than shallow ones, which returns more specific document types. Remote searches are done using the local ranking profile, not the remote profile.
  • The selection of target peers has been enhanced: all robinson peers which have a Solr interface are now searched using that interface rather than the old YaCy interface.
  • Indexing speed should also improve, as fewer requests are sent to Solr for indexing, index updates are bundled together and forced commits have been reduced by using the new Solr 4.1 soft-commit feature.
  • Some memory leaks have also been fixed and the overall memory usage should be lower.
  • There is also a large number of small bug fixes.
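To illustrate the three ranking tools named in the list above, here is a sketch of a Solr request that combines a field boost (`qf`), a boost function (`bf`) and a boost query (`bq`) using Solr's edismax query parser. The parameter names are standard Solr; the endpoint URL, the field names (`title`, `text_t`, `last_modified`, `url_protocol_s`) and all boost values are assumptions chosen for the example and are not taken from YaCy's predefined ranking profiles.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_ranked_query(base_url: str, terms: str) -> str:
    """Build a Solr select URL with field boosts, a boost function and a boost query."""
    params = {
        "q": terms,
        "defType": "edismax",
        # Field boosts: a match in the title counts five times as much as in the body.
        "qf": "title^5.0 text_t^1.0",
        # Boost function: favour recently modified documents (hypothetical date field).
        "bf": "recip(ms(NOW,last_modified),3.16e-11,1,1)",
        # Boost query: push up documents matching an extra condition.
        "bq": "url_protocol_s:https^2.0",
        "rows": "10",
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Assumed local endpoint; adjust host, port and path to your installation.
    url = build_ranked_query("http://localhost:8090/solr/select", "yacy search")
    print(url)
    print(parse_qs(urlsplit(url).query)["qf"])
```

A sort-by-date or intranet profile would be a different set of values for the same three parameters, which is why making them editable in one configuration page is enough to switch ranking schemas.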

Nov 08

YaCy 1.2 Release with embedded Solr

Today we released YaCy 1.2. This is a major change in the architecture of YaCy, since the search functions now primarily use Solr as the indexing engine instead of our peer-to-peer optimized distributed indexing algorithms. This step also means that YaCy gives up the attempt to create its own indexing technology in favor of the much more advanced Solr/Lucene library.

While we will still do Peer-to-Peer search, we will now consider YaCy also as a Crawling and Search Framework for Solr. Today we presented „YaCy as Search Appliance“ at the ApacheCon 2012 with great success:

YaCy Tweets at ApacheCon 2012

Notable new features in YaCy 1.2 are:

  • Embedded Solr 4.0.0 with standard Solr XML search interface integrated
  • Enhanced crawling with live link structure visualization
  • A Host Browser to explore the file structure of crawled hosts: this shows loaded pages, pages with errors and pending files in the same way as a file browser would show the content of a host.

We believe that the new features are also valuable for Web Administration and Search Engine Optimization (e.g. to find dead links). Please have a look at the screenshots in the YaCy ApacheCon 2012 Talk Slides.

YaCy is available as a Windows, Mac and Debian package and also as a tarball. To download YaCy, visit the YaCy Home Page.

tl;dr

Use Solr, but don’t home-brew your own code around it if you do web, file or intranet search: it’s all inside YaCy. And don’t buy a commercial appliance, this is free and better!

Dec 07

YaCy Bugfix Release 1.01 With New Community Contributions

Today we release a first bugfix release after the 1.0 release last week: download YaCy 1.01. In the context of the main release event we asked for help and new developers: this was a great success! Our new Git repository was cloned 11 times and we got bug fixes and translations: a Greek and a Chinese translation are currently being produced. More help is very welcome; please see our tutorial movie about starting development with YaCy, and if you would like to make an interface translation, please see this manual.

In response to the great feedback on our release, we had to check whether our network was able to scale up to the 4000 peers it had grown to in the meantime. We found that the biggest problem was merely the display of the network graphic, which is now fixed. During the period of strongest growth of the search network, the distributed hash table failed because it takes time to distribute the index elements to the new peers. Existing peers became invisible because the new peers were placed ‚in front‘ of the old ones, which caused bad search results. Another frequently discussed problem was search time and ranking quality. The search time is connected to fraud detection, which we presented in a short speech on that topic, and we worked on that problem further in the 1.01 release. By public demand we also added indexing to the integrated bookmark system (see: German Article about YaCy and the integrated Bookmarking).

Please try the new release since it should also be a bit more stable: if you have not discovered the update function yet then please give it a try (it’s very easy to update YaCy!): click on Administration -> System Update (in the top right) -> Select the latest Release from the drop-down box -> Download -> Install Release.

This project can only live if people help. If you help! Please join in!

Sep 06

Geocaching Search Portal

In cooperation with moenk, the Liebel-Lab and the YaCy community (here setting up the servers), the YaCy-based Geocaching search portal has now been created. The search is already embedded as a search box on heavily frequented portals such as the geoclub.de forum and the geocaching-portal.

The search results show links from dedicated geocaching websites, which anyone can submit via a form at: http://www.geocaching-portal.com/add_link.php

As a special feature, the search page also displays a map of places that YaCy identifies from place names, telephone area codes, vehicle registration codes or postal codes in the search term. Geolocation is done with the help of OpenGeoDB, and the maps come from OpenStreetMap. Together with a special geocaching vocabulary, the geo information also extends the new word suggestion function.