Can the Datenbutler extract any data from the internet?

Technically, yes: the Datenbutler can extract any information that is reachable on the internet. However, the crawler has to comply with the law and will not extract data from pages where this is not permitted. In particular, it respects copyright and license-dependent rules.

How do I identify the Datenbutler crawler?

All mindUp crawler systems respect the settings in the robots.txt file in the domain's root directory. While reading web pages, they always identify themselves with a special user agent string. If you do not want them to read your web pages, simply add a blocking rule to your robots.txt.
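As a rough illustration of such a blocking rule, the sketch below checks a sample robots.txt with Python's standard urllib.robotparser module. It assumes that the robots.txt token is mindUpBot (taken from the start of the user agent string) and that the whole site should be blocked; the example.com URL, the OtherBot name and the is_blocked helper are hypothetical and only serve the example.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt content. The token "mindUpBot" is an assumption based on
# the beginning of the crawler's user agent string; "Disallow: /" blocks the
# whole site for that crawler only.
EXAMPLE_ROBOTS_TXT = """\
User-agent: mindUpBot
Disallow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt forbids user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/page.html"  # hypothetical URL
    print("mindUpBot blocked:", is_blocked(EXAMPLE_ROBOTS_TXT, "mindUpBot", page))
    print("OtherBot blocked:", is_blocked(EXAMPLE_ROBOTS_TXT, "OtherBot", page))
```

To block only part of a site, the Disallow line can name a path prefix such as /private/ instead of /. The same check can then be used to confirm which URLs are affected.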

User-Agent: mindUpBot (