Legalities and Ethics


Prior to downloading, parsing, or using the content of a website in any way, you should always consider the legality and ethics of what you are doing. From a legal perspective, there are at least three things to consider:

  1. anti-hacking and cybercrime laws;
  2. copyright laws; and
  3. terms of use.

Anti-hacking and cybercrime laws vary from country to country and state to state. Most of these laws regulate unauthorized access to information systems or the improper use of such systems. Merely parsing content from a site would not typically constitute a crime, but if you copied, stored, shared, or used data in ways that the owner did not permit, then these laws might come into play. If you have questions about this, please consult a legal expert in cybercrime law.

If you are making copies of content (e.g., extracting data from a site and storing it on your computer), then copyright laws come into play. Each country has its own copyright laws, but in the U.S., you will generally need to either obtain permission from the content owner or ensure that your use constitutes Fair Use under the law. If you have questions about this, please consult a legal expert in copyright law.

Finally, a Terms of Use statement, typically found on a site, describes what the content owner considers appropriate use of their site. Note that owners can claim anything they want in a Terms of Use statement (even things they are not legally entitled to claim) and that users are typically legally bound to the terms if they have been given adequate notice and have accepted them. Even when you are not bound, however, the Terms of Use statement lets you understand how the owner sees and may seek to enforce their rights, and you should generally follow the requirements it outlines. If you have questions about this, please consult a legal expert.

In addition to legal considerations, you should also consider the ethics of what you are doing and ensure that your behavior does not harm people or organizations.

Robots.txt Files

In addition to the above, most sites have a robots.txt file at their root level that communicates to automated tools which areas of the site should and should not be scraped, indexed, etc. You can find this file on just about any site by appending /robots.txt to the main domain name in your web browser. Such files help guide you in determining which areas of a site the developer intends to be accessed via automated methods and which areas they deem off-limits.
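As a minimal sketch of how you might honor these rules programmatically, Python's standard-library `urllib.robotparser` module can parse a robots.txt file and answer whether a given user agent may fetch a given path. The robots.txt content, the `MyBot` user-agent name, and the example.com URLs below are hypothetical placeholders, not real sites or rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only.
# On a real site you would call parser.set_url(...) and parser.read()
# to download the file from https://<domain>/robots.txt instead.
robots_txt = """\
User-agent: *
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# can_fetch() reports whether the named user agent may access the URL.
print(parser.can_fetch("MyBot", "https://example.com/private/data"))  # False
print(parser.can_fetch("MyBot", "https://example.com/public/page"))   # True
```

Checking `can_fetch()` before each request is a simple courtesy that keeps an automated tool within the boundaries the site owner has published.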

This content is provided to you freely by EdTech Books.
