Using a robots.txt file on your website is a way to control how search engine crawlers access your content. Here are the detailed steps and precautions for using robots.txt on your website:
Use a text editor: Open a text editor (such as Notepad++, Sublime Text, or plain Notepad) and prepare to write the contents of your robots.txt file.
Write rules: Write rules according to the needs of the website. Typically, these rules specify which search engine crawlers (User-agent) are allowed (Allow) or forbidden (Disallow) to access which URL paths; a minimal example follows these steps.
Save the file: Save the file as robots.txt, make sure the file extension is .txt, and the file name is all lowercase. Also, make sure the file is encoded in UTF-8 to avoid garbled characters on different servers or browsers.
Upload to the website root directory: Use FTP software or the website backend management interface to upload the robots.txt file to the website root directory. For example, if your website domain name is www.example.com, then the robots.txt file should be located at http://www.example.com/robots.txt.
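For reference, a minimal robots.txt might look like the following sketch; the /private/ directory and the sitemap URL are illustrative placeholders, not required entries:

    User-agent: *
    Disallow: /private/

    Sitemap: http://www.example.com/sitemap.xml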
After the website goes live, the system automatically generates a robots.txt file, available at: domain name/robots.txt.
If you need to customize the robots.txt file, you can edit it in the CMS management backend under SEO management - robots file. If you want to revert to the system default after customizing, click the Initialize button on the right to restore the default content. After editing, save and publish.
If you do not want a page on your website to be indexed, for example https://www.abc.com/fuwutiaokuan.html, add the part of the URL after the domain name to the robots.txt file content, as shown below:
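A sketch of the corresponding rule, assuming the page should be hidden from all crawlers:

    User-agent: *
    Disallow: /fuwutiaokuan.html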
The content of the robots.txt file consists mainly of a series of user-agent declarations and directives (such as Disallow and Allow). The following are some basic rules and examples:
User-agent: Specifies which search engine crawler the rule applies to. For example, User-agent: * means the rule applies to all crawlers; User-agent: Googlebot means the rule applies only to Google's crawler.
Disallow: Specifies the URL path that you do not want to be accessed. For example, Disallow: /admin/ means that access to the /admin/ directory and its subdirectories and files under the website root directory is prohibited.
Allow (optional): The opposite of Disallow; it specifies URL paths that crawlers are allowed to access. Note that not all search engines support the Allow directive, and when used, it is usually combined with Disallow to provide finer-grained control.
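Putting these directives together, the example discussed below could look like this; the directory names and the sitemap URL are illustrative:

    User-agent: *
    Disallow: /admin/
    Disallow: /cgi-bin/

    User-agent: Googlebot
    Allow: /special-content/
    Disallow: /

    Sitemap: https://www.example.com/sitemap.xml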
In this example:
All crawlers are prohibited from accessing the /admin/ and /cgi-bin/ directories.
Googlebot is allowed to access the /special-content/ directory but is blocked from the rest of the site. (Note: for crawlers such as Googlebot that support the Allow directive, the most specific matching rule wins, so Allow: /special-content/ takes precedence over Disallow: / regardless of order; crawlers that do not support Allow may simply block the entire site.) This is only an example, and in practice you may need to adjust the rules to avoid such ambiguity.
The Sitemap directive provides the URL of the site map to help search engines better understand the site structure.
Make sure the file name and location are correct: The robots.txt file must be located in the root directory of your website and the file name must be all lowercase.
Be careful when writing rules: Incorrect rules may cause important pages to be excluded from crawling or dropped from search results, hurting the website's SEO.
Regular review and updates: As your website content is updated and changes, you may need to regularly review and update your robots.txt file to ensure it still meets the needs of your website.
Understand search engine support: Different search engines may have different levels of support for robots.txt files, so you need to take this into account when writing your rules.
Use tool detection: You can use various online tools to check whether the syntax and logic of your robots.txt file are correct, ensuring that search engines can properly interpret and apply the rules; a simple programmatic check is sketched below.
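For example, Python's standard library ships a robots.txt parser that can check whether a given URL may be crawled; the following is a minimal sketch, with the domain, path, and user agent as placeholders:

    # Check whether a URL may be fetched according to a site's robots.txt
    from urllib.robotparser import RobotFileParser

    rp = RobotFileParser()
    rp.set_url("https://www.example.com/robots.txt")  # placeholder domain
    rp.read()  # download and parse the robots.txt file

    # Prints True if the rules allow Googlebot to crawl this path, False otherwise
    print(rp.can_fetch("Googlebot", "https://www.example.com/special-content/page.html"))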
If you have any questions about building and operating foreign trade websites, please contact Yiyingbao technical customer service on WeChat: Ieyingbao18661939702, and the staff will be happy to help.