Post by CaptainHaddock » Wed Jul 14, 2021 5:28 am

Hi,
I've recently changed from OpenCart 1.5.5.1 to OC3.
The robots.txt file included in OC3 is different from the one I've used for many years.
It also doesn't include "User-agent: *" anywhere, which, according to the Bing robots.txt tester, seems to mean it is doing nothing.
I have a few questions:
1) Should I include User-agent: * at the beginning of the OC3 robots.txt?
2) If so, why wasn't it in the file?
3) Does anyone have an example of a "best practice" robots.txt file for OC3?
Thanks very much

Active Member

Posts

Joined
Tue Jul 02, 2013 7:01 am


Post by ADD Creative » Wed Jul 14, 2021 7:25 am

1. Yes, it should have User-agent: * at the beginning.
2. It's just another bug that seems to have made it into the latest release.
3. Depends on your site and what has already been indexed.
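
For anyone who wants to see the effect of the missing line, here is a minimal sketch using Python's standard-library urllib.robotparser. It is only one parser, so other crawlers may behave differently, but it follows the usual convention (also in RFC 9309) that Disallow rules belong to a group opened by a User-agent line. The helper name is_allowed and the two sample files are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, path: str, agent: str = "*") -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Hypothetical sample: a rule with no User-agent line in front of it.
WITHOUT_GROUP = "Disallow: /admin/\n"

# The same rule, this time inside a group that applies to every crawler.
WITH_GROUP = "User-agent: *\nDisallow: /admin/\n"

print(is_allowed(WITHOUT_GROUP, "/admin/"))
print(is_allowed(WITH_GROUP, "/admin/"))
```

With this parser the rule that has no User-agent line is skipped, so /admin/ stays fetchable. Once User-agent: * is added, the same rule takes effect.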

www.add-creative.co.uk


Expert Member

Posts

Joined
Sat Jan 14, 2012 1:02 am
Location - United Kingdom

Post by EvolveWebHosting » Wed Jul 14, 2021 9:11 am

This isn't 'official' but I think it's a good start for you:

User-agent: *
Disallow: /cgi-bin/
Disallow: /vqmod/
Disallow: /system/
Disallow: /admin/
Disallow: /*?page=$
Disallow: /*&page=$
Disallow: /*?sort=
Disallow: /*&sort=
Disallow: /*?order=
Disallow: /*&order=
Disallow: /*?limit=
Disallow: /*&limit=
Disallow: /*?filter_name=
Disallow: /*&filter_name=
Disallow: /*?filter_sub_category=
Disallow: /*&filter_sub_category=
Disallow: /*?filter_description=
Disallow: /*&filter_description=

If you aren't using vqmod, you can delete the Disallow: /vqmod/ line.
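
To check what the wildcard rules above actually match, here is a rough Python sketch of Google-style pattern matching, where * matches any run of characters and a trailing $ anchors the end of the URL (both are described in RFC 9309). The function name rule_matches and the sample URLs are assumptions for illustration, and real crawlers may differ in details such as percent-encoding.

```python
import re


def rule_matches(pattern: str, url_path: str) -> bool:
    """Return True if a robots.txt path pattern matches url_path (path plus query)."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and let each * stand for any run of characters.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # Rules are matched from the start of the path, so re.match is enough.
    return re.match(regex, url_path) is not None


# Hypothetical OpenCart-style URLs.
print(rule_matches("/*?sort=", "/index.php?sort=p.price"))
print(rule_matches("/*&limit=", "/index.php?route=product/category&path=20&limit=50"))
print(rule_matches("/*?page=$", "/index.php?page=2"))
print(rule_matches("/*?page=$", "/index.php?page="))
```

Under these semantics the two page rules only match a URL that ends exactly in ?page= or &page=, so a URL such as ?page=2 is not blocked by them. Dropping the $ would block every page value.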

Personally, I've had better luck sending noindex directives (X-Robots-Tag headers set in .htaccess files) to block search engines. It seems like a lot of them ignore the robots.txt directives.

Opencart Hosting Plans, Domain Registration, Microsoft and Google Email and More
Visit our website for great deals and most importantly, fast and friendly support - www.evolvewebhost.com


User avatar
Active Member

Posts

Joined
Fri Mar 27, 2015 11:13 pm
Location - Denver, Colorado, USA

Post by CaptainHaddock » Wed Jul 14, 2021 12:48 pm

Thanks for your helpful replies.
I also used to have these lines in my OpenCart 1.5 robots.txt. Is it worth including them in the OC3 robots.txt?
Disallow: /*?route=account/
Disallow: /*?route=affiliate/
Disallow: /*?route=checkout/
Disallow: /*?route=product/search


Post by by mona » Wed Jul 14, 2021 3:47 pm

robots.txt, if obeyed, tells search engines (SEs) which advertised URLs they should not request in the first place, which is good for reducing SE traffic. Robots headers and meta tags tell SEs what to do with a page after they have retrieved it.

So any links you serve with noindex and nofollow are better put into robots.txt, as they have no value when retrieved anyway.

Paths like /admin/ and /cgi-bin/ are useless in robots.txt, as you do not advertise those.


robots.txt:

Code: Select all

User-agent: *
Disallow: /*checkout/
Disallow: /*account/
Sitemap: https://YOUR-DOMAIN/index.php?route=extension/feed/google_sitemap
Sitemap: https://YOUR-DOMAIN/sitemap.xml
Host: YOUR-DOMAIN


For the robots headers and tags:

in catalog/controller/common/header.php

before:

Code: Select all

return $this->load->view('common/header', $data);

add:

Code: Select all

		// Robots header and meta tag: choose index/follow directives per request
		$data['robots'] = false;
		$route = isset($this->request->get['route']) ? $this->request->get['route'] : '';

		if (
			isset($this->request->get['search']) ||
			isset($this->request->get['order']) ||
			isset($this->request->get['sort']) ||
			isset($this->request->get['limit']) ||
			isset($this->request->get['page']) ||
			isset($this->request->get['start'])
		) {
			// Search, sort, limit and paging variants: not indexed, links followed
			header("X-Robots-Tag: noindex, follow", true);
			$data['robots'] = '<meta name="robots" content="noindex, follow">';
		} elseif (stripos($route, 'error') !== false) {
			// Error pages: not indexed, links followed
			header("X-Robots-Tag: noindex, follow", true);
			$data['robots'] = '<meta name="robots" content="noindex, follow">';
		} elseif (stripos($route, 'checkout/') !== false || stripos($route, 'account/') !== false) {
			// Checkout and account pages: not indexed, links not followed
			header("X-Robots-Tag: noindex, nofollow", true);
			$data['robots'] = '<meta name="robots" content="noindex, nofollow">';
		} elseif (stripos($route, 'FUTURE_PLACEHOLDER') !== false) {
			// Placeholder for routes to add later: indexed, links not followed
			header("X-Robots-Tag: index, nofollow, noarchive", true);
			$data['robots'] = '<meta name="robots" content="index, nofollow, noarchive">';
		} else {
			// Everything else: indexed and followed, no cached copy
			header("X-Robots-Tag: index, follow, noarchive", true);
			$data['robots'] = '<meta name="robots" content="index, follow, noarchive">';
		}


in catalog/view/theme/default/template/common/header.twig

after: <head>
add:

Code: Select all

{% if robots %}
	{{ robots }}
{% endif %}
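
To try the branching in the header.php snippet without touching a store, the same decisions can be sketched as a small Python function. This is only an approximation: it reads the raw query string, so it assumes SEO URLs are switched off, it leaves out the FUTURE_PLACEHOLDER branch, and the name robots_directive and the example.com URLs are made up for illustration.

```python
from urllib.parse import parse_qs, urlsplit

# Query parameters that mark search, sort and paging variants of a page.
NOINDEX_PARAMS = ("search", "order", "sort", "limit", "page", "start")


def robots_directive(url: str) -> str:
    """Return the robots directive the header.php snippet would pick for url."""
    # keep_blank_values mirrors PHP isset(), which is true for "?page=" too.
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    route = query.get("route", [""])[0].lower()

    if any(name in query for name in NOINDEX_PARAMS):
        return "noindex, follow"
    if "error" in route:
        return "noindex, follow"
    if "checkout/" in route or "account/" in route:
        return "noindex, nofollow"
    return "index, follow, noarchive"


print(robots_directive("https://example.com/index.php?route=product/category&path=20&page=2"))
print(robots_directive("https://example.com/index.php?route=checkout/cart"))
print(robots_directive("https://example.com/index.php?route=product/product&product_id=40"))
```

The order of the checks matters in the same way as in the PHP: a checkout URL that also carries a page or sort parameter ends up with noindex, follow rather than noindex, nofollow.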

DISCLAIMER:
You should not modify core files. If you would like to donate a cup of coffee, I will write it as a modification for you.


https://www.youtube.com/watch?v=zXIxDoCRc84


User avatar
Expert Member

Posts

Joined
Mon Jun 10, 2019 9:31 am

Post by fegdeed » Wed Jul 14, 2021 11:23 pm

@by mona Do you have anything related to Content Security Policy(CSP) and Subresource Integrity(SRI) ?

Get a secure, fast, and reliable web hosting service from https://turnuphosting.com.


Active Member

Posts

Joined
Fri Sep 21, 2018 12:01 am

Post by CaptainHaddock » Fri Jul 16, 2021 9:13 am

Mona, thanks for your helpful reply :)


Post by Seer.Domains » Thu Sep 21, 2023 12:41 am

Thanks, by mona!
Will this also work on OpenCart 3.0.3.8?

User avatar
Newbie

Posts

Joined
Thu Mar 18, 2021 4:17 am

Post by paulfeakins » Thu Sep 21, 2023 6:54 pm

ADD Creative wrote:
Wed Jul 14, 2021 7:25 am
2. It's just another bug that seems to have made it into the latest release.
Still no User Agent here:
https://github.com/opencart/opencart/bl ... robots.txt

UK OpenCart Hosting | OpenCart Audits | OpenCart Support - please email info@antropy.co.uk


User avatar
Guru Member
Online

Posts

Joined
Mon Aug 22, 2011 11:01 pm
Location - London Gatwick, United Kingdom

Post by ADD Creative » Thu Sep 21, 2023 8:04 pm

paulfeakins wrote:
Thu Sep 21, 2023 6:54 pm
Still no User Agent here:
https://github.com/opencart/opencart/bl ... robots.txt
It probably all wants removing anyway, as it will likely cause "Indexed, though blocked by robots.txt" issues. Blocking the page parameter could also cause issues with products that don't appear on the first page of a category.
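
A simplified way to picture the "Indexed, though blocked by robots.txt" problem is the small Python model below. It is a sketch of how a compliant search engine is generally described as behaving, not how any particular one is implemented, and the function name and return strings are invented for illustration.

```python
def search_outcome(blocked_by_robots_txt: bool, noindex: bool, linked_from_elsewhere: bool) -> str:
    """Simplified model of what a compliant search engine can do with one URL."""
    if blocked_by_robots_txt:
        # The page is never fetched, so a noindex header or meta tag on it is never seen.
        if linked_from_elsewhere:
            return "may be indexed without content (indexed, though blocked)"
        return "not crawled"
    if noindex:
        return "crawled, kept out of the index"
    return "crawled and indexed"


# A sort URL that is blocked in robots.txt but linked from category pages.
print(search_outcome(blocked_by_robots_txt=True, noindex=True, linked_from_elsewhere=True))

# The same URL with the robots.txt rule removed, so the noindex can be read.
print(search_outcome(blocked_by_robots_txt=False, noindex=True, linked_from_elsewhere=True))
```

In this model a noindex directive only helps on URLs the crawler is allowed to fetch, which is why blocking a URL in robots.txt and marking it noindex at the same time can work against each other.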


Post by paulfeakins » Fri Sep 22, 2023 6:53 pm

ADD Creative wrote:
Thu Sep 21, 2023 8:04 pm
It probably all wants removing anyway, as it will likely cause "Indexed, though blocked by robots.txt" issues. Blocking the page parameter could also cause issues with products that don't appear on the first page of a category.
Could it be worth raising it as an issue on GitHub?


Post by ADD Creative » Fri Sep 22, 2023 7:12 pm

paulfeakins wrote:
Fri Sep 22, 2023 6:53 pm
Could it be worth raising it as an issue on GitHub?
I did point out some potential problems when the robots.txt was first added. So I guess the developers disagreed.
https://github.com/opencart/opencart/is ... -662372780


Post by JNeuhoff » Fri Sep 22, 2023 7:33 pm

Also bear in mind that there are many bad web crawlers and bots that don't respect robots.txt. We usually keep bad bots away via suitable rules in the .htaccess file.

Export/Import Tool * SpamBot Buster * Unused Images Manager * Instant Option Price Calculator * Number Option * Google Tag Manager * Survey Plus * OpenTwig


User avatar
Guru Member
Online

Posts

Joined
Wed Dec 05, 2007 3:38 am


Post by paulfeakins » Mon Sep 25, 2023 7:27 pm

ADD Creative wrote:
Fri Sep 22, 2023 7:12 pm
I did point out some potential problems when the robots.txt was first added. So I guess the developers disagreed.
https://github.com/opencart/opencart/is ... -662372780
That's not the missing user agent problem though.


Post by ADD Creative » Mon Sep 25, 2023 7:59 pm

paulfeakins wrote:
Mon Sep 25, 2023 7:27 pm
That's not the missing user agent problem though.
Sorry, you replied to my post about the robots.txt file being wrong regardless of whether it had a user agent or not, so I thought you were talking about that.


Post by paulfeakins » Tue Sep 26, 2023 11:41 pm

ADD Creative wrote:
Mon Sep 25, 2023 7:59 pm
Sorry, you replied to my post about the robots.txt file being wrong regardless of whether it had a user agent or not, so I thought you were talking about that.
Sorry if I wasn't clear.
