Post by imager » Tue Apr 06, 2021 2:28 am

Looking for advice on an issue that is plaguing my site.

It would appear that malicious visitors are hitting my site with invalid URLs, which seems to be resulting in high host CPU usage (pinned at 99 to 100%).

OpenCart 3.0.2.0 seems to be taking all page requests (including 404s) and processing them; in the case of a 404, it displays the "The page you requested cannot be found!" message.

I would like to offload (for now) the 404 processing to a static page (blank.html) in the root of the site.

I updated the .htaccess file with an ErrorDocument 404 /blank.html directive, but that does not appear to be doing anything - OpenCart is still handling these pages.

I suspect that the .htaccess file has syntax which is preventing the ErrorDocument line from being processed. Below is the .htaccess file.

Could anyone confirm whether it is possible to NOT have OpenCart 3.0.2.0 process 404 pages, and whether my implementation in the .htaccess is correct?

Thanks...

Code: Select all

<IfModule mod_rewrite.c>
ErrorDocument 503 "Your connection was refused"
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} MJ12bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} YandexBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} YandexImages [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} AhrefsBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} DotBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} BLEXBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SeznamBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SemrushBot [NC]
RewriteRule .* - [R=503,L]
</IfModule>




# Redirect to www
RewriteCond %{HTTP_HOST} ^[^.]+\.[^.]+$
RewriteCond %{HTTPS}s ^on(s)|
RewriteRule ^ http%1://www.%{HTTP_HOST}%{REQUEST_URI} [L,R=301]




<IfModule mod_deflate.c>
    AddOutputFilterByType DEFLATE text/html text/css text/javascript application/javascript
</IfModule>

# 1.To use URL Alias you need to be running apache with mod_rewrite enabled.

# 2. In your opencart directory rename htaccess.txt to .htaccess.

# For any support issues please visit: http://www.opencart.com

Options +FollowSymlinks

# Prevent Directory listing
Options -Indexes

# Prevent Direct Access to files
<FilesMatch "(?i)((\.tpl|.twig|\.ini|\.log|(?<!robots)\.txt))">
 Require all denied
## For apache 2.2 and older, replace "Require all denied" with these two lines :
# Order deny,allow
# Deny from all
</FilesMatch>

# SEO URL Settings
RewriteEngine On
# If your OpenCart installation does not run in the main web folder, make sure you use the folder it does run in, i.e. / becomes /shop/


ErrorDocument 404 /blank.html
RewriteBase /
RewriteRule ^sitemap.xml$ index.php?route=extension/feed/google_sitemap [L]
RewriteRule ^googlebase.xml$ index.php?route=extension/feed/google_base [L]
RewriteRule ^system/storage/(.*) index.php?route=error/not_found [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !.*\.(ico|gif|jpg|jpeg|png|js|css)
RewriteRule ^([^?]*) index.php?_route_=$1 [L,QSA]


# Leverage Browser Caching
<IfModule mod_expires.c>
 ExpiresActive On
 ExpiresByType image/jpg "access plus 1 week"
 ExpiresByType image/jpeg "access plus 1 week"
 ExpiresByType image/gif "access plus 1 week"
 ExpiresByType image/png "access plus 1 week"
 ExpiresByType text/css "access plus 1 week"
 ExpiresByType application/pdf "access plus 1 week"
 ExpiresByType text/x-javascript "access plus 1 week"
 ExpiresByType image/x-icon "access plus 1 week"
 ExpiresDefault "access plus 1 week"
</IfModule>

# Cache Clear Out: https://forum.opencart.com/viewtopic.php?t=206449
<IfModule mod_headers.c>
<filesMatch "\.(htm|html|css|js|php|tag)$">
Header set Cache-Control "no-cache, no-store, must-revalidate"
</filesMatch>
</IfModule>

### Additional Settings that may need to be enabled for some servers
### Uncomment the commands by removing the # sign in front of it.
### If you get an "Internal Server Error 500" after enabling any of the following settings, restore the # as this means your host doesn't allow that.

# 1. If your cart only allows you to add one item at a time, it is possible register_globals is on. This may work to disable it:
# php_flag register_globals off

# 2. If your cart has magic quotes enabled, This may work to disable it:
# php_flag magic_quotes_gpc Off

# 3. Set max upload file size. Most hosts will limit this and not allow it to be overridden but you can try
# php_value upload_max_filesize 999M

# 4. set max post size. uncomment this line if you have a lot of product options or are getting errors where forms are not saving all fields
# php_value post_max_size 999M

# 5. set max time script can take. uncomment this line if you have a lot of product options or are getting errors where forms are not saving all fields
# php_value max_execution_time 200

# 6. set max time for input to be received. Uncomment this line if you have a lot of product options or are getting errors where forms are not saving all fields
# php_value max_input_time 200

# 7. disable open_basedir limitations
# php_admin_value open_basedir none

# php -- BEGIN cPanel-generated handler, do not edit
# Set the “ea-php73” package as the default “PHP” programming language.
<IfModule mime_module>
  AddHandler application/x-httpd-ea-php73 .php .php7 .phtml
</IfModule>
# php -- END cPanel-generated handler, do not edit


Post by paulfeakins » Tue Apr 06, 2021 6:30 pm

imager wrote:
Tue Apr 06, 2021 2:28 am
I would like to offload (for now) the 404 processing to a static page (blank.html) in the root of the site.

That's a question for your host.

UK OpenCart Hosting | OpenCart Audits | OpenCart Support - please email info@antropy.co.uk



Post by imager » Tue Apr 06, 2021 7:03 pm

Paul,
The problem is the 404 handling in the .htaccess file never seems to be "seen". It would appear that no matter what, OpenCart is managing these.

Could it be the positioning of the ErrorDocument 404 /blank.html statement in the .htaccess that is preventing it from running? Or are you suggesting that the host has disabled ALL ErrorDocument processing, so that no matter what I do it would NEVER be processed?

John


Post by ADD Creative » Tue Apr 06, 2021 7:15 pm

As you can see from this section of the .htaccess, anything that is not a static file or directory is routed through index.php. This needs to be done to handle the SEO URLs.

Code: Select all

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !.*\.(ico|gif|jpg|jpeg|png|js|css)
RewriteRule ^([^?]*) index.php?_route_=$1 [L,QSA]
I suspect not having SEO URLs isn't an option. So you would have to work out which URLs should be a 404 and filter them before the RewriteRule.
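Something along these lines, placed just before that RewriteRule, would let Apache answer those requests itself (an untested sketch; the extension list and the /blank.html target are only assumptions based on the original post, so adjust them to what the shop actually serves):

Code: Select all

# Sketch: answer probe-style requests (extensions the shop does not
# actually serve) with Apache's own 404 before the SEO catch-all runs,
# so ErrorDocument 404 /blank.html is used instead of OpenCart.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule \.(sql|bak|zip|gz|tar|env|old|rar|7z)$ - [R=404,L]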

This section is awful. You are telling web browsers not to cache static CSS and JS files, which will increase server load. See my post in that thread for how to set the cache-control headers a different way.

Code: Select all

# Cache Clear Out: https://forum.opencart.com/viewtopic.php?t=206449
<IfModule mod_headers.c>
<filesMatch "\.(htm|html|css|js|php|tag)$">
Header set Cache-Control "no-cache, no-store, must-revalidate"
</filesMatch>
</IfModule>
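
The linked thread has the details; as a rough sketch of the general idea (values here are only examples, not the exact config from that post), you would give static assets a long cache lifetime and send no-cache headers only for the dynamic pages:

Code: Select all

<IfModule mod_expires.c>
    ExpiresActive On
    # Long browser cache for static assets only; no ExpiresDefault,
    # so HTML responses are not given a blanket one-week cache.
    ExpiresByType image/jpeg "access plus 1 month"
    ExpiresByType image/png "access plus 1 month"
    ExpiresByType image/gif "access plus 1 month"
    ExpiresByType image/x-icon "access plus 1 month"
    ExpiresByType text/css "access plus 1 month"
    ExpiresByType application/javascript "access plus 1 month"
</IfModule>

<IfModule mod_headers.c>
    # Only the dynamic pages (served via index.php) get no-cache;
    # leave css/js/images alone so browsers can cache them.
    <FilesMatch "\.php$">
        Header set Cache-Control "no-cache, no-store, must-revalidate"
    </FilesMatch>
</IfModule>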

www.add-creative.co.uk



Post by mona » Tue Apr 06, 2021 7:58 pm

Code: Select all

<IfModule mod_rewrite.c>
ErrorDocument 503 "Your connection was refused"
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} MJ12bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} YandexBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} YandexImages [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} AhrefsBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} DotBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} BLEXBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SeznamBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SemrushBot [NC]
RewriteRule .* - [R=503,L]
</IfModule>
These are not malicious visitors but legitimate public bots, which will not request URLs you do/did not advertise.
You can block them if you want, but 503 means "service unavailable", not "connection refused", and bots generally do not read the content of 4xx or 5xx documents; they know what the codes mean, so why set an ErrorDocument at all?
There are a lot more covert bots (not identifying themselves via the user agent) which do request non-existent URLs simply to probe what you have or might be running.
I guarantee you that approx. 70% of your traffic will come from those (and you thought you had that many visitors...).
Those are your main challenge, though not all of them are malicious (e.g. security companies, law enforcement, census, even search engines); the public bots above are peanuts, and you can also put them in your robots.txt.
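If you do decide to keep those public bots out, a robots.txt along these lines is usually enough, since they obey it (a sketch; list whichever user agents you actually want to exclude):

Code: Select all

User-agent: MJ12bot
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: SemrushBot
Disallow: /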
What invalid URLs are you talking about?

If you want to reduce the amount of unnecessary processing done by OpenCart's SEO URL handling, extend the exclusions of:

Code: Select all

RewriteCond %{REQUEST_URI} !.*\.(ico|gif|jpg|jpeg|png|js|css)
RewriteRule ^([^?]*) index.php?_route_=$1 [L,QSA]

to

Code: Select all

RewriteCond %{REQUEST_URI} !.*\.(env|ashx|cfg|dat|ico|cur|txt|mp3|webp|svg|ttf|eot|woff|woff2|gif|jpg|JPG|jpeg|JPEG|png|js|cfg|css|pdf|zip|env|tar|sql|gz|tar|exe|rar|arj|cab|iso|rpm|tbz|tgz|old|bak|backup|dump|db|7z|asp|aspx|exp|html|htm)$
RewriteRule ^([^?]*) index.php?_route_=$1 [PT,QSA]
In other words, add extensions which you do not have but for which probe bots will look.
Or, as we do it, covering all possible static asset extensions:

Code: Select all

	# any virtual url (not found in filesystem) which does not contain a dot (seo urls should NOT contain a dot)
	RewriteCond %{REQUEST_URI} !.*\..*$
	RewriteCond %{REQUEST_FILENAME} !-f
	RewriteCond %{REQUEST_FILENAME} !-d
	RewriteRule ^([^?]*) /co/index.php?_route_=$1 [PT,QSA]

Most probes will request assets like /your-domain.zip or /domain.sql or /website.bak, etc.
You will not publicly have those assets, so OC will assume they are SEO URLs and start that whole process, ultimately resulting in a fancy 404 page.
All that processing is for nothing; that is the price of using SEO URLs unless you limit the number of URLs that go through that process.

In short, a request for /website.bak or any other file with an extension which you do not have:

- In default OC, that would go to OC SEO URL processing: PHP, SQL, nice 404 page.

- With the changes above, the webserver itself would generate a basic 404 page: no PHP, no SQL, no nice 404 page.


Code: Select all

ExpiresDefault "access plus 1 week"
This is bad, as it defines a one-week browser cache for every content type you do not explicitly specify, which includes HTML in your case.

Code: Select all

# Cache Clear Out: https://forum.opencart.com/viewtopic.php?t=206449
<IfModule mod_headers.c>
<filesMatch "\.(htm|html|css|js|php|tag)$">
Header set Cache-Control "no-cache, no-store, must-revalidate"
</filesMatch>
</IfModule>
Then you do this, which is even worse, as it disables browser caching for static CSS and JavaScript assets.
And FilesMatch on extensions is not the same as content-type matching, so this also does not correct your HTML caching mistake above.

Other than that, it looks great.

DISCLAIMER:
You should not modify core files .. if you would like to donate a cup of coffee I will write it in a modification for you.


https://www.youtube.com/watch?v=zXIxDoCRc84



Post by ADD Creative » Tue Apr 06, 2021 10:38 pm

Have to agree with mona. Sending a 503 reply to those bots is only likely to cause you more issues; if anything, they will likely retry the page again later. Most of them will obey robots.txt, so it would be best to control them there. At the very least, return a 403 Forbidden.
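
For example (a sketch, untested), the existing block could keep the same user-agent conditions, drop the ErrorDocument line, and end with a 403 instead of the 503:

Code: Select all

# Same RewriteCond user-agent lines as before, then:
RewriteRule .* - [F,L]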

www.add-creative.co.uk

