Scipio always does a redirect... Bad for google


#1

This was driving me crazy, so I analyzed the headers. It appears that when requesting ANY URL, Scipio always does a redirect, including on your demo site. Example:

Stock OFBIZ:

wget -H https://demo-stable.ofbiz.apache.org/ecommerce/dropship1-dropShip1-p
--2017-11-10 19:40:42-- https://demo-stable.ofbiz.apache.org/ecommerce/dropship1-dropShip1-p
Resolving demo-stable.ofbiz.apache.org (demo-stable.ofbiz.apache.org)... 37.48.69.245
Connecting to demo-stable.ofbiz.apache.org (demo-stable.ofbiz.apache.org)|37.48.69.245|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘dropship1-dropShip1-p’

No redirects. Now Scipio:

I even tried “/shop/control/main”…

wget -H https://ce.scipioerp.com/shop/control/main
--2017-11-10 19:54:35-- https://ce.scipioerp.com/shop/control/main
Resolving ce.scipioerp.com (ce.scipioerp.com)... 78.47.214.219
Connecting to ce.scipioerp.com (ce.scipioerp.com)|78.47.214.219|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://ce.scipioerp.com/shop/control/main [following]
--2017-11-10 19:54:36-- https://ce.scipioerp.com/shop/control/main
Reusing existing connection to ce.scipioerp.com:443.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘main’

Now a “product”

wget -H https://ce.scipioerp.com/shop/spinach-FD-1016-p
--2017-11-10 19:44:54-- https://ce.scipioerp.com/shop/spinach-FD-1016-p
Resolving ce.scipioerp.com (ce.scipioerp.com)... 78.47.214.219
Connecting to ce.scipioerp.com (ce.scipioerp.com)|78.47.214.219|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://ce.scipioerp.com/shop/control/product [following]
--2017-11-10 19:44:55-- https://ce.scipioerp.com/shop/control/product
Reusing existing connection to ce.scipioerp.com:443.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘spinach-FD-1016-p.1’

NOTE: “HTTP request sent, awaiting response... 301 Moved Permanently”. In fact, for a product, it redirects to a non-existent URL:

https://ce.scipioerp.com/shop/control/product [BAD URL]

If you call this URL directly, it returns “Product not found for Product ID !”

Why is this bad? Try each of these URLs in Google's Structured Data Testing Tool:

https://demo-stable.ofbiz.apache.org/ecommerce/dropship1-dropShip1-p
https://ce.scipioerp.com/shop/spinach-FD-1016-p

https://search.google.com/structured-data/testing-tool

Guess which one works and which one fails because of the redirects. This will absolutely prevent your site from being indexed properly by Google.


#2

As long as cookies are enabled, that only happens on the very first secure HTTPS request. (wget doesn’t resend the cookies automatically)


#3

Edit: You’re right that the redirect to product is unusual. Will have to check


#4

I’m not able to answer more tonight, but I can tell you that the product redirect happens because of this code in RequestHandler.java (stock):

        // if this is a new session and forceHttpSession is true and the request is secure but does not
        // need to be then we need the session cookie to be created via an http response (rather than https)
        // so we'll redirect to an unsecure request
        } else if (forceHttpSession && request.isSecure() && session.isNew() && !requestMap.securityHttps) {
            StringBuilder urlBuf = new StringBuilder();
            urlBuf.append(request.getPathInfo());
            if (request.getQueryString() != null) {
                urlBuf.append("?").append(request.getQueryString());
            }
            // SCIPIO: Call proper method for this
            //String newUrl = RequestHandler.makeUrl(request, response, urlBuf.toString(), true, false, false);
            String newUrl = RequestHandler.makeUrlFull(request, response, urlBuf.toString());
            if (newUrl.toUpperCase().startsWith("HTTP")) {
                callRedirect(newUrl, response, request, statusCodeString);
                return;
            }
        }

There are two different issues involved here, but the comment there gives you the original reason why the redirect itself was coded to happen.

As far as I can see, this code was removed in later OFBiz. I don’t know why at the moment, so we will have to find the ticket for the justification and see whether anything else important accompanied that change.
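
To make the condition in that else-if easier to reason about, here is a minimal sketch that restates it as a pure function. This is not the actual RequestHandler code: the class, method, and parameter names are hypothetical, and only the boolean logic is taken from the excerpt above.

```java
public class ForceHttpSessionCheck {

    /**
     * Restates the condition from the quoted else-if branch.
     * All names here are illustrative; only the boolean logic comes from the excerpt.
     */
    public static boolean wouldRedirect(boolean forceHttpSession,
                                        boolean requestIsSecure,
                                        boolean sessionIsNew,
                                        boolean requestMapSecurityHttps) {
        return forceHttpSession && requestIsSecure && sessionIsNew && !requestMapSecurityHttps;
    }

    public static void main(String[] args) {
        // First HTTPS hit with no session cookie, request map not marked https: redirect.
        System.out.println(wouldRedirect(true, true, true, false));   // true
        // Same client returning with its session cookie: the session is no longer new.
        System.out.println(wouldRedirect(true, true, false, false));  // false
        // Plain HTTP request: the branch does not apply.
        System.out.println(wouldRedirect(true, false, true, false));  // false
        // forceHttpSession switched off: the branch never fires.
        System.out.println(wouldRedirect(false, true, true, false));  // false
    }
}
```

Read this way, the redirect depends on the session being new, which fits the earlier point that a client holding its cookie only sees it once.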


#5

Here is a clue: Without https, no redirect:

wget -H http://ce.scipioerp.com/shop/spinach-FD-1016-p
--2017-11-10 22:36:26-- http://ce.scipioerp.com/shop/spinach-FD-1016-p
Resolving ce.scipioerp.com (ce.scipioerp.com)... 78.47.214.219
Connecting to ce.scipioerp.com (ce.scipioerp.com)|78.47.214.219|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘spinach-FD-1016-p.5’

With scipio: https, redirect, even when it shouldn’t need to (i.e. /shop/control/main)

wget -H https://ce.scipioerp.com/shop/control/main
--2017-11-10 22:40:58-- https://ce.scipioerp.com/shop/control/main
Resolving ce.scipioerp.com (ce.scipioerp.com)... 78.47.214.219
Connecting to ce.scipioerp.com (ce.scipioerp.com)|78.47.214.219|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://ce.scipioerp.com/shop/control/main [following]
--2017-11-10 22:40:59-- https://ce.scipioerp.com/shop/control/main
Reusing existing connection to ce.scipioerp.com:443.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘main’

Thanks for looking into it.


#6

I commented out that section of code in framework/webapp/src/org/ofbiz/webapp/control/RequestHandler.java that was inserted by Scipio, to make it match stock OFBiz, and that seems to cure this redirect issue.

The redirect to “/shop/control/product” does not happen anymore. Looking at the Apache logs to see how Google is hitting the system confirms that Google now only hits once:

66.249.83.151 - - [11/Nov/2017:10:23:34 -0800] "GET /gallery/101-bunny-pets-virtual-pet-game-ko-10000-798936838384-p HTTP/1.1" 200 19525 "-" "Mozilla/5.0 (compatible; Google-Structured-Data-Testing-Tool +https://search.google.com/structured-data/testing-tool)"

Just once (good), returning 200.

Before, it would do this:

66.249.83.81 - - [06/Nov/2017:15:59:10 -0800] "GET /gallery/1001-minigolf-challenge-en-10000-798936836182-p HTTP/1.1" 301 3718 "-" "Mozilla/5.0 (compatible; Google-Structured-Data-Testing-Tool +https://search.google.com/structured-data/testing-tool)"
66.249.83.80 - - [06/Nov/2017:15:59:11 -0800] "GET /gallery/control/product HTTP/1.1" 301 3718 "-" "Mozilla/5.0 (compatible; Google-Structured-Data-Testing-Tool +https://search.google.com/structured-data/testing-tool)"
66.249.83.82 - - [06/Nov/2017:15:59:11 -0800] "GET /gallery/control/product HTTP/1.1" 301 3718 "-" "Mozilla/5.0 (compatible; Google-Structured-Data-Testing-Tool +https://search.google.com/structured-data/testing-tool)"
66.249.83.82 - - [06/Nov/2017:15:59:11 -0800] "GET /gallery/control/product HTTP/1.1" 301 3718 "-" "Mozilla/5.0 (compatible; Google-Structured-Data-Testing-Tool +https://search.google.com/structured-data/testing-tool)"
66.249.83.81 - - [06/Nov/2017:15:59:12 -0800] "GET /gallery/control/product HTTP/1.1" 301 3718 "-" "Mozilla/5.0 (compatible; Google-Structured-Data-Testing-Tool +https://search.google.com/structured-data/testing-tool)"
66.249.83.81 - - [06/Nov/2017:15:59:12 -0800] "GET /gallery/control/product HTTP/1.1" 301 3718 "-" "Mozilla/5.0 (compatible; Google-Structured-Data-Testing-Tool +https://search.google.com/structured-data/testing-tool)"

One request, then five more redirected tries on /gallery/control/product, then failure.
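
One possible reading of that log, assuming the testing tool does not send the session cookie back between hops (an assumption, not something confirmed here): every hop looks like a brand-new session, so every hop gets redirected again. Below is a toy Java model of that idea. No real HTTP is involved, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class RedirectLoopModel {

    /**
     * Returns the status codes a client would see for one URL, following at
     * most maxRedirects redirects. The "server" here is a toy: any request
     * without a session cookie counts as a new session and is answered with 301.
     */
    public static List<Integer> fetch(boolean clientKeepsCookies, int maxRedirects) {
        List<Integer> statuses = new ArrayList<>();
        boolean hasSessionCookie = false;
        for (int hop = 0; hop <= maxRedirects; hop++) {
            boolean sessionIsNew = !hasSessionCookie;
            if (sessionIsNew) {
                statuses.add(301);
                // The cookie set by the response only sticks if the client stores it.
                hasSessionCookie = clientKeepsCookies;
            } else {
                statuses.add(200);
                break;
            }
        }
        return statuses;
    }

    public static void main(String[] args) {
        System.out.println(fetch(true, 5));   // [301, 200]
        System.out.println(fetch(false, 5));  // [301, 301, 301, 301, 301, 301]
    }
}
```

With a redirect limit of five, the cookie-less client ends up with six 301 responses in a row, which is the same shape as the six log lines above. A client that keeps the cookie gets one 301 followed by a 200.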


#7

Thanks, Mike.

It took me a while to pinpoint where this changed (due to the OFBiz repo changes), but it was part of a bigger edit where they tried to force everything to HTTPS (OFBiz 16):
https://issues.apache.org/jira/browse/OFBIZ-6849
https://issues.apache.org/jira/browse/OFBIZ-6879

In general, Scipio already promotes full HTTPS, and in that case it will be safe to remove the legacy redirect. You could do this for your site while we sort it out for master. NOTE: the original code was there for a valid reason. If you don’t force HTTPS through your frontend/controller, the http/https switch could result in unexpected logouts or cart loss, and this is part of the considerations. There are possible implications that we will have to review.


#8

There seem to be two problems. The first is the unnecessary redirect, and tweaking RequestHandler.java fixed that.

The second is that WHEN a redirect is needed, it erroneously redirects to /shop/control/product. I have seen this behavior after incorporating the above fix. For instance, if a product is not found in the cache (even though it does exist, which is another issue), it will redirect to /shop/control/product and cause the same redirect error. So there are still issues lurking. In your demo shop, I suspect you have all caching turned on, so these inconsistencies do not turn up as often as they do in development mode (where caching expires after 1 minute).


#9

@mz4wheeler Just so you are aware, I forgot about this old setting: you do not need to comment out the redirect (my bad). Instead, set this setting to false in the shop web.xml (this will prevent git conflicts once things are changed):

<context-param>
    <description>
        Forces the JSESSIONID cookie to be sent via http rather https, helps prevent lost sessions in web apps that frequently switch between http and https.
    </description>
    <param-name>forceHttpSession</param-name>
    <param-value>false</param-value>
</context-param>

I forgot that this setting actually controls the redirect and not something else.

(the same caveats from the note above apply)

Most likely we will not change the setting to false in trunk for the time being, because of the caveats, but rather fix some of the code.

And indeed, as noted in the other thread, there is a two-part problem specific to the product URLs.


#10

Great. I think the browser was tolerant of the behind-the-scenes redirect that was going on, but Google was not. So I’m glad this was fixed. This is related to SEO stuff, which I do have more of for you (later). [grin]