General Discussion is the text corruption on the front page being investigated?

18 Replies. 1 pages. Viewing page 1.
18.
 
Re: is the text corruption on the front page being investigated?
Feb 11, 2021, 03:16
 
eRe4s3r wrote on Feb 10, 2021, 22:28:
Did you declare @charset "utf-8"; in the primary CSS and *SAVE* that .css file encoded in UTF-8 (with Notepad++ or some other file editor that allows specifying the encoding)?
Since the style sheet contains no non-ASCII characters, that doesn't have any effect, but it doesn't hurt either.
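
A rough illustration of why the declared charset makes no difference for an ASCII-only file: ASCII bytes decode to the same text under UTF-8, Latin-1 and Windows-1252. The CSS line below is a made-up stand-in, not the site's actual styles.css, and decodes_identically is a hypothetical helper.

```python
# Hypothetical stylesheet content; the real /css/styles.css is not reproduced here.
css_bytes = b"body { font-family: Verdana, sans-serif; color: #000; }\n"

def decodes_identically(data: bytes) -> bool:
    """True if the bytes yield the same text under three common encodings."""
    try:
        decoded = {data.decode(enc) for enc in ("utf-8", "latin-1", "cp1252")}
    except UnicodeDecodeError:
        # At least one encoding rejects the bytes outright.
        return False
    return len(decoded) == 1

print(css_bytes.isascii())             # pure ASCII?
print(decodes_identically(css_bytes))  # same text whichever charset is assumed
```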

eRe4s3r wrote on Feb 10, 2021, 22:28:
I dunno which CSS is loaded for NewBlue,
Hit Ctrl-U and look at line 12: <link href="/css/styles.css"...> Click it to view its contents.
-- Frans
17.
 
Re: is the text corruption on the front page being investigated?
Feb 10, 2021, 22:28
 
Random thought of the day (might be completely irrelevant, this is beyond my skillset)

Did you declare @charset "utf-8"; in the primary CSS and *SAVE* that .css file encoded in UTF-8 (with Notepad++ or some other file editor that allows specifying the encoding)?

I dunno which CSS is loaded for NewBlue, but maybe @charset "utf-8"; alone would be enough to fix it.... oh well, if not then disregard
16.
 
Re: is the text corruption on the front page being investigated?
Feb 10, 2021, 14:58
 
fds wrote on Feb 10, 2021, 08:41:
Which version of MySQL is it? Older versions of MySQL, and all versions of MariaDB, required the use of their specially named "utf8mb4" character set instead for true utf-8, one that also supports 4-byte characters such as all emojis.
5.7, and yes I'm aware of the mb4 charset situation, but was hoping to avoid that. And the eacute is 2-bytes so I'd think the mb4 angle wouldn't explain the problem.
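
To make that reasoning concrete, here is a small sketch of which text would actually need utf8mb4. The helper name needs_utf8mb4 and the sample strings are made up for this example; only characters above U+FFFF take 4 bytes in UTF-8.

```python
def needs_utf8mb4(text: str) -> bool:
    """True if any character takes 4 bytes in UTF-8, i.e. lies above U+FFFF."""
    return any(ord(ch) > 0xFFFF for ch in text)

headline = "Cyberpunk 2077 Expos\u00e9"   # e-acute, U+00E9, 2 bytes in UTF-8
emoji_post = "Nice \U0001F600"            # grinning face, U+1F600, 4 bytes in UTF-8

print(needs_utf8mb4(headline))    # False
print(needs_utf8mb4(emoji_post))  # True
```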

But the OS is due for a major LTS upgrade anyway, so I have some small hope that that might sort out the problem as yet, if it isn't a coding bug in our scripts.

fds wrote on Feb 10, 2021, 08:41:
Currently for me, your #2 link consistently serves the title of the story as Cyberpunk 2077 Expos0xE9, which is invalid utf-8, and as such, consistently shows broken. Both in the h2 posting title, and the top-level title tag which then ends up as the tab's title.
How did you determine the received byte(s), packet sniffing?

fds wrote on Feb 10, 2021, 08:41:
However, the issue is only on this /s/ "Share" link, which I gather was statically generated one time.
No, the only truly static page is the old HTML archive index (and 3 more that aren't relevant here). And as noted, the frontpage is a static .html file served via the switch script.

Share links have been shortened for SEO purposes, but are executed (via Apache rewrite) by the board.pl script, action viewstory. This shows replacement characters for me as well.

fds wrote on Feb 10, 2021, 08:41:
If I click over to the comment viewer board.pl version, it is always correct. It's served as a proper Cyberpunk 2077 Expos0xC3A9
For title element and story header yes, but most comment subjects show ?'s for me. Although those come out of a separate table/column, and were copied from the news story header when that was still an incorrect 1-byte Latin1 version of eacute, thus the ?'s are to be expected.

fds wrote on Feb 10, 2021, 08:41:
If I had to guess, not all statically generated pages got refreshed after you have fixed whatever was borked in the perl settings earlier, and some corrupted static pages remain.
No, that's not it then.

But thanks for helping to investigate.
-- Frans
15.
 
Re: is the text corruption on the front page being investigated?
Feb 10, 2021, 14:05
 
Frans wrote on Feb 6, 2021, 05:29:
I was going to add one more weird observation about the replacement characters, but since last night I'm not getting them anymore during dozens of refreshes in multiple browsers. So perhaps the issue resolved itself just as mysteriously as it popped up...
Wishful thinking... the replacement mark returned for me too (some days ago but I didn't have time for this until now).

The earlier observation is that in a browser's dev tools (F12) in the Network panel, one can view the response headers and body of the GET request for the URL with the problem. In that response body, the accented e is indeed shown as such in <title> tag and news post header, even when the browser's main window renders it as the replacement character in those spots. But the dev tools renderer may behave differently from the main window in this respect, so I'm not drawing conclusions, it merely contributes to the weirdness of the problem.
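
One way to sidestep any renderer, sketched here under the assumption that the raw response body has been saved as bytes by some other means, is to list the non-ASCII bytes directly. The function name non_ascii_runs and the two sample bodies are made up for illustration.

```python
def non_ascii_runs(raw: bytes) -> list:
    """Return (offset, hex string) for each run of consecutive non-ASCII bytes."""
    runs = []
    start = None
    for i, b in enumerate(raw):
        if b >= 0x80:
            if start is None:
                start = i          # a new run of high bytes begins here
        elif start is not None:
            runs.append((start, raw[start:i].hex(" ")))
            start = None
    if start is not None:          # run extends to the end of the data
        runs.append((start, raw[start:].hex(" ")))
    return runs

# Made-up sample bodies standing in for a saved response.
title = "<title>Cyberpunk 2077 Expos\u00e9</title>"
good_body = title.encode("utf-8")
bad_body = title.encode("latin-1")

print(non_ascii_runs(good_body))  # [(27, 'c3 a9')]
print(non_ascii_runs(bad_body))   # [(27, 'e9')]
```

A lone e9 in the output would point at Latin-1 bytes being sent, whereas c3 a9 would mean the bytes on the wire are fine and the problem lies in how they are interpreted.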
-- Frans
14.
 
Re: is the text corruption on the front page being investigated?
Feb 10, 2021, 08:41
fds
 
Frans wrote on Feb 5, 2021, 14:50:
  • it gets stored in MySQL, which along with its tables and string columns are set to UTF-8 (utf8_unicode_ci)
Which version of MySQL is it? Older versions of MySQL, and all versions of MariaDB, required the use of their specially named "utf8mb4" character set instead for true utf-8, one that also supports 4-byte characters such as all emojis.

Frans wrote on Feb 5, 2021, 14:50:
Separately, we still have a problem with UTF-8 characters in single stories. In fact, the two Cyberpunk Exposé stories showing up in the popular threads box mid-January is how I first noticed it. And one of them is still erratic on the Share/Comments links:

Currently for me, your #2 link consistently serves the title of the story as Cyberpunk 2077 Expos0xE9, which is invalid utf-8, and as such, consistently shows broken. Both in the h2 posting title, and the top-level title tag which then ends up as the tab's title.

However, the issue is only on this /s/ "Share" link, which I gather was statically generated one time.

If I click over to the comment viewer board.pl version, it is always correct. It's served as a proper Cyberpunk 2077 Expos0xC3A9
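
For anyone who wants to see the difference between the two byte sequences, here is a minimal sketch. The byte strings are reconstructed from the description above, not captured from the server, and try_utf8 is a hypothetical helper.

```python
latin1_title = b"Cyberpunk 2077 Expos\xe9"     # as described for the /s/ link
utf8_title = b"Cyberpunk 2077 Expos\xc3\xa9"   # as described for board.pl

def try_utf8(raw: bytes) -> str:
    """Decode strictly as UTF-8, or report why the bytes are invalid."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"invalid UTF-8: {exc.reason} at byte offset {exc.start}"

print(try_utf8(utf8_title))
print(try_utf8(latin1_title))

# A lenient decoder substitutes U+FFFD (the replacement character) for the bad byte.
print(latin1_title.decode("utf-8", errors="replace"))

# Read as Latin-1, the same single byte is a valid e-acute.
print(latin1_title.decode("latin-1"))
```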

If I had to guess, not all statically generated pages got refreshed after you have fixed whatever was borked in the perl settings earlier, and some corrupted static pages remain.

Thank you for all the work you do for Blue!
13.
 
Re: is the text corruption on the front page being investigated?
Feb 10, 2021, 07:27
 
Frans wrote on Feb 6, 2021, 05:29:
Frans wrote on Feb 5, 2021, 14:50:
But I don't know the reason nor remedy for this yet; it's really weird that the browser only renders UTF-8 characters correctly and consistently if they're 3-byte ones (and then the 2-byte ones too), but not if a page includes only 2-byte characters, given that HTML and HTTPS instruct it to use that character set.

Yes, a workaround remedy may be to ensure that the page always includes a 3-byte character somewhere, but I'd still like to understand the underlying problem if possible.
I was going to add one more weird observation about the replacement characters, but since last night I'm not getting them anymore during dozens of refreshes in multiple browsers. So perhaps the issue resolved itself just as mysteriously as it popped up...

Wackiness continues
Bluesnews bla la
Click on RedEye post -> Question mark in headline
Click on mine -> no Question mark in headline

If you switch between the posts it eventually breaks again too. Even if it sometimes remains fixed on RedEye's post, now THERE is a mystery.
12.
 
No subject
Feb 6, 2021, 08:27
 
Frans I'd give you a hug if it weren't for Covid19 social distancing guidelines and the thousands of miles between us.
Thank You

And a double thanks for explaining how the sausage was occasionally making � instead of links and patties.
- I refer to it as BC, Before Corona, and AD, After Disaster. -
11.
 
Re: is the text corruption on the front page being investigated?
Feb 6, 2021, 05:29
 
Frans wrote on Feb 5, 2021, 14:50:
But I don't know the reason nor remedy for this yet; it's really weird that the browser only renders UTF-8 characters correctly and consistently if they're 3-byte ones (and then the 2-byte ones too), but not if a page includes only 2-byte characters, given that HTML and HTTPS instruct it to use that character set.

Yes, a workaround remedy may be to ensure that the page always includes a 3-byte character somewhere, but I'd still like to understand the underlying problem if possible.
I was going to add one more weird observation about the replacement characters, but since last night I'm not getting them anymore during dozens of refreshes in multiple browsers. So perhaps the issue resolved itself just as mysteriously as it popped up...
-- Frans
10.
 
Re: is the text corruption on the front page being investigated?
Feb 5, 2021, 14:50
 
This isn't about "Word formatted" characters but about all UTF-8 (non-ASCII) ones. The system has to process them as UTF-8 every step in a long chain:

  • Blue writes a story in Frontpage, which is configured to UTF-8
  • copies it into the Blammo admin script, which runs with Perl settings for UTF-8 processing
  • it gets stored in MySQL, which along with its tables and string columns are set to UTF-8 (utf8_unicode_ci)
  • the front-end script (with the same settings*) retrieves two days of news and writes the static .shtml files with UTF-8 characters
  • the switch script serves the .shtml page with or w/o ads**
  • alternatively, the board script (with the same settings*) does the same with news stories dynamically, for Share and Comment pages; ditto for the articles, logos, and other scripts
  • all generated HTML5 starts with meta charset="utf-8"
  • Apache sends all HTML over HTTPS to the browser with Content-Type: text/html; charset=utf-8
  • the browser should render the received bytes as UTF-8 characters, but doesn't always

* Those I fixed early October, which I thought covered all bases, until the problem re-emerged three weeks ago.

** In an oversight, this script did not yet fully use Perl UTF-8 settings until this morning. So I believe the frontpage now correctly shows UTF-8 characters again. At least, in dozens of refreshes I haven't seen malformed "Word" quotes, ellipses, mdashes, etc. anymore. Please let me know if you catch them as yet.
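
The last links of that chain can be checked mechanically. The following is only a sketch, not the site's code: check_response is a hypothetical helper that compares the charset in the Content-Type header and in the meta tag with whether the body bytes are actually valid UTF-8. The sample page is made up.

```python
import re

def check_response(content_type: str, body: bytes) -> dict:
    """Compare the declared charsets with what the body bytes really are."""
    header = re.search(r"charset=([\w-]+)", content_type, re.I)
    # The HTML standard expects <meta charset> within the first 1024 bytes.
    meta = re.search(rb'<meta\s+charset=["\']?([\w-]+)', body[:1024], re.I)
    try:
        body.decode("utf-8")
        valid = True
    except UnicodeDecodeError:
        valid = False
    return {
        "header_charset": header.group(1).lower() if header else None,
        "meta_charset": meta.group(1).decode("ascii").lower() if meta else None,
        "body_is_valid_utf8": valid,
    }

# Made-up page standing in for a generated story page.
page = ('<!DOCTYPE html><html><head><meta charset="utf-8">'
        "<title>Expos\u00e9</title></head></html>")

print(check_response("text/html; charset=utf-8", page.encode("utf-8")))
print(check_response("text/html; charset=utf-8", page.encode("latin-1")))
```

If both declarations say utf-8 but body_is_valid_utf8 is False, some step earlier in the chain wrote non-UTF-8 bytes; if all three agree, the bytes are fine and the cause has to be looked for elsewhere.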


Separately, we still have a problem with UTF-8 characters in single stories. In fact, the two Cyberpunk Exposé stories showing up in the popular threads box mid-January is how I first noticed it. And one of them is still erratic on the Share/Comments links:


For Blue and me, on story 1 the headline and browser title bar always show the accented e (eacute): é.
But on #2, they erratically flip between the é and the question mark on a diamond background, i.e. the replacement character �.

What I noticed as a possibly relevant difference between the two stories is that é is a 2-byte character (C3 A9), and while in story 1 multiple 3-byte characters occur (e.g. mdash —, E2 80 94), story 2 has none. Adding an mdash character to story 2 too (in testing) causes it to no longer show the replacement character in dozens of refreshes.
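
The byte sequences mentioned here can be reproduced in a few lines. This is only a sketch: the two story strings are made-up stand-ins, and utf8_hex and widest_char are hypothetical helpers. It shows the difference between the two stories at the byte level, not why the rendering flips.

```python
def utf8_hex(ch: str) -> str:
    """UTF-8 bytes of a character as upper-case hex, e.g. 'C3 A9'."""
    return ch.encode("utf-8").hex(" ").upper()

def widest_char(text: str) -> int:
    """Largest number of UTF-8 bytes used by any single character in text."""
    return max(len(ch.encode("utf-8")) for ch in text)

E_ACUTE = "\u00e9"   # e with acute accent
EM_DASH = "\u2014"   # em dash

print(utf8_hex(E_ACUTE))   # C3 A9
print(utf8_hex(EM_DASH))   # E2 80 94

# Made-up stand-ins for the two stories.
story_1 = f"Cyberpunk 2077 Expos{E_ACUTE} {EM_DASH} more text"
story_2 = f"Cyberpunk 2077 Expos{E_ACUTE} follow-up"

print(widest_char(story_1))  # 3
print(widest_char(story_2))  # 2
```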

But I don't know the reason nor remedy for this yet; it's really weird that the browser only renders UTF-8 characters correctly and consistently if they're 3-byte ones (and then the 2-byte ones too), but not if a page includes only 2-byte characters, given that HTML and HTTPS instruct it to use that character set.

Yes, a workaround remedy may be to ensure that the page always includes a 3-byte character somewhere, but I'd still like to understand the underlying problem if possible.

Btw, I hardly spent any time on this until yesterday, because I initially hit a wall trying to figure out what happened when, and because I was immersed in another programming project (non-BN; in fact I haven't done any real BN development since the smilies modal in early November). But at least today there's some progress.
-- Frans
9.
 
No subject
Feb 3, 2021, 14:43
 
Additional post in ootb.
- I refer to it as BC, Before Corona, and AD, After Disaster. -
8.
 
No subject
Feb 2, 2021, 23:10
 
Frans
There was more discussion about this in today's out of the blue.
Kxmode wrote on Feb 2, 2021, 17:25:
Blue wrote on Feb 2, 2021, 17:19:
The Half Elf wrote on Feb 2, 2021, 16:31:
Ok so having an issue that the website's text is garbled, so using a Chrome extension I have to set the site's encoding to UTF-8 (on the main page). Anyone else experiencing this or is it on my end?

This has popped up intermittently for a while now. Frans has put a lot of effort into tracking it down, but an answer continues to elude us. It's maddeningly inconsistent. As noted, it will randomly go away and/or return if you hit refresh, so it sort of defies logic what's going on.

I've seen this before. It looks like the quotes and hyphens in the database are Word formatted. The browser does its best to convert them into ASCII but may occasionally get stuck.

I don't know anything about databases or formatting; is this "Word formatted" stuff something you've investigated?
Can it even be changed?

Troubleshooting intermittent things is what separates the cats from the dogs.

And muchos thanks for all the effort you've put in to something so minor.
You're the best!

- I refer to it as BC, Before Corona, and AD, After Disaster. -
7.
 
Morning Legal Briefs
Feb 1, 2021, 14:50
 
So the single quotes here look fine in the post URL here, but are corrupted on the FP.
COVID infections: 133M - - - COVID deaths: 3M - - - Death rate: 2%
Vaccines administered: 711M - - - Vaccine deaths: 7 - - - Death rate: 0.00000001%
Your choice is clear.
6.
 
Re: is the text corruption on the front page being investigated?
Feb 1, 2021, 03:11
 
eRe4s3r wrote on Jan 31, 2021, 18:48:
Why is the quote window having this issue though? [...] And then sometimes it just works fine... it seems completely random even.
If I knew, I could (probably) resolve it.
-- Frans
5.
 
Re: is the text corruption on the front page being investigated?
Jan 31, 2021, 18:48
 
Frans wrote on Jan 31, 2021, 03:50:
eRe4s3r wrote on Jan 30, 2021, 12:35:
Well at least I can pinpoint it to many particular symbols

’ <-- causes issues
´ ` ' " <--- no issues issues too
I'm not sure it's related to quoting characters, rather than non-UTF8 ones. But I was pretty sure I covered all the bases there when I last worked on this a few months ago.

And either way, the erratic nature of this little problem is frustrating.

eRe4s3r wrote on Jan 30, 2021, 12:35:
Testing more it seems it's Firefox doing something stupid on first load.
Blue is seeing it in his primary browser Chrome too.

Why is the quote window having this issue though? A good example of this behaving weirdly is when you use special cases like 50m² or 50m³: if you quote that, sometimes it completely borks it, even though it looks perfect for me until I quote myself... really strange. And then sometimes it just works fine... it seems completely random even.
4.
 
Re: is the text corruption on the front page being investigated?
Jan 31, 2021, 03:50
 
eRe4s3r wrote on Jan 30, 2021, 12:35:
Well at least I can pinpoint it to many particular symbols

’ <-- causes issues
´ ` ' " <--- no issues issues too
I'm not sure it's related to quoting characters, rather than non-UTF8 ones. But I was pretty sure I covered all the bases there when I last worked on this a few months ago.

And either way, the erratic nature of this little problem is frustrating.

eRe4s3r wrote on Jan 30, 2021, 12:35:
Testing more it seems it's Firefox doing something stupid on first load.
Blue is seeing it in his primary browser Chrome too.
-- Frans
3.
 
Re: is the text corruption on the front page being investigated?
Jan 30, 2021, 12:35
 
Frans wrote on Jan 29, 2021, 03:47:
Blue and I are seeing it occasionally, and I did a lot of investigation last week. But because the effect doesn't appear consistently -- it appears/disappears when refreshing the same page, and occurs on some pages of the same .pl script but not others -- I couldn't find the cause.

Well at least I can pinpoint it to many particular symbols

’ <-- causes issues
´ ` ' " <--- no issues issues too

But only on first load, absolutely no idea why browsers do that to be honest

Testing more it seems it's Firefox doing something stupid on first load.

This comment was edited on Jan 30, 2021, 13:12.
2.
 
Re: is the text corruption on the front page being investigated?
Jan 29, 2021, 03:47
 
Blue and I are seeing it occasionally, and I did a lot of investigation last week. But because the effect doesn't appear consistently -- it appears/disappears when refreshing the same page, and occurs on some pages of the same .pl script but not others -- I couldn't find the cause.
-- Frans
1.
 
is the text corruption on the front page being investigated?
Jan 28, 2021, 14:15
 
Keep seeing "’" sprinkled throughout the front page text.

edit: now that I see it is supposed to be an apostrophe, I will have to describe it, since I would need a screenshot to show it now. It seems that randomly, when an apostrophe is used, an accented a with 2 boxes appears instead. It can fix itself on a reload, so it just happens sometimes when the page is rendered. I am viewing in the latest Chrome for Windows.
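
One plausible reading of that symptom, assuming the page bytes were UTF-8 but got interpreted as a single-byte encoding, can be reproduced as follows. This is an illustrative sketch, not a diagnosis of the site.

```python
apostrophe = "\u2019"                # right single quotation mark, the "curly" apostrophe
raw = apostrophe.encode("utf-8")     # three bytes: E2 80 99

print(raw.hex(" "))

# Interpreted as Windows-1252, the three bytes become three visible characters.
print(raw.decode("cp1252"))

# Interpreted as Latin-1, the first byte is a-circumflex and the other two are
# invisible control characters, which many fonts draw as boxes.
print(repr(raw.decode("latin-1")))
```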