eRe4s3r wrote on Feb 10, 2021, 22:28:Since the style sheet contains no non-ASCII characters, that doesn't have any effect, but it doesn't hurt either.
Did you declare @charset "utf-8"; in the primary CSS and *SAVE* that .css file encoded in UTF-8 (with notepad++ or some other file editor that allows specifying..) ?
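The point about a pure-ASCII style sheet can be checked with a short sketch. The CSS bytes below are a made-up stand-in, not the site's actual styles.css; the idea is only that bytes below 0x80 decode to the same text in UTF-8, Latin-1 and Windows-1252, so the declared charset cannot change how such a file is read.

```python
def is_pure_ascii(data: bytes) -> bool:
    """Return True if every byte is in the 7-bit ASCII range."""
    return all(b < 0x80 for b in data)

# Hypothetical style sheet content, used only for illustration.
css = b'@charset "utf-8";\nbody { font-family: sans-serif; }\n'

print(is_pure_ascii(css))

# For pure-ASCII input, all three decodings give identical text.
same_text = css.decode("utf-8") == css.decode("latin-1") == css.decode("cp1252")
print(same_text)
```

To check a real file, read it with `open(path, "rb").read()` and pass the bytes to `is_pure_ascii`.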
eRe4s3r wrote on Feb 10, 2021, 22:28:Hit Ctrl-U and look at line 12: <link href="/css/styles.css"...> Click it to view its contents.
I dunno which CSS is loaded for NewBlue,
fds wrote on Feb 10, 2021, 08:41:5.7, and yes I'm aware of the mb4 charset situation, but was hoping to avoid that. And the eacute is 2-bytes so I'd think the mb4 angle wouldn't explain the problem.
Which version of MySQL is it? Older versions of MySQL, and all versions of MariaDB, required the use of their specially named "utf8mb4" character set instead for true utf-8, one that also supports 4-byte characters such as all emojis.
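A rough sketch of why the mb4 question probably doesn't apply to the eacute: MySQL's legacy "utf8" character set (utf8mb3) stores at most 3 bytes per character, so only 4-byte characters need utf8mb4. The sample characters are picked just for illustration.

```python
# Illustrative only: count how many bytes UTF-8 needs per character.
samples = {
    "e-acute": "\u00e9",
    "right single quote": "\u2019",
    "grinning face emoji": "\U0001F600",
}

def utf8_len(ch: str) -> int:
    """Number of bytes the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))

for name, ch in samples.items():
    print(f"{name}: U+{ord(ch):04X} -> {utf8_len(ch)} byte(s)")
```

Only the emoji needs the 4-byte form, which matches the reasoning that the 2-byte eacute should fit in the older character set.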
fds wrote on Feb 10, 2021, 08:41:How did you determine the received byte(s), packet sniffing?
Currently for me, your #2 link consistently serves the title of the story as Cyberpunk 2077 Expos0xE9, which is invalid UTF-8 and, as such, consistently shows broken, both in the h2 posting title and in the top-level title tag, which then ends up as the tab's title.
fds wrote on Feb 10, 2021, 08:41:No, the only truly static page is the old HTML archive index (and 3 more that aren't relevant here). And as noted, the frontpage is a static .html file served via the switch script.
However, the issue is only on this /s/ "Share" link, which I gather was statically generated one time.
fds wrote on Feb 10, 2021, 08:41:For title element and story header yes, but most comment subjects show ?'s for me. Although those come out of a separate table/column, and were copied from the news story header when that was still an incorrect 1-byte Latin1 version of eacute, thus the ?'s are to be expected.
If I click over to the comment viewer board.pl version, it is always correct. It's served as a proper Cyberpunk 2077 Expos0xC3A9
…
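The difference between the two byte sequences can be reproduced with a small sketch. The byte strings are reconstructed from the hex values quoted above, not captured from the server, and the helper name is made up for this example.

```python
def try_utf8(raw: bytes) -> str:
    """Decode as strict UTF-8, or report why decoding failed."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"invalid UTF-8: {exc.reason}"

# Title ending in a lone 0xE9 (the Latin-1 encoding of e-acute).
latin1_title = b"Cyberpunk 2077 Expos\xe9"
# Title ending in 0xC3 0xA9 (the UTF-8 encoding of e-acute).
utf8_title = b"Cyberpunk 2077 Expos\xc3\xa9"

print(try_utf8(latin1_title))
print(try_utf8(utf8_title))

# A lenient decoder substitutes U+FFFD, the replacement character.
print(latin1_title.decode("utf-8", errors="replace"))
```

The lone 0xE9 byte is a UTF-8 lead byte with no continuation bytes after it, so a strict decoder rejects it and a lenient one shows the replacement mark.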
fds wrote on Feb 10, 2021, 08:41:No, that's not it then.
If I had to guess, not all statically generated pages got refreshed after you fixed whatever was borked in the Perl settings earlier, and some corrupted static pages remain.
Frans wrote on Feb 6, 2021, 05:29:Wishful thinking... the replacement mark returned for me too (some days ago but I didn't have time for this until now).
I was going to add one more weird observation about the replacement characters, but since last night I'm not getting them anymore during dozens of refreshes in multiple browsers. So perhaps the issue resolved itself just as mysteriously as it popped up...
Frans wrote on Feb 5, 2021, 14:50:Which version of MySQL is it? Older versions of MySQL, and all versions of MariaDB, required the use of their specially named "utf8mb4" character set instead for true utf-8, one that also supports 4-byte characters such as all emojis.
- it gets stored in MySQL, which, along with its tables and string columns, is set to UTF-8 (utf8_unicode_ci)
Frans wrote on Feb 5, 2021, 14:50:
Separately, we still have a problem with UTF-8 characters in single stories. In fact, the two Cyberpunk Exposé stories showing up in the popular threads box mid-January is how I first noticed it. And one of them is still erratic on the Share/Comments links:
Frans wrote on Feb 5, 2021, 14:50:I was going to add one more weird observation about the replacement characters, but since last night I'm not getting them anymore during dozens of refreshes in multiple browsers. So perhaps the issue resolved itself just as mysteriously as it popped up...
But I don't know the reason nor remedy for this yet; it's really weird that the browser only renders UTF-8 characters correctly and consistently if they're 3-byte ones (and then the 2-byte ones too), but not if a page includes only 2-byte characters, given that HTML and HTTPS instruct it to use that character set.
Yes, a workaround remedy may be to ensure that the page always includes a 3-byte character somewhere, but I'd still like to understand the underlying problem if possible.
Kxmode wrote on Feb 2, 2021, 17:25:Blue wrote on Feb 2, 2021, 17:19:The Half Elf wrote on Feb 2, 2021, 16:31:
Ok so having an issue that the website's text is garbled, so using a Chrome extension I have to set the site's encoding to UTF-8 (on the main page). Anyone else experiencing this or is it on my end?
This has popped up intermittently for a while now. Frans has put a lot of effort into tracking it down, but an answer continues to elude us. It's maddeningly inconsistent. As noted, it will randomly go away and/or return if you hit refresh, so it sort of defies logic what's going on.
I've seen this before. It looks like the quotes and hyphens in the database are Word formatted. The browser does its best to convert them into ASCII but may occasionally get stuck.
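For what it's worth, the garbling of Word-style quotes can be sketched as follows. This assumes the page bytes are UTF-8 and the browser guesses Windows-1252; it illustrates that one mechanism and is not a diagnosis of what the site is actually doing.

```python
# The Word-style right single quote that was reported to cause issues.
curly = "\u2019"

utf8_bytes = curly.encode("utf-8")
print(utf8_bytes.hex(" "))

# What a browser shows if it decodes those bytes as Windows-1252 instead.
misread = utf8_bytes.decode("cp1252")
print(misread)

# A plain ASCII apostrophe is the same single byte in both encodings.
plain = "'".encode("utf-8").decode("cp1252")
print(plain)
```

The three UTF-8 bytes of the curly quote come out as three separate Windows-1252 characters, while the ASCII apostrophe is unaffected.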
Frans wrote on Jan 31, 2021, 03:50:eRe4s3r wrote on Jan 30, 2021, 12:35:I'm not sure it's related to quoting characters, rather than non-UTF8 ones. But I was pretty sure I covered all the bases there when I last worked on this a few months ago.
Well at least I can pinpoint it to many particular symbols
’ <-- causes issues
´ ` ' " <--- issues too (originally "no issues")
And either way, the erratic nature of this little problem is frustrating.
eRe4s3r wrote on Jan 30, 2021, 12:35:Blue is seeing it in his primary browser Chrome too.
Testing more it seems it's Firefox doing something stupid on first load.
Frans wrote on Jan 29, 2021, 03:47:
Blue and I are seeing it occasionally, and I did a lot of investigation last week. But because the effect doesn't appear consistently -- it appears/disappears when refreshing the same page, and occurs on some pages of the same .pl script but not others -- I couldn't find the cause.