Drupal: Chinese Characters in URL
Recently, we ran into a problem with Drupal when a URL contained Chinese characters or any other non-English (UTF-8) characters. Every such URL returned a 400 Bad Request error along with a 403 Forbidden error.
e.g.
Bad Request
Your browser sent a request that this server could not understand.
Additionally, a 403 Forbidden error was encountered while trying to use an ErrorDocument to handle the request.
There was not much information on the web regarding this issue. We also contacted our hosting company to see whether they had seen this problem before and had a solution for us. They weren’t much help and only suggested that we get our own server. Unfortunately, the site is on a shared server, so we couldn’t modify any server settings to troubleshoot this further.
Playing around with Drupal a little more, we noticed that it has a ‘Clean URL’ option, which turns the default Drupal URL (with the path in a ?q= query string) into a cleaner version. It looks like Apache treats any URL with special characters in it as a file when Clean URL is on, and this causes the 403 Forbidden error. With Clean URL off, we were able to get all our links to work, even the ones with Chinese characters. Although the URLs aren’t as nice as with the Clean URL feature on, at least the site is working correctly and users are able to browse and search with non-English characters.
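To make the difference between the two URL forms concrete, here is a minimal PHP sketch of where the Chinese characters end up in each case. The path 新闻 and the host example.com are made up for illustration, and the snippet only shows standard percent-encoding with rawurlencode(); it is not Drupal’s own URL-building code and does not reproduce the server error.

```php
<?php
// Hypothetical path containing Chinese characters (assumption, not from our site).
// This file must be saved as UTF-8 for the encoded bytes below to match.
$path = '新闻';

// Percent-encode the UTF-8 bytes of the path.
$encoded = rawurlencode($path);

// Clean URL on: the encoded characters are part of the URL path that Apache sees.
$cleanUrl = 'http://example.com/' . $encoded;

// Clean URL off: the same characters travel in the q query parameter instead.
$plainUrl = 'http://example.com/?q=' . $encoded;

echo $cleanUrl, "\n"; // http://example.com/%E6%96%B0%E9%97%BB
echo $plainUrl, "\n"; // http://example.com/?q=%E6%96%B0%E9%97%BB
```

In the second form the request path is just /, so only the query string carries the non-ASCII bytes, which matches what we saw: those links kept working on our shared host.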
Do you know if using the recommended collation for the Drupal database tables, “utf8_general_ci”, will support Chinese characters for general user input/output? I noticed that the MySQL docs (http://imysql.cn/docs/MySQL_51_en/ch10s09.html) suggest using the “big5_chinese_ci” collation. Could this be related to your Clean URL issue?
D.
Yes, using ‘utf8_general_ci’ will support Chinese characters. One thing to watch out for is that you need to run the ‘SET NAMES’ command, set to utf8, on the connection before your SQL query statements. Otherwise, you won’t be able to use any UTF-8 characters in your queries.
e.g.
mysql_query("SET NAMES utf8");
No, we haven’t tested using ‘big5_chinese_ci’.