Weborum Webmaster Forum > Retrieving unicode chars from MySQL
Help - Search - Members - Calendar
Full Version: Retrieving unicode chars from MySQL
Weborum Webmaster Forum > TUTORIAL ARCHIVE - tutorials & scripts to save you scouring the internet. Please feel free to add your own. > PHP Tutorials & scripts
pdc
Any help with what I suspect will prove to be a straighforward problem, would be much appreciated.

I am trying to display Japanese characters from a MySQL db. For experimentation I have boiled this down to a database with a single table containing two columns. One contains Japanese kanji and the other the equivalent unicode value as a character string. The default character set for the db is UTF-8. A query run with phpMyAdmin returns the correct data. However, I have been unable to achieve this with my own scripts (or to work out how phpMyAdmin is doing this). Instead the Japanese characters (which are all three bytes long) are returned as single byte question marks huh.gif .

An abbreviated version of my script may be seen below.

// Connecting, selecting database
$connect = mysql_connect('localhost', "user1", "testing")
or die('Could not connect: ' . mysql_error());

$db = 'kanjidb';
mysql_select_db($db) or die('Could not select database ('.$db.') because of : '.mysql_error());

// Performing SQL query
$query = 'SELECT kanji, unicode FROM kanjitbl';
$result = mysql_query($query) or die('Query failed: ' . mysql_error());

// Printing results in HTML
echo "<table border='1' width='300'>\n";
while ($line = mysql_fetch_array($result, MYSQL_ASSOC)) {

echo "\t<tr>\n";
foreach ($line as $col_value) {

echo "\t\t<td>$col_value</td>\n";
}

echo "\t</tr>\n";
}

echo "</table>\n";


The script also includes the following : <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> and the resulting pages encoding is confirmed as UTF-8.

I have amended the [mb_string] section of my php.ini file as follows (basically throwing everything at this problem)

mbstring.language = Japanese
mbstring.internal_encoding = UTF-8
mbstring.http_input = UTF-8
mbstring.http_output = UTF-8
mbstring.encoding_translation = on
mbstring.detect_order = UTF-8, ASCII, EUC-JP, JIS, SJIS
mbstring.substitute_character = long;
mbstring.func_overload = 7


When mbstring.detect_order is set to auto, it tells me that each item returned from the query is in ASCII format. Now it says that it's om UTF-8 format, but doesn't seem to treat it as such thumbsdownsmileyanim.gif .

Any help with spotting what I have missed, or even completely different strategies for retreiving and displaying multibyte unicode characters would be very much appreciated thumbsupsmileyanim.gif .
magicite
The funny thing about your situation is that I'm doing the exact same thing (Kanji + unicode code) and have the exact same problem, except I'm using perl.

My phpMyAdmin installation properly shows the kanji in all of its glory.

I think we can safely assume that it's some MySQL server variable that needs to be changed (what it is, I do not know). phpMyAdmin probably sets a connection variable every time, hence why it can display things properly.

Now it's time to figure out what it is! biggrin.gif

I'm still creeped out that we have the exact same problem . . . (and that I can't find an answer anywhere).
magicite
Yay! I figured out a solution. This is perl code, but I'm sure you can figure out what I'm doing.

CODE

   # let's go over to utf8 land
   my $sth = $dbh->prepare(qq{
    SET CHARACTER SET utf8
   });
   $sth->execute();
   $sth->finish();


Basically, it's referring to this in the MySQL manual:

CODE

CHARACTER SET {charset_name | DEFAULT}

This maps all strings from and to the client with the given mapping. Before MySQL 4.1, the only allowable value for charset_name is cp1251_koi8, but you can add new mappings by editing the sql/convert.cc file in the MySQL source distribution. As of MySQL 4.1.1, SET CHARACTER SET sets three session system variables: character_set_client and character_set_results are set to the given character set, and character_set_connection to the value of character_set_database.

The default mapping can be restored by using a value of DEFAULT.

Note that the syntax for SET CHARACTER SET differs from that for setting most other options.


These variables can be permanently set in the my.cnf file (MySQL's preset stuff). Feel free to message me if you have any more questions.

For those who use Perl and eventually find this message, once you get the data from the database (DBI, in this case), you need to decode it properly.

So let's say my variable $kanjiKunyomi contains UTF data. If I wanted to print it, I'd have to do this:
CODE

print Encode::decode_utf8($kanjiKunyomi);


Hope that helps!
This is a "lo-fi" version of our main content. To view the full version with more information, formatting and images, please click here.
Invision Power Board © 2001-2010 Invision Power Services, Inc.