Forum Webscript.Ru

Programming => PHP => Topic started by: alesh on 23 September 2002, 13:39:45

Title: utf->win
Posted by: alesh on 23 September 2002, 13:39:45: Guys, how do I convert text from UTF to win?
Title: utf->win
Posted by: AliMamed on 23 September 2002, 14:12:02: I do it like this (maybe not the most optimal way).
The UTF-8 text reaches me over HTTP, so naturally it is URL-encoded.
So first I run urldecode,
then I do this:

/**
 * takes a string of utf-8 encoded characters and converts it to a string of unicode entities
 * each unicode entity has the form &#nnnnn; n={0..9} and can be displayed by utf-8 supporting
 * browsers
 * @param $source string encoded using utf-8 [STRING]
 * @return string of unicode entities [STRING]
 * @access public
 */
function utf8ToUnicodeEntities ($source) {
    // array used to figure what number to decrement from character order value
    // according to number of characters used to map unicode to ascii by utf-8
    $decrement[4] = 240;
    $decrement[3] = 224;
    $decrement[2] = 192;
    $decrement[1] = 0;

    // the number of bits to shift each charNum by
    $shift[1][0] = 0;
    $shift[2][0] = 6;
    $shift[2][1] = 0;
    $shift[3][0] = 12;
    $shift[3][1] = 6;
    $shift[3][2] = 0;
    $shift[4][0] = 18;
    $shift[4][1] = 12;
    $shift[4][2] = 6;
    $shift[4][3] = 0;

    $pos = 0;
    $len = strlen ($source);
    $encodedString = '';
    while ($pos < $len) {
        $asciiPos = ord (substr ($source, $pos, 1));
        if (($asciiPos >= 240) && ($asciiPos <= 255)) {
            // 4 chars representing one unicode character
            $thisLetter = substr ($source, $pos, 4);
            $pos += 4;
        }
        else if (($asciiPos >= 224) && ($asciiPos <= 239)) {
            // 3 chars representing one unicode character
            $thisLetter = substr ($source, $pos, 3);
            $pos += 3;
        }
        else if (($asciiPos >= 192) && ($asciiPos <= 223)) {
            // 2 chars representing one unicode character
            $thisLetter = substr ($source, $pos, 2);
            $pos += 2;
        }
        else {
            // 1 char (lower ascii)
            $thisLetter = substr ($source, $pos, 1);
            $pos += 1;
        }

        // process the string representing the letter to a unicode entity
        $thisLen = strlen ($thisLetter);
        $thisPos = 0;
        $decimalCode = 0;
        while ($thisPos < $thisLen) {
            $thisCharOrd = ord (substr ($thisLetter, $thisPos, 1));
            if ($thisPos == 0) {
                // lead byte: strip the length marker bits
                $charNum = intval ($thisCharOrd - $decrement[$thisLen]);
                $decimalCode += ($charNum << $shift[$thisLen][$thisPos]);
            }
            else {
                // continuation byte: strip the 10xxxxxx marker
                $charNum = intval ($thisCharOrd - 128);
                $decimalCode += ($charNum << $shift[$thisLen][$thisPos]);
            }

            $thisPos++;
        }

        if ($thisLen == 1) {
            $encodedLetter = "&#". str_pad($decimalCode, 3, "0", STR_PAD_LEFT) . ';';
        }
        else {
            $encodedLetter = "&#". str_pad($decimalCode, 5, "0", STR_PAD_LEFT) . ';';
        }

        // a 6-character entity means a single-byte character: keep it as-is
        if (strlen($encodedLetter) == 6) {
            $encodedString .= $thisLetter;
        } else {
            $encodedString .= $encodedLetter;
        }
    }
    return $encodedString;
}
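The arithmetic inside the function above can be checked by hand on a single letter. The sketch below is only an illustration and not part of the posted code: it takes the URL-encoded form of the Cyrillic letter "П" (U+041F), runs urldecode on it, and repeats the subtract-and-shift steps for a two-byte sequence. The variable names are made up for this example.

```php
<?php
// "%D0%9F" is the URL-encoded UTF-8 form of the Cyrillic letter "П" (U+041F).
$bytes = urldecode('%D0%9F');

$lead = ord($bytes[0]); // 0xD0 = 208, lead byte of a 2-byte sequence
$cont = ord($bytes[1]); // 0x9F = 159, continuation byte

// Same steps as in the function: subtract 192 from the lead byte and shift
// it left by 6 bits, then subtract 128 from the continuation byte.
$codePoint = (($lead - 192) << 6) + ($cont - 128);

// Multi-byte characters are padded to five digits, as in the function.
$entity = '&#' . str_pad((string) $codePoint, 5, '0', STR_PAD_LEFT) . ';';

echo $codePoint, "\n"; // 1055
echo $entity, "\n";    // &#01055;
```

The result is an HTML entity, not a windows-1251 byte, so the output is still plain ASCII text that a browser renders as the right letter.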
Title: utf->win
Posted by: alesh on 23 September 2002, 19:58:41: hmm...
nope..

the script itself works, but it doesn't convert from utf8 to windows-1251
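For an actual byte-level conversion to windows-1251, one option is to lean on an encoding extension. The following is a minimal sketch, assuming a PHP 7 or later build with either mbstring or iconv enabled; the wrapper name utf8ToWin1251 is hypothetical and does not come from this thread.

```php
<?php
// Hypothetical wrapper; assumes the mbstring or iconv extension is loaded.
function utf8ToWin1251(string $utf8): string
{
    if (function_exists('mb_convert_encoding')) {
        // Arguments: string, target encoding, source encoding.
        return mb_convert_encoding($utf8, 'Windows-1251', 'UTF-8');
    }
    if (function_exists('iconv')) {
        // Arguments: source encoding, target encoding, string.
        $converted = iconv('UTF-8', 'CP1251', $utf8);
        if ($converted !== false) {
            return $converted;
        }
    }
    throw new RuntimeException('Neither mbstring nor iconv could convert the string');
}

// The text arrives URL-encoded over HTTP, so that layer is removed first.
$utf8 = urldecode('%D0%9F%D1%80%D0%B8%D0%B2%D0%B5%D1%82'); // "Привет" in UTF-8
$win  = utf8ToWin1251($utf8);

echo bin2hex($win), "\n"; // cff0e8e2e5f2
```

Each Cyrillic letter shrinks from two bytes to one, which is why the hex dump is six bytes long for a six-letter word.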
Title: Working function
Posted by: lesch on 02 January 2003, 13:45:54: The whole problem is that the built-in utf8_decode function converts not to win but to the Western encoding (ISO-8859-1), so Russian letters are out of luck. What was posted above seems more universal, but I haven't tested it myself.
I suggest a simple function for converting Russian letters from utf8 to win and back. Link:
http://leschwork.web.ur.ru/lib.zip

All instructions are in the file itself.
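The contents of the linked archive are not reproduced in this thread, so the following is not lesch's code. It is a minimal sketch of how such a pair of functions could look, assuming PHP 7 or later and covering only А-я, Ё, ё and ASCII. It relies on windows-1251 placing А-я at bytes 0xC0-0xFF in the same order as U+0410-U+044F. Both function names are hypothetical.

```php
<?php
// Hypothetical helpers: only А-я, Ё and ё are mapped, other bytes are copied as-is.
function utf8ToWin1251Russian(string $utf8): string
{
    $out = '';
    $len = strlen($utf8);
    for ($i = 0; $i < $len; $i++) {
        $b1 = ord($utf8[$i]);
        // Russian letters always start with 0xD0 or 0xD1 in UTF-8.
        if (($b1 === 0xD0 || $b1 === 0xD1) && $i + 1 < $len) {
            $b2 = ord($utf8[$i + 1]);
            if (($b2 & 0xC0) === 0x80) {
                $cp = (($b1 & 0x1F) << 6) | ($b2 & 0x3F); // decoded code point
                $byte = null;
                if ($cp >= 0x410 && $cp <= 0x44F) {
                    $byte = $cp - 0x350;      // А-я -> 0xC0-0xFF
                } elseif ($cp === 0x401) {
                    $byte = 0xA8;             // Ё
                } elseif ($cp === 0x451) {
                    $byte = 0xB8;             // ё
                }
                if ($byte !== null) {
                    $out .= chr($byte);
                    $i++;                     // skip the continuation byte
                    continue;
                }
            }
        }
        $out .= $utf8[$i];                    // unmapped byte, copied unchanged
    }
    return $out;
}

function win1251ToUtf8Russian(string $win): string
{
    $out = '';
    $len = strlen($win);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($win[$i]);
        if ($b >= 0xC0) {
            $cp = $b + 0x350;                 // 0xC0-0xFF -> U+0410-U+044F
        } elseif ($b === 0xA8) {
            $cp = 0x401;                      // Ё
        } elseif ($b === 0xB8) {
            $cp = 0x451;                      // ё
        } else {
            $out .= $win[$i];                 // ASCII and anything unmapped
            continue;
        }
        // Encode the code point as a two-byte UTF-8 sequence.
        $out .= chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    return $out;
}

$win = utf8ToWin1251Russian("\xD0\x81\xD0\xB6"); // "Ёж" in UTF-8
echo bin2hex($win), "\n";                        // a8e6
```

Characters outside this small set (typographic quotes, the № sign, other Cyrillic alphabets) pass through unconverted, so a full conversion table or an encoding extension is the safer choice for arbitrary text.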