Joe-L
Joined: 10 Jan 2009 Posts: 9
Posted: Tue Aug 17, 2010 12:20 pm
Post subject: UTF-8 Encoding of directory dump dir.xiph.org/yp.xml broken
Hi,
The UTF-8 encoding of the directory dump (dir.xiph.org/yp.xml) is broken.
It looks as if the output is UTF-8 encoded several times over: text that is already UTF-8 gets encoded again as if each byte were a character of its own.
Simply open the directory dump dir.xiph.org/yp.xml in your browser to see the effects of the over-encoding.
For instance, the German "für" becomes "fÃÂ¼r".
However, "für" is displayed correctly in the website version of the directory, so something is going wrong while yp.xml is being created.
Code:
Example of over-encoded ü
ü in UTF-8 = c3 bc
c3 bc in UTF-8 = c3 83 c2 bc
c3 83 c2 bc in UTF-8 = c3 83 c2 83 c3 82 c2 bc
ü found in yp.xml = c3 83 c3 82 c2 bc
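The byte sequences above can be reproduced with a few lines of Python. This is only a sketch of what I suspect is happening: it assumes each round misreads the UTF-8 bytes as ISO-8859-1 characters and then encodes them as UTF-8 again. The helper name `over_encode` is made up for this example; I do not know what the directory code actually does.

```python
def over_encode(data: bytes) -> bytes:
    # Assumption: the bytes are misread as ISO-8859-1 (one character per byte)
    # and the resulting characters are then encoded as UTF-8 once more.
    return data.decode("latin-1").encode("utf-8")

once = "ü".encode("utf-8")    # correct UTF-8
twice = over_encode(once)     # encoded a second time
thrice = over_encode(twice)   # encoded a third time

print(once.hex(" "))    # c3 bc
print(twice.hex(" "))   # c3 83 c2 bc
print(thrice.hex(" "))  # c3 83 c2 83 c3 82 c2 bc
```

The three printed lines match the first three byte sequences in the example, which is why I think the encoder runs three times rather than twice.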
It is not possible to undo the over-encoding by simply decoding UTF-8 multiple times, because some control characters (such as c2 83, i.e. U+0083, see above) are filtered out along the way.
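To illustrate why repeated decoding fails, here is a small Python sketch. It assumes the filter drops the C1 control characters U+0080 to U+009F (a guess that fits the bytes found in yp.xml) and that one round of over-encoding can be undone by decoding UTF-8 and writing the characters back as ISO-8859-1 bytes. The names `strip_c1`, `undo_once` and `recover` are made up for this example.

```python
from typing import Optional

def strip_c1(text: str) -> str:
    # Assumed filter: drop C1 control characters (U+0080..U+009F).
    return "".join(ch for ch in text if not 0x80 <= ord(ch) <= 0x9F)

def undo_once(data: bytes) -> bytes:
    # Reverse one round: decode UTF-8, map each character back to one byte.
    return data.decode("utf-8").encode("latin-1")

def recover(data: bytes, rounds: int) -> Optional[str]:
    # Undo `rounds` extra encodings; return None if the bytes stop being valid.
    try:
        for _ in range(rounds):
            data = undo_once(data)
        return data.decode("utf-8")
    except (UnicodeDecodeError, UnicodeEncodeError):
        return None

intact = bytes.fromhex("c3 83 c2 83 c3 82 c2 bc")          # triple-encoded ü
found = strip_c1(intact.decode("utf-8")).encode("utf-8")   # after the filter

print(found.hex(" "))      # c3 83 c3 82 c2 bc
print(recover(intact, 2))  # ü
print(recover(found, 2))   # None
```

With the unfiltered bytes two rounds of decoding give back ü, but with the bytes as found in yp.xml the first round yields c3 c2 bc, which is no longer valid UTF-8, so the second round fails.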
I have also opened a ticket for this issue on Trac, see Ticket#1729.
Regards
Joe-L