I think the encoding error is generated on the pickle. I'm not sure what to do. NLTK 3. Worked like a charm! I use nltk 3. This is great! I had the same problem as you. I use Python 3.
SubElement doc, "field" field. ElementTree root tree. It would be useful to see the entire error message to see where it's coming from.
In the meantime try using decode instead of encode. Updated, it successfully created my XML when I use decode, but the file is not viewable in my browser. Also, if the file myText. Additionally, you should add an encoding to tree. Might have been a non-breaking space. Just saying.
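The advice about adding an encoding when writing the tree can be sketched roughly as follows. This is a minimal Python 3 example using the standard-library xml.etree.ElementTree module; the element names and the sample text are made up for illustration and are not taken from the original question.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document structure: <add><doc><field>...</field></doc></add>
root = ET.Element("add")
doc = ET.SubElement(root, "doc")
field = ET.SubElement(doc, "field")
field.text = "caf\u00e9"  # non-ASCII sample text (assumption)

tree = ET.ElementTree(root)

# encoding= controls the bytes that are written; xml_declaration=True asks
# for the <?xml ...?> header that names that encoding for browsers/parsers.
buffer = io.BytesIO()
tree.write(buffer, encoding="utf-8", xml_declaration=True)
xml_bytes = buffer.getvalue()
```

Writing to a real file works the same way if a filename is passed instead of the in-memory buffer.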
Update: If your input data is not UTF-8 encoded, then you have to. Hence, if the string data to be encoded by encode('utf8') contains a character that is outside of the ASCII range e. If you are using a Python version earlier than version 3. This way Python would be able to anticipate characters within a string that fall outside of the ASCII range.
However, if you are using python version 3. Alternatively, Python-Requests returns Unicodes in response. Python 2. Again, if you get UnicodeDecodeError then you've probably got the wrong encoding. Python tries to configure an encoder on stdout so that Unicodes are encoded to the console's encoding. On Windows, you will be limited to an 8-bit code page. An incorrectly configured console, such as a corrupt locale, can lead to unexpected print errors.
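The console-encoding point can be illustrated with a small sketch, written for Python 3. The helper name printable_for is hypothetical; it only shows what happens when text is forced into a narrower encoding such as an 8-bit code page.

```python
import sys

def printable_for(text, encoding):
    """Return text with characters the target encoding lacks replaced by '?'."""
    # Round-trip through the target encoding; errors="replace" substitutes '?'
    # for anything the encoding cannot represent instead of raising.
    return text.encode(encoding, errors="replace").decode(encoding)

# sys.stdout.encoding reports what Python picked for the console; it can be
# None when stdout is not a regular text stream, so fall back to UTF-8 here.
console_encoding = getattr(sys.stdout, "encoding", None) or "utf-8"
print(printable_for("snowman: \u2603", console_encoding))
```

This masks the symptom for display purposes only; a misconfigured locale is still worth fixing at the source.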
Just like input, io. Python 3 is no more Unicode capable than Python 2. The default encoding is UTF-8, so if you. Further, open operates in text mode by default, so it returns decoded str (Unicode ones). It's a nasty hack (there's a reason you have to use reload) that will only mask problems and hinder your migration to Python 3. Understand the problem, fix the root cause and enjoy Unicode zen. See Why should we NOT use sys.
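To make the text-mode versus binary-mode distinction concrete, here is a minimal Python 3 sketch. The file content is invented for the example, and the encoding is passed explicitly as a precaution rather than relying on whatever default the platform provides.

```python
import os
import tempfile

# Write some UTF-8 bytes to a temporary file (sample content is made up).
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write("na\u00efve caf\u00e9\n".encode("utf-8"))
    path = tmp.name

try:
    # Text mode: open() decodes the bytes and hands back str.
    with open(path, encoding="utf-8") as f:
        text = f.read()

    # Binary mode: the same file comes back as undecoded bytes.
    with open(path, "rb") as f:
        raw = f.read()
finally:
    os.remove(path)
```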
This is the classic "unicode issue". I believe that completely explaining what is happening is beyond the scope of a StackOverflow answer. It is well explained here. In very brief summary, you have passed something that is being interpreted as a string of bytes to something that needs to decode it into Unicode characters, but the default codec (ascii) is failing.
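The failure described above can be reproduced in a few lines. This sketch is written for Python 3, where the decode has to be requested explicitly (Python 2 would attempt it implicitly with the ascii codec); the sample bytes are an assumption.

```python
data = "caf\u00e9".encode("utf-8")  # bytes that are not pure ASCII

try:
    data.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc  # the ascii codec rejects byte 0xc3

decoded = data.decode("utf-8")  # naming the real encoding succeeds
```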
The presentation I pointed you to provides advice for avoiding this. Make your code a "unicode sandwich". OK - in your variable "source" you have some bytes. It is not clear from your question how they got in there - maybe you read them from a web form? In any case, they are not encoded with ascii, but python is trying to convert them to unicode assuming that they are.
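One way to picture the "unicode sandwich" is the sketch below, written for Python 3. The function name shout and its parameters are hypothetical; the point is only the shape: decode bytes as they come in, work on text in the middle, encode as they go out.

```python
def shout(raw, in_encoding="utf-8", out_encoding="utf-8"):
    """Decode at the edge, work on text, encode on the way out."""
    text = raw.decode(in_encoding)      # bottom slice: bytes -> str
    text = text.upper()                 # filling: pure text processing
    return text.encode(out_encoding)    # top slice: str -> bytes
```

For example, shout(b"caf\xe9", in_encoding="latin-1") takes Latin-1 input and returns UTF-8 output, with no byte strings touching the processing step.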
You need to explicitly tell it what the encoding is. This means that you need to know what the encoding is! That is not always easy, and it depends entirely on where this string came from. You could experiment with some common encodings - for example UTF. You tell unicode the encoding as a second parameter:
In some cases, when you check your default encoding print sys. If you change to UTF-8, it doesn't work, depending on the content of your variable.
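Experimenting with a few common encodings could look roughly like this in Python 3. The helper name and the candidate list are assumptions chosen for illustration, not something the original answer specified.

```python
def decode_with_candidates(data, candidates=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for encoding in candidates:
        try:
            # str(data, encoding) is the Python 3 counterpart of
            # Python 2's unicode(data, encoding).
            return str(data, encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings matched")
```

A clean decode does not prove the guess was right: latin-1 accepts every byte sequence, so it belongs at the end of the list as a last resort.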
I found another way:
Secondly, the above just changes the type but does not remove non-ASCII characters. If you want to remove non-ASCII characters:
In order to resolve this on an operating system level in an Ubuntu installation check the following:
I find it best to always convert to unicode, but this is difficult to achieve because in practice you'd have to check and convert every argument to every function and method you ever write that includes some form of string processing.
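For the earlier point about removing non-ASCII characters, a minimal Python 3 sketch might be the following. The function name is hypothetical, and this is one common idiom rather than the code the original answer contained.

```python
def strip_non_ascii(text):
    """Drop every character outside the ASCII range."""
    # errors="ignore" silently discards characters the ascii codec cannot encode.
    return text.encode("ascii", errors="ignore").decode("ascii")
```

Note that this loses information: "café" becomes "caf", so it only suits cases where the dropped characters really are unwanted.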
So I came up with the following approach to either guarantee unicodes or byte strings, from either input. In short, include and use the following lambdas:
Here's some more reasoning about this.
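The idea of guaranteeing one type regardless of input can be sketched as below. These are stand-ins written as named functions for Python 3, with hypothetical names and an assumed UTF-8 default; they are not the answer's original lambdas.

```python
def to_text(value, encoding="utf-8"):
    """Return value as str, decoding it first if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding, errors="replace")
    return value

def to_bytes(value, encoding="utf-8"):
    """Return value as bytes, encoding it first if it is str."""
    if isinstance(value, str):
        return value.encode(encoding, errors="replace")
    return value
```

Calling to_text on every incoming argument at a function boundary gives the "always convert" behaviour without checking types by hand at each call site.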
Got the same error and this solved it. So use Python pickle's encoding argument. The link below helped me solve a similar problem when I was trying to open pickled data from my python 3. Encode converts a unicode object into a string object. I think you are trying to encode a string object. In my case, worked for me, in Python 2.
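The pickle suggestion above refers to the encoding argument of pickle.load in Python 3. Here is a minimal sketch; the payload is a hand-assembled stand-in for a pickle written by Python 2, used only so the example is self-contained.

```python
import io
import pickle

# Hand-assembled protocol-2 payload standing in for a Python 2 pickle of the
# byte string 'caf\xe9' (an assumption for illustration, not real user data).
py2_pickle = b"\x80\x02U\x04caf\xe9q\x00."

try:
    pickle.load(io.BytesIO(py2_pickle))  # default encoding is ASCII
    default_error = None
except UnicodeDecodeError as exc:
    default_error = exc

# latin1 maps every byte to a character, so old 8-bit strings load as str.
as_text = pickle.load(io.BytesIO(py2_pickle), encoding="latin1")

# encoding="bytes" leaves old 8-bit strings undecoded.
as_bytes = pickle.load(io.BytesIO(py2_pickle), encoding="bytes")
```

Which value to pass depends on what the old strings actually held: latin1 is a common choice for data such as NumPy arrays and datetime objects, while "bytes" defers the decoding decision.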
Unicode in Python is black magic for me.