In the past, Zip archives had no way of indicating which character encoding was used for their filenames when they were created. Archivers resorted to system-specific encodings (such as Windows-1251 for Cyrillic languages). When archivers on other operating systems read these archives, the filenames would often turn out to be gibberish.
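As a small illustration of how such gibberish arises, the Python sketch below encodes a Cyrillic filename as Windows-1251 and then decodes the same bytes as CP437, the traditional default code page for Zip filenames. The filename is made up for the example, and the snippet has nothing to do with Entropy's own code.

```python
# Hypothetical filename, as a Windows-1251 archiver would store it.
original = "Привет.txt"
stored_bytes = original.encode("cp1251")

# A reader that wrongly assumes CP437 gets unrelated symbols for the same bytes.
garbled = stored_bytes.decode("cp437")
print(garbled)

# Decoding with the encoding that was actually used recovers the name.
recovered = stored_bytes.decode("cp1251")
print(recovered)
```

The bytes in the archive are identical in both cases; only the assumed encoding differs.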
To solve this problem, modern archivers usually use the Unicode UTF-8 encoding. They also set a flag in the archive indicating that UTF-8 has been used.
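In the Zip format this flag is bit 11 (0x800) of each entry's general purpose bit flag. The following is a minimal sketch, using Python's standard `zipfile` module, of how a tool might check it. The helper name `names_flagged_utf8` is made up for this example, and this is not how Entropy itself is implemented.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # bit 11 of the general purpose bit flag


def names_flagged_utf8(archive_bytes: bytes) -> dict[str, bool]:
    """Map each entry name to whether its UTF-8 flag is set (hypothetical helper)."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return {info.filename: bool(info.flag_bits & UTF8_FLAG)
                for info in zf.infolist()}


# Build a tiny archive in memory. Python's zipfile sets the flag only
# for names that cannot be stored as plain ASCII.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("readme.txt", "ASCII name")
    zf.writestr("привет.txt", "non-ASCII name")

flags = names_flagged_utf8(buf.getvalue())
print(flags)
```

An entry without the flag gives no indication of its encoding, which is the case the rest of this section deals with.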
When Entropy encounters an archive that doesn't indicate its encoding, it uses a character encoding detection algorithm. This section describes the nuances of this technique.
While we can make a good guess about which character encoding was used, we cannot be 100% certain. This is where the confidence threshold comes in. Entropy calculates a value between 0 and 1 that indicates how confident it is about its guess. The higher the value, the more likely it is that the correct encoding has been detected.
You can tell Entropy to ignore the automatically detected encoding and use a fallback encoding instead whenever the confidence is below a certain value. For most purposes, the default confidence threshold should be fine. You should only try to tweak it if you find that Entropy is displaying nonsensical filenames.
The fallback encoding is the character encoding that is used if the confidence level falls below the threshold. By default, we assume UTF-8. If you know the encoding used by your archives, you can select it from the drop-down list in Entropy's preferences.
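The interaction between the detected encoding, the confidence threshold, and the fallback encoding can be sketched in a few lines of Python. The function names and the threshold of 0.5 are assumptions made for this illustration; they are not Entropy's actual code or its default value. Only the UTF-8 fallback default comes from the description above.

```python
def choose_encoding(detected, confidence, threshold=0.5, fallback="utf-8"):
    """Return the detected encoding, or the fallback if confidence is too low.

    threshold=0.5 is an arbitrary illustrative value, not Entropy's default.
    """
    if detected is not None and confidence >= threshold:
        return detected
    return fallback


def decode_filename(raw, detected, confidence, threshold=0.5, fallback="utf-8"):
    """Decode raw filename bytes with whichever encoding choose_encoding picks."""
    encoding = choose_encoding(detected, confidence, threshold, fallback)
    # errors="replace" keeps the name displayable even if the choice was wrong.
    return raw.decode(encoding, errors="replace")


raw_name = "Привет.txt".encode("cp1251")

# High confidence: the detected encoding is trusted.
print(decode_filename(raw_name, "cp1251", 0.9))

# Low confidence: the guess is ignored and the fallback is used instead.
print(decode_filename(raw_name, "koi8-r", 0.2, fallback="cp1251"))
```

Raising the threshold makes the fallback apply more often, which helps when you already know the encoding of your archives. Lowering it gives the detector more say.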