How to Fix Character Encoding Issues
Character encoding problems can turn perfectly good text into unreadable gibberish. Whether you're dealing with garbled Chinese characters in Excel, broken Japanese text on a website, or mysterious symbols in your database, this guide will help you fix it.
Understanding the Problem
Character encoding issues occur when text is saved in one encoding but read in another. It is like writing a letter in Spanish and having someone read it as if it were French: the words no longer make sense.
Common Symptoms:
- Chinese characters show as: ÄãºÃ or 锟斤拷
- Smart quotes appear as: â€™ or â€œ
- Japanese text shows as: æ—¥æœ¬èªž
- Question marks or boxes: ??? or □□□
- Random accented letters where they shouldn't be
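The symptoms above can be reproduced on purpose, which helps when working out which pair of encodings is involved. The following is a minimal sketch; the helper name `misread` is hypothetical and used only for illustration.

```python
def misread(text: str, saved_as: str, read_as: str) -> str:
    """Encode text with one codec, then decode the same bytes with another."""
    return text.encode(saved_as).decode(read_as)


# GBK bytes read as Latin-1
print(misread("你好", "gbk", "latin-1"))
# UTF-8 bytes read as Windows-1252
print(misread("’", "utf-8", "cp1252"))
print(misread("日本語", "utf-8", "cp1252"))
# Text already replaced by U+FFFD, saved as UTF-8, then read as GBK
print(misread("\ufffd\ufffd", "utf-8", "gbk"))
```

If the output of one of these calls matches what you see on screen, the two codec names tell you how the text was saved and how it was read.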
Fix #1: Text Files and Documents
1. Open in a Text Editor with Encoding Support
Use editors like Notepad++, VS Code, or Sublime Text that let you change encoding.
2. Try Different Encodings
In the editor menu, look for "Encoding" or "Character Set" and try:
- UTF-8 (try first; it is the most common)
- GBK or GB18030 (for Chinese)
- Big5 (for Traditional Chinese)
- Shift-JIS (for Japanese)
- EUC-KR (for Korean)
- Windows-1252 (Western European)
3. Save as UTF-8
Once you find the encoding that displays text correctly, immediately save the file as UTF-8 to prevent future issues.
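The same try-then-save workflow can be scripted when there are many files. This is a rough sketch, not a detector: the function names and the `CANDIDATES` list are assumptions made for illustration, and the codec names are Python's spellings of the encodings listed above.

```python
from pathlib import Path

# Candidate codecs in the order suggested above (Python codec names).
CANDIDATES = ["utf-8", "gb18030", "big5", "shift_jis", "euc_kr", "cp1252"]


def decode_candidates(raw: bytes, candidates=CANDIDATES) -> dict:
    """Return {codec: decoded text} for every codec that decodes raw without error."""
    results = {}
    for codec in candidates:
        try:
            results[codec] = raw.decode(codec)
        except UnicodeDecodeError:
            pass  # these bytes are not valid in this codec
    return results


def resave_as_utf8(src: str, dst: str, source_encoding: str) -> None:
    """Read src using source_encoding and write the same text to dst as UTF-8."""
    text = Path(src).read_text(encoding=source_encoding)
    Path(dst).write_text(text, encoding="utf-8")
```

A codec that decodes without error is not necessarily the right one. Single-byte encodings such as Windows-1252 accept almost any input, so a person still has to look at the decoded text and confirm it reads correctly before calling `resave_as_utf8`.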
Fix #2: Excel and CSV Files
Excel often garbles Chinese, Japanese, or other non-ASCII characters in CSV files, because a CSV carries no encoding declaration and Excel has to guess.
Opening a Garbled CSV in Excel:
1. Don't Double-Click the CSV
When a CSV is opened directly, Excel picks the encoding itself and often picks the wrong one.
2. Import via Data Tab
- Open Excel
- Go to Data → Get Data → From File → From Text/CSV
- Select your file
- In the import dialog, change File Origin to:
  - 65001: Unicode (UTF-8) for UTF-8 files
  - 936: Chinese Simplified (GB2312) for GBK files
  - 950: Chinese Traditional (Big5) for Big5 files
- Click Load
Saving Excel to UTF-8 CSV:
File → Save As → CSV UTF-8 (Comma delimited) (*.csv)
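If the CSV is produced by a script rather than by Excel, one common approach is to write it as UTF-8 with a byte-order mark, which Excel generally uses to recognize UTF-8 when the file is opened directly. The sketch below assumes you already know the source encoding; the function name `csv_to_utf8_bom` is hypothetical.

```python
def csv_to_utf8_bom(src: str, dst: str, source_encoding: str) -> None:
    """Re-encode a CSV file as UTF-8 with a leading byte-order mark."""
    # newline="" leaves the file's own line endings untouched.
    with open(src, "r", encoding=source_encoding, newline="") as fin, \
         open(dst, "w", encoding="utf-8-sig", newline="") as fout:
        for line in fin:
            fout.write(line)  # the utf-8-sig codec emits the BOM once, at the start
```

For example, `csv_to_utf8_bom("export_gbk.csv", "export_utf8.csv", "gbk")` would convert a GBK export; both file names here are placeholders.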
Fix #3: Database Encoding Issues
MySQL / MariaDB
Check current encoding:
SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';
Set database to UTF-8:
ALTER DATABASE your_database CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE your_table CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Set the connection charset in PHP:
mysqli_set_charset($conn, "utf8mb4");
// or in PDO:
$pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8mb4', $user, $pass);
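Always use utf8mb4 (not just utf8) in MySQL.
The utf8mb4 encoding supports full Unicode, including emoji, while MySQL's utf8 stores at most three bytes per character and cannot hold characters outside the Basic Multilingual Plane.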
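The difference comes down to byte length: MySQL's legacy utf8 stores at most three bytes per character, and characters above U+FFFF need four bytes in UTF-8. The following sketch shows how to check whether a string contains such characters; both helper names are hypothetical.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes the string occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def needs_utf8mb4(text: str) -> bool:
    """True if any character lies above U+FFFF and so needs four UTF-8 bytes."""
    return any(ord(ch) > 0xFFFF for ch in text)


print(utf8_length("好"), utf8_length("😀"))
print(needs_utf8mb4("hello 你好"), needs_utf8mb4("hi 😀"))
```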
PostgreSQL
-- Check encoding
SHOW SERVER_ENCODING;
-- Create database with UTF-8
CREATE DATABASE mydb ENCODING 'UTF8';
Fix #4: Website Encoding Issues
HTML Files
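Add this in the <head> section:
<meta charset="UTF-8">
Place it as early as possible inside <head>: browsers only scan the first 1024 bytes of the document for this declaration.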
HTTP Headers
Apache (.htaccess):
AddDefaultCharset UTF-8
Nginx:
charset utf-8;
PHP:
header('Content-Type: text/html; charset=utf-8');
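On the receiving side, a client should decode the response body using the charset declared in the Content-Type header. Here is a minimal sketch using Python's standard library; the function names are hypothetical, and the UTF-8 fallback for a missing charset is an assumption, not a rule from any standard.

```python
from email.message import Message


def charset_from_content_type(header_value: str, default: str = "utf-8") -> str:
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns the lowercased charset, or None if absent.
    return msg.get_content_charset() or default


def decode_body(body: bytes, content_type: str) -> str:
    """Decode a response body with the charset its Content-Type declares."""
    return body.decode(charset_from_content_type(content_type))
```

If the header and the actual bytes disagree, `decode_body` either raises `UnicodeDecodeError` or returns garbled text, which is the failure the server settings above are meant to prevent.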
Fix #5: Email Encoding Problems
Emails with garbled subjects or body text usually have encoding issues in MIME headers.
Fixing Email Subject Lines:
Email subjects should be encoded using RFC 2047 format:
=?UTF-8?B?5L2g5aW977yB?=
// Decodes to: 你好！
If you see raw encoded text like this in your email subject, your email client isn't decoding it properly. Try a different email client or use our decoder tool.
PHP Mail Example:
$subject = "=?UTF-8?B?" . base64_encode($subject) . "?=";
$headers = "Content-Type: text/html; charset=UTF-8\r\n";
mail($to, $subject, $message, $headers);
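To inspect or build RFC 2047 subjects without sending mail, Python's standard `email.header` module can decode and encode them. This is a sketch for experimentation; the two wrapper function names are hypothetical.

```python
from email.header import Header, decode_header, make_header


def decode_subject(raw_subject: str) -> str:
    """Turn an RFC 2047 encoded subject into readable text."""
    return str(make_header(decode_header(raw_subject)))


def encode_subject(subject: str) -> str:
    """Encode a subject containing non-ASCII text as an RFC 2047 encoded word."""
    return Header(subject, "utf-8").encode()


print(decode_subject("=?UTF-8?B?5L2g5aW977yB?="))
```

Passing the sample header from this section to `decode_subject` is a quick way to confirm that the header itself is valid and that the problem lies in the email client.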
Fix #6: Programming Language Specifics
Python
# Reading files
with open('file.txt', 'r', encoding='utf-8') as f:
    content = f.read()
# Writing files
with open('output.txt', 'w', encoding='utf-8') as f:
    f.write(content)
JavaScript (Node.js)
const fs = require('fs');
// Reading
const text = fs.readFileSync('file.txt', 'utf8');
// Writing
fs.writeFileSync('output.txt', text, 'utf8');
Java
// Requires: import java.io.*; import java.nio.charset.StandardCharsets;
// Reading
BufferedReader reader = new BufferedReader(
    new InputStreamReader(new FileInputStream("file.txt"), StandardCharsets.UTF_8)
);
// Writing
BufferedWriter writer = new BufferedWriter(
    new OutputStreamWriter(new FileOutputStream("output.txt"), StandardCharsets.UTF_8)
);
Prevention Tips
✅ Best Practices:
- Always use UTF-8 for new projects, files, and databases
- Declare encoding explicitly in HTML, HTTP headers, and database connections
- Test with international characters before going live
- Use modern tools that default to UTF-8
- Validate data entry to ensure proper encoding from the start
❌ Avoid:
- Using older versions of Notepad (Windows) for non-English text, since they default to the legacy ANSI code page when saving
- Assuming ASCII is enough: it covers only 128 characters, essentially unaccented English
- Copy-pasting between systems without checking encoding
- Using charset=ISO-8859-1 or other legacy encodings for new projects
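One way to act on the validation advice is to reject input that is not valid UTF-8 before it reaches storage. This is a minimal sketch; the function name `is_valid_utf8` is hypothetical.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")  # strict mode raises on any invalid byte sequence
    except UnicodeDecodeError:
        return False
    return True
```

A check like this catches bytes in a legacy encoding, but it cannot catch text that was already garbled and then saved as UTF-8, since that is still valid UTF-8.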
Quick Diagnosis Flowchart
Q: Do you see â€™ or â€œ?
→ UTF-8 text displayed as Windows-1252. Fix: Decode as UTF-8.
Q: Do you see Chinese as ÄãºÃ?
→ GBK text displayed as Latin-1. Fix: Decode as GBK.
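Q: Do you see ??? or □□□?
→ Literal question marks mean the original bytes were lost during save and cannot be recovered; prevention is the only fix. Empty boxes can also mean the font lacks the glyph, so try a different font before giving up.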
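The first two answers in the flowchart describe a reversible mistake: re-encode the garbled text with the encoding it was wrongly read as, then decode the bytes with the correct one. Here is a sketch of that round trip; the function name `repair` is hypothetical.

```python
def repair(garbled: str, wrongly_read_as: str, actually: str) -> str:
    """Undo a wrong decode: recover the original bytes, then decode them correctly."""
    return garbled.encode(wrongly_read_as).decode(actually)


print(repair("â€™", "cp1252", "utf-8"))   # UTF-8 text that was read as Windows-1252
print(repair("ÄãºÃ", "latin-1", "gbk"))   # GBK text that was read as Latin-1

# By contrast, once characters have been replaced with "?", nothing is left to repair.
lost = "你好".encode("ascii", errors="replace")
print(lost)
```

The round trip only works while every original byte survives in the garbled text. If the wrong decode dropped or replaced bytes, `repair` raises an error or returns partial results.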
Still Having Issues?
Use our free automatic fixer to repair garbled text instantly.