Encoding Tests
CSV test files in UTF-8 (no BOM), UTF-8 with BOM, Shift_JIS, and CP932, for debugging mojibake and testing CSV imports. The files include Japanese rows.
CSV test file in UTF-8 (no BOM)
utf8.csv / 659 B
CSV test file in UTF-8 with BOM
utf8-bom.csv / 662 B
CSV test file in Shift_JIS
sjis.csv / 514 B
CSV test file in CP932 (includes machine-dependent characters)
cp932.csv / 518 B
Why character encoding tests matter
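CSV files containing Japanese (or other non-ASCII) text often turn into mojibake (garbled characters) when a file is written in one encoding and read in another. Excel, for example, typically recognizes a CSV as UTF-8 only when the file starts with a BOM, and support for each encoding varies widely between tools.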
Use these test files to verify that your CSV importers and text-processing libraries handle each encoding correctly.
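One way to run such a check is to decode the same bytes with each candidate encoding and compare the parsed rows with what you expect. The Python sketch below does this on in-memory data rather than on the downloadable files; the sample rows and the helper names `to_csv_bytes` and `parse_csv_bytes` are illustrative assumptions, not part of the test files.

```python
import csv
import io

# Hypothetical sample rows; the real test files may contain different data.
SAMPLE_ROWS = [["id", "name"], ["1", "山田太郎"], ["2", "東京都"]]


def to_csv_bytes(rows, encoding):
    """Serialize rows to CSV text and encode it with the given encoding."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    return buf.getvalue().encode(encoding)


def parse_csv_bytes(raw, encoding):
    """Decode raw bytes with the given encoding and parse them as CSV."""
    text = raw.decode(encoding)
    return list(csv.reader(io.StringIO(text, newline="")))


# Round trip: decoding with the encoding used for writing restores the rows.
for enc in ("utf-8", "utf-8-sig", "shift_jis", "cp932"):
    assert parse_csv_bytes(to_csv_bytes(SAMPLE_ROWS, enc), enc) == SAMPLE_ROWS

# Mismatch: UTF-8 bytes read as CP932 come out as mojibake.
utf8_bytes = to_csv_bytes(SAMPLE_ROWS, "utf-8")
garbled = utf8_bytes.decode("cp932", errors="replace")
print(garbled)
```

To test a real importer, the same comparison could be applied to the bytes of utf8.csv, utf8-bom.csv, sjis.csv, and cp932.csv, using the rows you expect from each file.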
Key characteristics of common encodings
- UTF-8: The most common; default in most programming languages.
- UTF-8 BOM: Recommended for opening Japanese CSV in Excel. Prepends three bytes (EF BB BF).
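- Shift_JIS: Widely used on Windows; some characters (like 〜 and −) cause issues because converters map them to different Unicode code points (for example, U+301C versus U+FF5E for the wave dash).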
- CP932: Extended Shift_JIS that supports machine-dependent characters like 髙, 﨑.
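The characteristics listed above can be observed directly with Python's built-in codecs. The following is a minimal sketch, assuming CPython's standard `utf-8`, `utf-8-sig`, `shift_jis`, and `cp932` codecs; the helper names `has_utf8_bom` and `can_encode` are hypothetical.

```python
import codecs


def has_utf8_bom(raw):
    """Return True if the data starts with the UTF-8 BOM (EF BB BF)."""
    return raw.startswith(codecs.BOM_UTF8)


def can_encode(text, encoding):
    """Return True if every character in text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


with_bom = "名前".encode("utf-8-sig")
without_bom = "名前".encode("utf-8")

# The BOM adds exactly three bytes to the file.
bom_overhead = len(with_bom) - len(without_bom)

# Plain utf-8 keeps U+FEFF at the start of the text; utf-8-sig strips it.
kept = with_bom.decode("utf-8")
stripped = with_bom.decode("utf-8-sig")

# The same byte pair decodes to different code points:
# shift_jis yields WAVE DASH (U+301C), cp932 yields FULLWIDTH TILDE (U+FF5E).
wave_sjis = b"\x81\x60".decode("shift_jis")
wave_cp932 = b"\x81\x60".decode("cp932")

print(has_utf8_bom(with_bom), has_utf8_bom(without_bom), bom_overhead)
print(hex(ord(wave_sjis)), hex(ord(wave_cp932)))
print(can_encode("髙﨑", "cp932"), can_encode("髙﨑", "shift_jis"))
```

The three-byte BOM overhead is consistent with the file sizes listed above, where utf8-bom.csv (662 B) is three bytes larger than utf8.csv (659 B).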