This article describes best practices to help you get high-quality translations from PDF files.
Best Practices for Digital PDF and Scanned PDF Translation
Pairaphrase supports both scanned and digital PDFs and translates them while preserving their original formatting. For most PDFs, including those with complex layouts, images, and tables, Pairaphrase can accurately extract and translate the content, which reduces the need for manual reformatting. However, PDF translation sometimes takes experimentation on the user’s part to get the best results: not all scanned PDFs can be translated, and certain digital PDFs, such as those created in InDesign, translate better from their native format. This article primarily focuses on scanned PDF files.
To get a high-quality translation, you need accurate OCR
Scanned PDF files pose unique translation challenges. Scanned PDFs are often hard to read and contain obstacles (stamps, handwritten text, signatures, watermarks) that prevent accurate optical character recognition (OCR). Without high-quality OCR, Pairaphrase cannot produce an accurate translation. To get the best translation results from Pairaphrase, you must remove any impediments that sit on top of text. Use a PDF editor such as Adobe Acrobat Pro to edit your scanned PDF files before translation.
Stamps
If a stamp sits on top of text, remove both the text the stamp covers and all of the text surrounding it.
Signatures
Signatures that sit on top of text degrade translation quality for all nearby text. Remove the entire body of text under the signature, including the surrounding text.
Watermarks
In most cases, a watermark will not affect your translation quality. However, if the watermark is almost as dark as the text, it may interfere with OCR quality. In that case, remove the complete sentences and any nearby text that the watermark affects.
Handwritten Text
Remove any handwritten text and any initials, even if they are not covering readable text. Handwritten text is likely to be recognized incorrectly by OCR, which can disrupt the formatting of the page and lead to poor translation results.
Besides stamps, signatures, watermarks, and handwritten text, scans that are blurry, low-contrast, or skewed can also hinder text recognition. Most translation issues are caused by poor-quality scans that Optical Character Recognition (OCR) cannot read reliably.
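If you prepare many scans, a quick automated check can flag pages that are likely too faint or too blurry before you upload them. The following is a minimal sketch in Python and is not part of Pairaphrase. It assumes you have already converted a page to a grid of grayscale values (0 = black, 255 = white) with a tool of your choice. The function names and both thresholds are hypothetical and would need tuning on your own documents. It does not detect skew.

```python
from statistics import pstdev, pvariance


def contrast_score(pixels):
    """Standard deviation of gray levels; low values mean faint or washed-out text."""
    flat = [value for row in pixels for value in row]
    return pstdev(flat)


def sharpness_score(pixels):
    """Variance of a 4-neighbour Laplacian; low values suggest a blurry scan."""
    height, width = len(pixels), len(pixels[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            laplacian = (
                pixels[y - 1][x] + pixels[y + 1][x]
                + pixels[y][x - 1] + pixels[y][x + 1]
                - 4 * pixels[y][x]
            )
            responses.append(laplacian)
    return pvariance(responses)


def scan_warnings(pixels, min_contrast=40.0, min_sharpness=100.0):
    """Return a list of likely problems. Thresholds are illustrative assumptions."""
    warnings = []
    if contrast_score(pixels) < min_contrast:
        warnings.append("low contrast")
    if sharpness_score(pixels) < min_sharpness:
        warnings.append("blurry")
    return warnings
```

A page that returns an empty list is not guaranteed to OCR well. The check only catches the most obvious quality problems, so you should still review stamps, signatures, watermarks, and handwriting by eye.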
Need help? Contact us via online chat for the fastest customer support.