How can I count words in PDF files?
Thread poster: suesimons
Neil Coffey
United Kingdom
Local time: 10:18
French to English
PractiCount contained a trojan according to Kaspersky Sep 1, 2011

A warning to others: I downloaded PractiCount from the above-mentioned website, and according to Kaspersky it contained a trojan.

 
Improving your PDF word count Nov 4, 2011

I am not a translator, but I do prepare documents in Adobe InDesign from which PDFs are created and then sent to our translation bureau for translation. Disagreements over word counts prompted me to look for more reliable ways of counting words.

As others have commented, you can copy text from a PDF and paste it into Word for an accurate word count. If you are using Adobe Reader, open the "View" drop-down menu, select the "Page display" option, and then select "Enable scrolling" from that submenu (this ensures that the entire PDF is selected when you copy).

Press Ctrl+A, then Ctrl+C (in Windows), to select the entire PDF and copy it. Open Microsoft Word and press Ctrl+V to paste. In my experience, any text (including text in images) should get picked up and pasted. Scroll through the document to check for any anomalies. The letter combinations "fi" and "fl" often cause words to be split in two; this is a known problem with ligatures, where the two letters are set as a single combined character. A search-and-replace pass will take care of this.
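If you would rather check the pasted text outside Word, the ligature clean-up and the count can be sketched in Python. This is a minimal sketch, assuming the text has already been copied out of the PDF into a string; `normalize_ligatures` and `count_words` are hypothetical helper names, and splitting on whitespace only approximates Word's count, which has its own rules for hyphens and punctuation. Unicode NFKC normalization replaces ligature characters such as U+FB01 (fi) and U+FB02 (fl) with the plain letter pairs (it also folds other compatibility characters, such as superscript digits). It does not rejoin words that extraction has already split with a space, so those still need a manual search and replace.

```python
import unicodedata

def normalize_ligatures(text: str) -> str:
    """Replace compatibility characters such as the 'fi' (U+FB01) and
    'fl' (U+FB02) ligatures with ordinary letters via NFKC normalization."""
    return unicodedata.normalize("NFKC", text)

def count_words(text: str) -> int:
    """Approximate a word count by splitting on runs of whitespace."""
    return len(normalize_ligatures(text).split())

if __name__ == "__main__":
    pasted = "The \ufb01rst \ufb02oor plan"  # text containing ligature characters
    print(normalize_ligatures(pasted))  # The first floor plan
    print(count_words(pasted))          # 4
```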

If you have Adobe Acrobat Pro (as I do), you also have the Redaction tool, which lets you block out text that you do not want counted. The tool was intended for blacking out confidential information in a document, but in doing so it also removes the underlying text, so those blocks drop out of the word count. This is important if you want to exclude things like numbers in tables, repetitive headers/footers, and repetitive table row stubs/column headers.

To use the tool, open the "Advanced" drop-down menu, then select "Redaction"; you will see a number of tools under that menu (you can drag this toolbar onto your screen). Use the "Mark for redactions" button to go through the document and draw red rectangles around the text you want redacted. (Note: make sure the cursor shows crosshairs over things like tables or images, so that you are able to select and block the text.) You can draw boxes anywhere, so if you have text that wraps around objects, just use extra boxes to mark the different sections of the text.

When you've finished "marking for redaction", hit the "Apply Redactions" button and everything in the boxes will turn black.

If you now copy and paste into Word, none of the text in the black boxes will be there, so it will not be counted.
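Without Acrobat Pro, a rough equivalent of this exclusion step can be applied to the extracted text instead. The sketch below is not the redaction workflow described above; it is a substitute heuristic, assuming the text has been extracted page by page into a list of lines per page. The helper `count_excluding`, its two rules (drop any line that appears on every page as a presumed header or footer, and skip tokens with no letters, such as table figures) and the sample pages are all hypothetical and would need adjusting to the document at hand.

```python
from typing import List

def count_excluding(pages: List[List[str]]) -> int:
    """Count words, skipping presumed running headers/footers and numeric tokens.

    `pages` holds the extracted text lines of each page (an assumed input format).
    """
    # A line that appears on every page is treated as a header or footer.
    # With a single page there is nothing to compare, so nothing is excluded.
    repeated = set()
    if len(pages) > 1:
        repeated = {line.strip() for line in pages[0]}
        for page in pages[1:]:
            repeated &= {line.strip() for line in page}

    total = 0
    for page in pages:
        for line in page:
            if line.strip() in repeated:
                continue
            # Count only tokens containing a letter, so table figures such as
            # "1,250" or "12.5%" are left out (labels like "Q1" still count).
            total += sum(1 for tok in line.split() if any(ch.isalpha() for ch in tok))
    return total

if __name__ == "__main__":
    # Hypothetical two-page extract with a repeated header line.
    pages = [
        ["ACME Annual Report", "Sales rose sharply this year.", "Q1 1,250 Q2 1,400"],
        ["ACME Annual Report", "Costs fell slightly.", "Total 2,650"],
    ]
    print(count_excluding(pages))  # 11
```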

Adobe Acrobat Pro is expensive (I need it for many other purposes), but it does save time and money when sending PDF documents for translation.


 
Tony M
France
Local time: 11:18
Member
French to English
SITE LOCALIZER
A different experience Nov 4, 2011

I had a problem with this recently: a client who had not agreed the word count in advance for a PDF job complained at the invoicing stage that I had charged for more words than there should have been.

I tried 3 different ways of counting the words:

1) 'Select all' text and copy to Word

2) OCR using Abbyy Finereader and output as 'editable text'

3) OCR using Abbyy Finereader and output as RTF

On a document of around 3000 words, the spread of results was nigh on 20%, and this could not be accounted for just by the relatively small number of words included within images — which were counted in the OCR versions but not, of course, in the direct copy.

In the end, we split the difference and agreed on a compromise figure, but I have warned this customer that next time they send me a PDF file, they must either supply the extracted text with it or accept my word count.
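The size of such a spread, and the "split the difference" compromise, are simple arithmetic. The three counts below are hypothetical figures chosen only to illustrate a spread of roughly 20%; they are not the counts from the job described above. The spread is measured relative to the lowest count, which is one of several reasonable conventions, and `spread_and_midpoint` is a hypothetical helper name.

```python
def spread_and_midpoint(counts):
    """Return (relative spread, midpoint) for a list of word counts.

    Spread is (max - min) / min; the midpoint is halfway between the
    lowest and highest count, i.e. "splitting the difference".
    """
    low, high = min(counts), max(counts)
    return (high - low) / low, (low + high) / 2

if __name__ == "__main__":
    # Hypothetical results: copy to Word, OCR to editable text, OCR to RTF.
    counts = [2800, 3000, 3350]
    spread, midpoint = spread_and_midpoint(counts)
    print(f"spread: {spread:.1%}, compromise: {midpoint:.0f} words")
    # spread: 19.6%, compromise: 3075 words
```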


 
Catherine Muir
Australia
Local time: 19:18
Indonesian to English
In memoriam
Use ABBYY FineReader to count words in image-based PDF Aug 27, 2012

I just received a request for a quote to translate 6 PDFs, a total of 67 pages. The quality of the images was poor. Nonetheless, using ABBYY FineReader 11, in just a few minutes I was able to produce a ROUGH .docx file and then run the word count from within MS Word, resulting in a ROUGH count of 16,175 words.

I sent the file to the agency that requested the quote, together with my basic per-source-word rate (including an element for converting the image-based PDF into a usable Word document), my estimated turnaround time and my payment terms, and suggested they come back to me to negotiate if interested.
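A quote of this kind is the rough word count times a per-word rate, plus a separate charge for the conversion work. In the sketch below only the 16,175-word count and the 67 pages come from this post; the per-word rate and the per-page conversion charge are hypothetical placeholders, and charging conversion per page is itself an assumption (it could equally be a flat or hourly fee). `quote` is a hypothetical helper, and `Decimal` is used so that currency amounts avoid binary floating-point rounding.

```python
from decimal import Decimal, ROUND_HALF_UP

def quote(words: int, rate_per_word: Decimal, pages: int,
          conversion_per_page: Decimal) -> Decimal:
    """Translation fee plus a per-page conversion fee, rounded to cents."""
    total = words * rate_per_word + pages * conversion_per_page
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

if __name__ == "__main__":
    # 16,175 words and 67 pages are from the post; both rates are hypothetical.
    print(quote(16175, Decimal("0.10"), 67, Decimal("2.00")))  # 1751.50
```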

If I get the job, I will do a proper conversion and validation in FineReader 11, create a .doc file, run CodeZapper on it, and then import it into my DVX2 Pro TenT (translation environment tool) for translation. (I use .doc files because DVX2 Pro seems to work better with them than with .docx files.)

Many agencies have no concept whatsoever of what it takes to convert a PDF made from a bad photocopy of a document into a professional translation. It takes a lot of time, not like putting a dime in a jukebox and out pops a song!

For those who might be considering using an online program to provide a word count, be careful: confidentiality might be the victim.

Cheers and best wishes for the upcoming change of season!
Catherine Muir
Freelance Translator, Indonesian>English
Mildura VIC Australia


 
Catherine Muir
Australia
Local time: 19:18
Indonesian to English
In memoriam
word count feature in DVX2 Pro Jan 8, 2013

Another way to do it, for those translators using DVX2 Pro, is built into the program. Of course, you have to have a good, clean file to start with, either a verified conversion of a PDF (I use ABBYY FineReader 11 for this stage) or a DOC/DOCX that has been corrected to remove extra spaces, etc. DVX2 Pro will produce a detailed report in either an MS Word-like count or a DVX count (so far I haven't found them to differ), which can be saved as an RTF and provided to a client to verify your word count.
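Why a clean file and the choice of counting rule matter can be illustrated with two simple counting conventions. These are illustrative assumptions, not the actual algorithms used by DVX2 Pro or Microsoft Word: one counts every whitespace-separated token, the other only tokens containing a letter or digit. Punctuation left standing on its own by a messy conversion is counted by the first but not by the second. The function names and the sample text are hypothetical.

```python
def count_tokens(text: str) -> int:
    """Convention A: every whitespace-separated token counts as a word."""
    return len(text.split())

def count_alnum_tokens(text: str) -> int:
    """Convention B: only tokens containing a letter or digit count as words."""
    return sum(1 for tok in text.split() if any(ch.isalnum() for ch in tok))

if __name__ == "__main__":
    # Hypothetical converted text with punctuation detached by extra spaces.
    sample = "Price :  10 - 20 %  per   page"
    print(count_tokens(sample), count_alnum_tokens(sample))  # 8 5
```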

FWIW, I downloaded a trial version of Adobe Acrobat XI, which supposedly offers a more direct route from an image-based PDF (such as a crappy photocopy that had been scanned and emailed) to a Word document, but I found it no better than FineReader 11, so I didn't buy it when the trial period ended. (You'd think the company that created the PDF format would be able to provide good software to unravel a PDF, wouldn't you???)

Also FWIW, the word count I get in DVX2 Pro has proven to be less than what the client estimated, so they're happy. As the Indonesian saying goes, "Asal Bapak senang." (Meaning, "As long as the boss is happy...")

Cheers,
Catherine Muir


 