Understand project statistics

Explore how to access and interpret word count statistics in both personal and corporate workspaces on Smartcat for project management.

Word count statistics are often confusing because different CAT tools calculate them in different ways. Let’s see how Smartcat handles this.

Personal workspace statistics for own projects

Word count statistics allow you to view the number of words and characters in the original text and estimate the amount of work required, taking into account repetitions and matches with the translation memory.

To calculate and view statistics, go to the project page and open the Statistics tab.

The calculation may take some time, depending on the number of documents, their volume, and the sizes of translation memories connected to the project.

You can reconfigure the set of documents or translation memories used to calculate the statistics for projects you have created. You can download the statistics in the following formats:

  • Trados XML — compatible with SDL Trados

  • Smartcat XLSX — for MS Excel or any other spreadsheet software

Clicking the Refresh button recalculates the statistics to reflect the current progress. (Technically, the numbers change because the translation memory entries have been updated.)

In this case, all segments that have already been translated are counted as 102% matches in the new “snapshot”, which reflects the specified documents as of a certain time. This snapshot approach lets you keep a record of how many words there were at the very beginning, before work on the project started. If you forgot to calculate statistics at the start, you can uncheck the translation memory that new entries were written to and refresh the statistics.

Smartcat supports the following statistics:

  • Words

  • Asian characters

  • Pages, where 1 page = 250 words/Asian characters

  • Characters with spaces

  • Characters without spaces
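
As a rough illustration of how these units relate to each other, the sketch below derives them from a plain string. It assumes naive whitespace tokenization, which is not necessarily how Smartcat counts words, and the function name `basic_statistics` is hypothetical. Only the ratio of 250 units per page comes from the list above.

```python
UNITS_PER_PAGE = 250  # from the list above: 1 page = 250 words/Asian characters


def basic_statistics(text: str) -> dict:
    """Return simple volume counts for a source text (illustrative only)."""
    words = len(text.split())  # naive whitespace split; an assumption
    chars_with_spaces = len(text)
    chars_without_spaces = sum(1 for ch in text if not ch.isspace())
    return {
        "words": words,
        "pages": words / UNITS_PER_PAGE,
        "chars_with_spaces": chars_with_spaces,
        "chars_without_spaces": chars_without_spaces,
    }


stats = basic_statistics("Word count statistics can be confusing.")
print(stats)
```

For this sample sentence, the sketch counts 6 words, 39 characters with spaces, and 34 characters without spaces.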

For logographic languages, Smartcat shows three columns instead of one:

  • Asian characters

  • Words in alphabetic languages

  • Words & Asian characters combined

Personal workspace statistics for assigned projects

If you have been invited to a project via the Smartcat Marketplace, you will not see the full statistics described in the previous section. Instead, you will see a summary calculation for all your assigned tasks and workflow stages, e.g. Translation.

Here are some things worth noting.

My tasks column shows a preliminary calculation of the price & volume of assigned work. You can see how many total words were assigned, the number of repetitions, and, most importantly, the number of effective words.

Effective words represent the number of words to be paid for, weighted according to the customer’s discount settings for translation memory matches and repetitions.
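
The weighting can be sketched as a simple sum over match bands. The band names and percentages below are made-up sample values, not Smartcat defaults, and `effective_words` is a hypothetical helper; the actual values come from the customer's net rate scheme.

```python
# Hypothetical net rate scheme: percent of the full rate paid per match band.
NET_RATE_SCHEME = {
    "new": 100,
    "75-94%": 60,
    "95-99%": 30,
    "100%+": 10,
    "repetitions": 10,
}


def effective_words(word_counts: dict, scheme: dict) -> float:
    """Weight each band's word count by the percent of the rate paid for it."""
    weighted = sum(count * scheme[band] for band, count in word_counts.items())
    return weighted / 100


# Sample assignment: source words per match band (invented numbers).
assigned = {"new": 1000, "75-94%": 200, "95-99%": 100, "100%+": 300, "repetitions": 400}

total_words = sum(assigned.values())
payable_words = effective_words(assigned, NET_RATE_SCHEME)
print(total_words, payable_words)
```

With these sample numbers, 2,000 assigned words reduce to 1,220 effective words.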

Completed words are words you actually translated or edited. Smartcat calculates completed words in real time, so you can always see how much you have earned.

The Net rate scheme column shows the customer’s settings for discounts for repetitions and TM matches. The settings are different for different workflow stages, e.g. Translation.

The TM match rate represents how similar a segment’s text is to the one stored in the translation memory. The minimum match threshold is 50%. Matches below this threshold are considered new segments.
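
The threshold rule can be expressed as a small check. This is only a sketch: `classify_match` is a hypothetical name, and it assumes a match of exactly 50% still counts as a match, since the text above only treats values below the threshold as new.

```python
MIN_MATCH_THRESHOLD = 50  # percent; the minimum match threshold stated above


def classify_match(match_percent: int) -> str:
    """Treat any TM match below the minimum threshold as a new segment."""
    if match_percent < MIN_MATCH_THRESHOLD:
        return "new"
    return f"{match_percent}% match"


labels = [classify_match(p) for p in (30, 49, 50, 87)]
print(labels)
```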

The final cost calculation may differ from the preliminary one, as Smartcat does not currently identify partial matches between segments that have not yet been recorded in the translation memory at the time of the preliminary cost calculation.

It is important to remember that in Smartcat, the cost calculation is always based on the number of words or Asian characters in the source text, not the translation.

Corporate workspace statistics

The basic principles of building statistics in the corporate workspace are no different from those in the personal workspace.

For each project, you can view the number of words or characters in documents and estimate the amount of work, taking into account repetitions and matches with the translation memory.

The statistics show the total number of words, segments, pages, and characters, the number of unique segments and matches with the connected TMs, the number of repetitions inside each file, and cross-file repetitions.

The corporate workspace provides more functionality and allows you to download statistics for each of the assigned contributors.

The statistics report contains the amount of work assigned to and completed by each contributor.

Statistics are downloaded in the XLSX format. A separate file is generated for each contributor. Each file contains data for a single language pair and a single task. If there are several stages or language pairs, a separate report is generated for each combination of a stage and a target language.
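
To get a feel for how many reports to expect, the combinations can be enumerated as below. The stage names and language codes are sample values, and the sketch assumes a single contributor assigned to every stage and target language.

```python
from itertools import product

stages = ["Translation", "Editing"]    # sample workflow stages
target_languages = ["de", "fr", "ja"]  # sample target languages

# One report per (stage, target language) combination for this contributor.
reports = [f"{stage} / {lang}" for stage, lang in product(stages, target_languages)]

for name in reports:
    print(name)
```

In this sample, two stages and three target languages yield six separate reports.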

Statistics per assignee are generated after the assignment. Although these statistics cannot be recalculated manually, they update automatically as the assignee makes progress.

You can also calculate the statistics using a different TM net rate, e.g. if you need to provide different pricing to your customer. You can set the TM net rate here.