Use flexible XML parsing methods

Learn how to use Smartcat's flexible XML parser to set up models and customize how XML files are parsed, including where to find the settings and how to set paths to the elements you need.

Preset XML parsers do not always produce the desired outcome, such as uploading only specific strings and elements for translation or adding IDs and length limits to segments in the CAT Editor. With the Smartcat flexible XML parser, you can set a model for each case you have to deal with.

First, open the Settings section in your account and go to File Formats, where you will find these settings.

Next, set up models there so that you can apply them to the documents you plan to upload. You may set several models to match the requirements of different projects.

Before setting new XML models, review the XPath syntax, since the system uses it to find the paths to elements in XML files. It also helps to use a test environment to check whether a model works or needs to be adjusted. Xpather is one tool that can be used for this.

Setting a custom XML parsing method

The form for setting a model has several fields. Most of them are optional, but two are key and therefore mandatory. Let's begin with them.

Import setting name: The name of a future parsing method.

Segment-forming elements: This field sets the path to the XML elements you want to put in the Editor. You may set more than one element; the system looks for them throughout the whole document.

For example, the path //test-string or /test/test-string displays the string Translate me from the XML file below:

<test>
<test-string> Translate me </test-string>
</test>
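
To try out a path before saving a model, you can evaluate it locally. The sketch below is only an illustration using Python's standard xml.etree.ElementTree module, which supports a limited subset of XPath and is not Smartcat's parser; the function name select_segments is hypothetical. ElementTree does not accept absolute paths on an element, so .//test-string stands in for //test-string and ./test-string (relative to the root element test) stands in for /test/test-string.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<test>
<test-string> Translate me </test-string>
</test>"""


def select_segments(xml_text, path):
    """Return the stripped text of every element matched by `path`.

    Hypothetical helper: it only mimics what a segment-forming path selects.
    """
    root = ET.fromstring(xml_text)
    return [(el.text or "").strip() for el in root.findall(path)]


# ".//test-string" searches the whole document, like //test-string.
anywhere = select_segments(SAMPLE, ".//test-string")

# "./test-string" is evaluated from the root <test>, like /test/test-string.
from_root = select_segments(SAMPLE, "./test-string")

print(anywhere)
print(from_root)
```

Both paths select the same single element here, so both lists contain the string Translate me.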

It's not possible to save a model without filling in at least these two fields.

Let's move on to the optional fields and checkbox. Besides the core fields, you may set the paths to translatable and untranslatable nodes, string IDs, comments, and length limits. Another available setting is whether you want to protect HTML tags in CDATA sections, which we will cover later.

Translatable text: Segment-forming elements contain nodes (other elements, their attributes, or text) that can be translated. This setting is therefore relative to the segment-forming elements: its paths are searched for only within the elements set there. More than one path can be set.

Example:

The XML file has an element test and, inside it, another element test-string with an attribute attribute-1.

<test>
<test-string attribute-1= "Well"> Translate me </test-string>
</test>

Two segments will appear in the Editor: Translate me and Well.
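
The way relative translatable nodes produce these two segments can be sketched as follows. This is an illustration in Python's standard xml.etree.ElementTree, not Smartcat's implementation. It assumes that the translatable nodes are the element's own text and its attribute-1 attribute; the function name translatable_nodes is hypothetical. ElementTree cannot select attributes with a path, so the attribute is read with get() instead.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<test>
<test-string attribute-1= "Well"> Translate me </test-string>
</test>"""


def translatable_nodes(xml_text, segment_path, attribute_names):
    """Collect text and attribute values found inside segment-forming elements.

    Hypothetical helper: the lookup is relative to each matched element,
    mirroring how translatable paths are relative to segment-forming ones.
    """
    root = ET.fromstring(xml_text)
    segments = []
    for el in root.findall(segment_path):
        text = (el.text or "").strip()
        if text:
            segments.append(text)
        for name in attribute_names:
            value = el.get(name)
            if value is not None:
                segments.append(value)
    return segments


segments = translatable_nodes(SAMPLE, ".//test-string", ["attribute-1"])
print(segments)  # ['Translate me', 'Well']
```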

Untranslatable inline elements: You can optionally set a path to an inline element that you don't want to translate. The element's content will not be available for translation and will be marked as a tag in the Editor. More than one value can be set in this field too.

Basically, this setting lets you hide technical fragments or anything else that you don't want to translate.

Here is a simple example:

<test>
<test-string>Very important <tag>no</tag> content</test-string>
</test>
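
As a rough illustration of the effect, the sketch below replaces the untranslatable inline element with a numbered placeholder while keeping the surrounding text. It uses Python's standard xml.etree.ElementTree. The [1] placeholder format and the function name segment_with_placeholders are assumptions made for this example and do not reflect how Smartcat actually renders tags.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<test>
<test-string>Very important <tag>no</tag> content</test-string>
</test>"""


def segment_with_placeholders(element, untranslatable_tags):
    """Build segment text, swapping untranslatable children for placeholders.

    Hypothetical helper: the "[n]" notation is illustrative only.
    """
    parts = [element.text or ""]
    counter = 0
    for child in element:
        if child.tag in untranslatable_tags:
            counter += 1
            parts.append(f"[{counter}]")  # content is hidden behind a tag
        else:
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")  # text that follows the child element
    return "".join(parts)


root = ET.fromstring(SAMPLE)
segment = segment_with_placeholders(root.find(".//test-string"), {"tag"})
print(segment)  # Very important [1] content
```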

String ID: This field lets you select a unique node that will be used as the segment ID in the Editor. Even though the setting is optional, it's a powerful tool that lets you update a document while keeping segment revisions and assignments. Only one node can be specified.

Important: The field value has to be unique for each segment within the file.

In the example below, the path to the id attribute of the element test-string is used as the unique identifier in Smartcat.

<test>
<test-string id= "1">Test</test-string>
</test>

In the Editor, the ID is shown at the bottom of the segment.
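
Because the ID has to be unique within the file, it can be worth checking a file before uploading it. The following sketch, written with Python's standard xml.etree.ElementTree, builds an ID-to-text mapping and raises an error on a missing or repeated ID. The function name segments_by_id is hypothetical, and the check is a local convenience rather than part of Smartcat.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<test>
<test-string id= "1">Test</test-string>
</test>"""


def segments_by_id(xml_text, segment_path, id_attribute):
    """Map each segment ID to its text, rejecting missing or duplicate IDs.

    Hypothetical helper for checking a file locally before upload.
    """
    root = ET.fromstring(xml_text)
    result = {}
    for el in root.findall(segment_path):
        segment_id = el.get(id_attribute)
        if segment_id is None:
            raise ValueError("segment has no ID attribute")
        if segment_id in result:
            raise ValueError(f"duplicate segment ID: {segment_id}")
        result[segment_id] = (el.text or "").strip()
    return result


ids = segments_by_id(SAMPLE, ".//test-string", "id")
print(ids)  # {'1': 'Test'}
```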

Comments: Use this field to add comments that will be shown in the Segment comments section in the Editor. It's possible to set more than one comment.

Example:

<test>
<test-string com= "1 comment">Text<comment>2 comment</comment></test-string>
</test>
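
The sketch below shows how two comment paths could pick up both comments from this file. It assumes the comments come from the com attribute and from the child element comment, which the example suggests but does not state explicitly. It uses Python's standard xml.etree.ElementTree, and the function name collect_comments is hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<test>
<test-string com= "1 comment">Text<comment>2 comment</comment></test-string>
</test>"""


def collect_comments(element, attribute_names, child_tags):
    """Gather comment strings from the given attributes and child elements.

    Hypothetical helper: attribute and tag names are passed in by the caller.
    """
    comments = []
    for name in attribute_names:
        value = element.get(name)
        if value is not None:
            comments.append(value)
    for child in element:
        if child.tag in child_tags and child.text:
            comments.append(child.text.strip())
    return comments


root = ET.fromstring(SAMPLE)
comments = collect_comments(root.find(".//test-string"), ["com"], {"comment"})
print(comments)  # ['1 comment', '2 comment']
```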

Segment length limit: Here you can set the path to a node whose value is an integer stored as a string. The integer indicates the maximum number of characters allowed in the segment. Only one length limit can be set.

In the following example we are going to use these settings:

Segment-forming element: //test-string
Segment length limit: @max

<test>
<test-string max = "10">My text</test-string>
</test>

A linguist working on the segment cannot confirm it if the length of the translation exceeds the limit.
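
The length check can be sketched as below with Python's standard xml.etree.ElementTree. The function name fits_limit is hypothetical, and the example translations are made up. How Smartcat counts characters (for instance, whether tags or spaces are included) is not described here, so the sketch simply assumes one character per Python string character.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<test>
<test-string max = "10">My text</test-string>
</test>"""


def fits_limit(element, translation, limit_attribute="max"):
    """Return True if `translation` is within the element's length limit.

    Hypothetical helper: the limit is stored as a string and converted
    to an integer; an element without the attribute has no limit.
    """
    raw_limit = element.get(limit_attribute)
    if raw_limit is None:
        return True
    return len(translation) <= int(raw_limit)


root = ET.fromstring(SAMPLE)
segment = root.find(".//test-string")

short_ok = fits_limit(segment, "Mon texte")          # 9 characters
too_long = fits_limit(segment, "Mon texte traduit")  # 17 characters
print(short_ok, too_long)  # True False
```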

Protect HTML tags in CDATA sections: Depending on whether the box is checked, HTML tags are either shown as editable text or substituted by Smartcat tags. Without protection, the tags and the text within them look like this: <br>text</br>. If tags are protected, Smartcat tags substitute the HTML ones.
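
To make the difference concrete, the sketch below contrasts the two modes on a made-up XML sample that wraps the same HTML in a CDATA section. It uses Python's standard re and xml.etree.ElementTree modules. The sample file, the [1] placeholder notation, the simple regular expression, and the function name protect_html are all assumptions for illustration; Smartcat's own tag detection and display may differ.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: HTML markup stored inside a CDATA section.
SAMPLE = """<test>
<test-string><![CDATA[<br>text</br>]]></test-string>
</test>"""

# Simplified pattern for opening and closing HTML tags (an assumption).
HTML_TAG = re.compile(r"</?[A-Za-z][^<>]*>")


def protect_html(text):
    """Replace each HTML tag with a numbered placeholder.

    Returns the protected text and the list of original tags.
    """
    tags = []

    def replace(match):
        tags.append(match.group(0))
        return f"[{len(tags)}]"

    return HTML_TAG.sub(replace, text), tags


root = ET.fromstring(SAMPLE)
unprotected = root.find(".//test-string").text  # tags stay editable text
protected, original_tags = protect_html(unprotected)

print(unprotected)    # <br>text</br>
print(protected)      # [1]text[2]
print(original_tags)  # ['<br>', '</br>']
```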

Note that for the custom XML parser to work successfully in Smartcat, files must contain repeating elements, each of which will be turned into a source segment.