HTML import preprocessing with Javascript

ReqEdit Online Documentation

Back to Table of contents Community

HTML import preprocessing with Javascript

4 min read

Javascript API v.1 User javascript Testing Running in ReqEdit Tips

HTML import preprocessing with Javascript

4 min read
last updated: 03/27/2024

The Javascript engine enables users to do preprocessing on the html content directly before importing to ReqIF. If no userscript is not selected the pre-processing part is skipped.

The javascript engine is https://github.com/paulbartrum/jurassic – a standalone JS engine with no other dependency, therefore only a reduced set of features are available from the browser world.

Javascript API v.1

function	description
removeElementByID(id)	id attribute to match and delete
removeElements(nodes)	remove the nodes from the dom object
getElementsBy(node, tag, css, id)	get back a list of html objects from the specified node optionally filtered by tag name (cs), css class name (ci) or id attribute (ci)
getFirstNodesBy(node, tag)	get back the first child node under the specified node that is the specified tag name
trimAll(val)	replace all starting and trailing whitespace characters and all newlines with spaces in val
hasClass(node, val)	detect if the specified node contains the specified class name (ci) by val
hasID(node, val)	detect if the specified node has the specified ID name (ci) by val

cs – case sensitive, ci – case insensitive

User javascript

Each user javascript receives the htmlstring containing the html raw content and must return a htmlstring value by calling setReturnValue(htmlstring). A doc object is available that is basic DOM model of the hierarchical nodes of the htmlstring.

doc – the DOM model based on – https://github.com/xmldom/xmldom (v0.8.6)

Large HTML splitting

Javascript runner has a limitation processing large HTML source files, the content has to be splitted. This is done with the following annotations (in the comment, that’s right)

contentSelector and excludeSelector – uses https://github.com/hcesar/HtmlAgilityPack.CssSelector engine

variable	description
contentSelector	split the html file into multiple elements, each element will be processed separately
excludeSelector	exclude elements from the list based on the specified selector – currently in development

javascript global variables

Example script

/*
contentSelector = "body > *"
excludeSelector = ".footnote, .notes"
*/


// PERFORM CUSTOM ACTIONS

var tables = getElementsBy(doc, "table", "table"); // tables with .table class
var images = getElementsBy(doc, "img");

//console.log("table" + tables.length + " images" + images.length)
// node can be .table or can contain images - leave alone
// node contains super/sub formatted strings replace tags

if (!tables.length) {
    if (!images.length) {
        // if structural table
        if (doc.firstChild.nodeName.toLowerCase() == "table") {
            var line = "<p class='object'>";

            // process first row
            var trs = getFirstNodesBy(doc, 'tr');
            for (var i = 0; i < trs[0].childNodes.length; i++) {
                var td = trs[0].childNodes[i];
                //console.log(td.nodeName + " " + trimAll(td.textContent));
                if (trimAll(td.textContent)) {
                    // first p of TD
                    // var firstp = getFirstNodesBy(td, 'p');
                    // line += trimAll(firstp[0].textContent)+" ";
                    //console.log(trimAll(td.textContent));

                    for (var ii = 0; ii < td.childNodes.length; ii++) {
                        if (td.childNodes[ii].nodeName.toLowerCase() != "table") {
                            line += trimAll(td.childNodes[ii].textContent) + " ";
                        }
                    }

                    var subtables = getFirstNodesBy(td, 'table');
                    if (subtables.length) {
                        line += "<br>";
                        var l2tables = getFirstNodesBy(td, "table");
                        l2tables.forEach(function (table) {
                            line += trimAll(table.textContent) + "<br>";
                            //console.log(trimAll(table.textContent));
                        })
                    }
                }
            }
            line += "</p>";
            //console.log("line");
            htmlstring = line;
            /*
              htmlstring = htmlstring.replace(/<span class="super">(.+?)<\/span>/ig, "<sup>$1</sup>");
              htmlstring = htmlstring.replace(/<span class="sub">(.+?)<\/span>/ig, "<sub>$1</sub>");
              htmlstring = "<p>" + doc.firstChild.textContent + "</p>";
              */
        } else
            if (doc.firstChild.nodeName.toLowerCase() == "div") {
                // annex that has to be processed again
                //console.log('div annex');
                htmlstring = doc.toString();
            }
            else
            {
                var elem = doc.firstChild;
                // is not a table

                // starts with number
                if (/\d+/i.test(elem.textContent)) {
                    htmlstring = "<p class='object'>" + elem.textContent + "</p>";
                    //console.log("p ");
                } else
                    if (hasClass(elem, 'ti-art') || hasID(elem)) {
                        htmlstring = "<h1 class='object heading'>" + elem.textContent + "</h1>";
                        //console.log("h1 ");
                    } else
                        if (hasClass(elem,'.sti-art')) {
                            htmlstring = "<h2 class='object heading'>" + doc.firstChild.textContent + "</h2>";
                            //console.log("h2 ");
                        } else {
                            htmlstring = "<small>" + doc.firstChild.textContent + "</small>";
                            //console.log("small ");
                        }
            }
            //htmlstring = doc.toString();
    } else {
        // has images
        // do not change
        //console.log("has image");
        htmlstring = doc.toString();
    }
} else {
    // content is other then tables
    // do not change
    //console.log("is table");
    htmlstring = doc.toString();
}

//htmlstring = doc.toString();
setReturnValue(htmlstring);

Testing

ReqEdit will execute whatever the userscript contains. Any errors are logged back to the user.

One way of testing is to use JSBIN https://jsbin.com/?js,console – online js sandbox, where the userscript can be tested with html parts. The contentSelector and excludeSelector will not work.

Always use the actual scripts available in ReqEdit.

Take the ReqEdit/Data/JavascriptEngine/dom-parser.js and
ReqEdit/Data/JavascriptEngine/jsapi.js and paste it in the Javascript panel.
Add the custom script
End the script with console.log(…) to test the output of the script
using in ReqEdit setReturnValue(htmlstring) must be used