Nestle 1904 GNT - Feature: normalized

Feature group	Feature type	Data type	Available for node types	Used by viewtypes
`Orthographic`	`Node`	`String`	`word` `subphrase` `phrase`	`syntax-view` `wg-view`

Feature description

This feature provides the normalized Greek form of the surface text of the Nestle 1904 Greek New Testament, encoded in Unicode. A consistent normalized form is essential for accurate lexical analysis.

This feature is also populated for phrase and subphrase nodes, but only when they consist of a single word node.

Feature values

The normalized form of the Greek surface text (see also the Notes section).

Notes

The following snippet identifies all word nodes whose values for the features text and normalized differ (A is the Text-Fabric app object for the loaded dataset):

Query = '''
w:word 
   w .text#normalized. w
'''

Results = A.search(Query)

This query returns 37182 results. The following table shows the ten most frequent differences between the values of the features text and normalized:

feature value text	feature value normalized	frequency
καὶ	καί	8545
δὲ	δέ	2620
τὸ	τό	1658
τὸν	τόν	1556
τὴν	τήν	1518
γὰρ	γάρ	921
μὴ	μή	902
τὰ	τά	817
τοὺς	τούς	722
πρὸς	πρός	670
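A frequency table like the one above could be derived by counting the differing value pairs. The following is a minimal sketch, not necessarily how the table was produced. The function name count_differences and the sample data are hypothetical and serve only as illustration.

```python
from collections import Counter


def count_differences(pairs):
    """Count (text, normalized) pairs in which the two values differ."""
    return Counter((text, norm) for text, norm in pairs if text != norm)


# Hypothetical sample data standing in for real feature values.
sample = [
    ("καὶ", "καί"),
    ("καὶ", "καί"),
    ("τὸν", "τόν"),
    ("λόγος", "λόγος"),  # identical pair, so it is not counted
]

# Keep at most the ten most frequent differences.
top = count_differences(sample).most_common(10)
for (text, norm), freq in top:
    print(text, norm, freq, sep="\t")
```

With the dataset loaded, the pairs could be built from the query results, for instance as ((F.text.v(w), F.normalized.v(w)) for (w,) in Results), assuming F is the Text-Fabric feature API of the loaded dataset and each result is a one-element tuple holding the word node.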

The relation between the feature values for text, normalized and lemma can be demonstrated using the third word of Matthew 1:2 as an example:

        <w ref="MAT 1:2!3"
             ...
             xml:id="n40001002003"
             lemma="ὁ"
             normalized="τόν"
             ...
             unicode="τὸν">τὸν</w>

See also the following related features:

text: Word without punctuation and text-critical signs.
unicode: Word as it appears in the text (in Unicode).

Character encoding

All Greek text in this Text-Fabric dataset is encoded in Unicode. However, there are specific aspects that may require attention when querying, particularly those involving polytonic accents and “pseudo-characters” like the iota subscript. For a detailed discussion on character encoding, please refer to the documentation here.
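As a general illustration of why polytonic accents need care, the sketch below uses Python's standard unicodedata module. It shows two things: visually identical accented letters can be stored as different code points, and accents (including the iota subscript) can be removed for accent-insensitive comparison. The helper names nfc_equal and strip_marks are hypothetical, and the sketch makes no claim about which code points this dataset actually uses.

```python
import unicodedata


def nfc_equal(a, b):
    """Compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def strip_marks(s):
    """Decompose the string and drop all combining marks."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# The same word written with two different code points for the accented iota.
with_oxia = "\u03ba\u03b1\u1f77"   # iota with oxia (U+1F77)
with_tonos = "\u03ba\u03b1\u03af"  # iota with tonos (U+03AF)

print(with_oxia == with_tonos)           # False: the raw code points differ
print(nfc_equal(with_oxia, with_tonos))  # True: canonically equivalent

# Grave and acute forms reduce to the same unaccented letters.
with_grave = "\u03ba\u03b1\u1f76"        # iota with varia (U+1F76)
print(strip_marks(with_grave) == strip_marks(with_tonos))  # True

# The iota subscript is a combining mark after decomposition.
print(strip_marks("\u1fb3") == "\u03b1")  # True
```

Normalizing both the search string and the feature value in the same way before comparing avoids mismatches that are invisible on screen.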

Source description

The normalized feature is taken from the normalized attribute of the w (word) element in the source XML.
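The extraction of this attribute can be illustrated with Python's standard xml.etree.ElementTree module. This is only a sketch, not the conversion code used to build the dataset. The snippet is the Matthew 1:2 example from the Notes section, with the elided attributes left out.

```python
import xml.etree.ElementTree as ET

# Example element from the Notes section; elided attributes are omitted.
snippet = (
    '<w ref="MAT 1:2!3" xml:id="n40001002003" lemma="ὁ" '
    'normalized="τόν" unicode="τὸν">τὸν</w>'
)

word = ET.fromstring(snippet)

# Collect the attribute values that correspond to dataset features.
features = {
    "normalized": word.get("normalized"),
    "unicode": word.get("unicode"),
    "lemma": word.get("lemma"),
}

for name, value in features.items():
    print(name, value, sep="\t")
```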

N1904-TF