N1904-TF

Text-Fabric dataset of the Greek New Testament, based on the Nestle 1904 (7th printing) edition.


Nestle 1904 GNT - Feature: normalized

Feature group | Feature type | Data type | Available for node types | Used by viewtypes
Orthographic | Node | String | word, subphrase, phrase | syntax-view, wg-view

Feature description

This feature provides the normalized Greek form of each word's surface text in the Nestle 1904 Greek New Testament, encoded as Unicode. Normalization gives a word one consistent spelling regardless of context-dependent accent changes, which makes lexical analysis more consistent and accurate.

This feature is also populated for phrase and subphrase nodes, but only when they consist of exactly one word node.

Feature values

The normalized form of the Greek surface text (see also the Notes section).

Notes

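The following snippet identifies all word nodes where the values of the features text and normalized differ:

# Search template: every word node whose text value differs from its normalized value.
Query = '''
w:word
   w .text#normalized. w
'''

# A is the Text-Fabric app object for this dataset, loaded beforehand.
Results = A.search(Query)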

This query returns 37182 results. The following table shows the frequency of the top ten differences between feature text and normalized:

Value of text | Value of normalized | Frequency
καὶ | καί | 8545
δὲ | δέ | 2620
τὸ | τό | 1658
τὸν | τόν | 1556
τὴν | τήν | 1518
γὰρ | γάρ | 921
μὴ | μή | 902
τὰ | τά | 817
τοὺς | τούς | 722
πρὸς | πρός | 670
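
A frequency table of this kind can be produced by counting the (text, normalized) pairs that differ. The following is a minimal sketch of that counting step only. The function name `top_differences` and the sample data are hypothetical and not part of the dataset or of Text-Fabric.

```python
from collections import Counter


def top_differences(pairs, n=10):
    """Return the n most common (text, normalized) pairs whose values differ."""
    counts = Counter((text, norm) for text, norm in pairs if text != norm)
    return counts.most_common(n)


# Hypothetical sample data standing in for the real word nodes.
sample = [("καὶ", "καί"), ("δὲ", "δέ"), ("καὶ", "καί"), ("λόγος", "λόγος")]

for (text, norm), freq in top_differences(sample):
    print(text, norm, freq)
```

With the dataset loaded, the pairs could presumably be built from the word nodes, for example `((F.text.v(w), F.normalized.v(w)) for w in F.otype.s("word"))`, assuming `F` is the feature API object that Text-Fabric makes available after loading.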

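The relation between the values of text, normalized and lemma can be illustrated with the third word of Matthew 1:2. The surface text τὸν carries a grave accent, the normalized form τόν carries an acute accent, and the lemma ὁ is the dictionary form of the article: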

        <w ref="MAT 1:2!3"
             ...
             xml:id="n40001002003"
             lemma="ὁ"
             normalized="τόν"
             ...
             unicode="τὸν">τὸν</w>
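
To make the relation concrete, the three values can be read from such an element with Python's standard XML parser. This is only an illustrative sketch: the attributes elided with "..." in the example above are left out, and the variable names are chosen for this illustration.

```python
import xml.etree.ElementTree as ET

# The word element from the example, without the elided ("...") attributes.
snippet = (
    '<w ref="MAT 1:2!3" xml:id="n40001002003" '
    'lemma="ὁ" normalized="τόν" unicode="τὸν">τὸν</w>'
)

word = ET.fromstring(snippet)
surface = word.text                   # surface form as printed in the text
normalized = word.get("normalized")   # normalized form of the same word
lemma = word.get("lemma")             # dictionary form

print(surface, normalized, lemma)
print("surface equals normalized:", surface == normalized)
```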

See also the following related features:

Character encoding

All Greek text in this Text-Fabric dataset is encoded in Unicode. However, there are specific aspects that may require attention when querying, particularly those involving polytonic accents and “pseudo-characters” like the iota subscript. For a detailed discussion on character encoding, please refer to the documentation here.
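
The kind of pitfall meant here can be shown with Python's standard `unicodedata` module. The sketch below is a general Unicode illustration, not a statement about which code points this dataset actually uses. The helper `strip_diacritics` is hypothetical and written only for this example.

```python
import unicodedata


def strip_diacritics(s):
    """Remove combining marks (accents, breathings, iota subscript) from s."""
    decomposed = unicodedata.normalize("NFD", s)
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", kept)


grave = "\u03c4\u1f78\u03bd"  # τὸν: omicron with varia (grave accent)
acute = "\u03c4\u03cc\u03bd"  # τόν: omicron with tonos (acute accent)
oxia = "\u03c4\u1f79\u03bd"   # τόν: omicron with oxia, visually identical to the tonos form

# Two strings that look the same can still compare as unequal.
print(acute == oxia)
# NFC normalization maps the oxia form onto the tonos form.
print(unicodedata.normalize("NFC", oxia) == acute)
# Without diacritics, the grave and acute forms coincide.
print(strip_diacritics(grave) == strip_diacritics(acute))
```

Normalizing both the query string and the feature value to the same form before comparing them is one way to avoid mismatches of this kind. Note that this helper also removes the iota subscript, since it is encoded as a combining mark after decomposition.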

Source description

The normalized feature is taken from the XML attribute normalized of the w (word) tag.

