Text-Fabric dataset of the Greek New Testament, based on the Nestle 1904 (7th printing) edition.
Feature group | Feature type | Data type | Available for node types | Used by viewtypes |
---|---|---|---|---|
Orthographic | Node | String | word, subphrase, phrase | syntax-view, wg-view |
This feature provides the normalized Greek form of the surface text of the Nestle 1904 Greek New Testament, encoded in Unicode. Because it removes surface variation such as the grave accent (καὶ is normalized to καί), this feature is essential for consistent and accurate lexical analysis.
This feature is also populated for phrase or subphrase nodes, but only if they consist of just one word node.
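The rule for phrase and subphrase nodes can be sketched in plain Python. The helper phrase_normalized and the sample phrases are hypothetical, and the assumption that a single-word phrase simply carries the normalized value of its one word is ours; the documentation only states when the feature is populated.

```python
from typing import Dict, List, Optional


def phrase_normalized(word_values: List[str]) -> Optional[str]:
    """Hypothetical helper: the value a phrase or subphrase would carry.

    Assumption: a phrase made of exactly one word carries that word's
    normalized value; any other phrase has no value (None).
    """
    if len(word_values) == 1:
        return word_values[0]
    return None  # multi-word (or empty) phrases are left unpopulated


# Hypothetical phrases, each given as the normalized values of its words
phrases: Dict[str, List[str]] = {
    "phrase_a": ["καί"],
    "phrase_b": ["τόν", "Ἰακώβ"],
}

for name, words in phrases.items():
    print(name, phrase_normalized(words))
```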
The feature values are the normalized form of the Greek surface text (see also the ‘notes’ section).
The following snippet identifies all word nodes where the values of the features text and normalized differ:
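```python
from tf.app import use

A = use("CenterBLC/N1904", hoist=globals())  # dataset location assumed

# Word nodes whose text and normalized values differ
query = '''
w:word
w .text#normalized. w
'''
results = A.search(query)
```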
This query returns 37182 results. The following table shows the ten most frequent differences between the values of text and normalized:
feature value text | feature value normalized | frequency |
---|---|---|
καὶ | καί | 8545 |
δὲ | δέ | 2620 |
τὸ | τό | 1658 |
τὸν | τόν | 1556 |
τὴν | τήν | 1518 |
γὰρ | γάρ | 921 |
μὴ | μή | 902 |
τὰ | τά | 817 |
τοὺς | τούς | 722 |
πρὸς | πρός | 670 |
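All ten differences in the table are a grave accent (varia) in text that appears as an acute accent in normalized. The sketch below reproduces only that one rule with Python's standard unicodedata module and tallies differing (text, normalized) pairs with collections.Counter, roughly the way a frequency table like this one could be built. It is a hypothetical approximation, not the procedure used to build the dataset: the helpers grave_to_acute and tally_differences are illustrative, the sample pairs are made up, and the other differences counted by the query above are not covered.

```python
import unicodedata
from collections import Counter

GRAVE = "\u0300"  # COMBINING GRAVE ACCENT
ACUTE = "\u0301"  # COMBINING ACUTE ACCENT


def grave_to_acute(word: str) -> str:
    """Hypothetical rule: turn every grave accent into an acute accent."""
    # Decompose so each accent becomes a separate combining character
    decomposed = unicodedata.normalize("NFD", word)
    # Swap grave for acute, then recompose into precomposed characters
    return unicodedata.normalize("NFC", decomposed.replace(GRAVE, ACUTE))


def tally_differences(pairs):
    """Count (text, normalized) pairs whose two values differ."""
    return Counter((t, n) for t, n in pairs if t != n)


# Illustrative sample only, not taken from the dataset
sample = [
    ("καὶ", grave_to_acute("καὶ")),
    ("καὶ", grave_to_acute("καὶ")),
    ("δὲ", grave_to_acute("δὲ")),
    ("λόγος", "λόγος"),  # identical values are not counted
]

for (text, normalized), freq in tally_differences(sample).most_common():
    print(text, normalized, freq)
```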
The relation between the values of the features text, normalized and lemma can be illustrated with the third word of Matthew 1:2, as encoded in the source XML:
```xml
<w ref="MAT 1:2!3" ... xml:id="n40001002003" lemma="ὁ" normalized="τόν" ... unicode="τὸν">τὸν</w>
```
See also the following related features:
All Greek text in this Text-Fabric dataset is encoded in Unicode. However, some aspects may require attention when querying, particularly polytonic accents and “pseudo-characters” such as the iota subscript. For a detailed discussion of character encoding, please refer to the documentation here.
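To illustrate the kind of pitfall mentioned above, the following sketch uses Python's standard unicodedata module to show that visually identical Greek characters can be stored as different code point sequences. Whether this dataset stores its strings in NFC, NFD or another form is not stated here, so normalizing both sides before comparing in your own Python code is a defensive assumption, not a documented requirement; same_text is a hypothetical helper.

```python
import unicodedata


def same_text(a: str, b: str) -> bool:
    """Compare two Greek strings after canonical composition (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Iota subscript: one precomposed code point vs. base letter + combining mark
alpha_iota_precomposed = "\u1fb3"       # GREEK SMALL LETTER ALPHA WITH YPOGEGRAMMENI
alpha_iota_decomposed = "\u03b1\u0345"  # alpha + COMBINING GREEK YPOGEGRAMMENI

# Acute accent: the polytonic "oxia" code point vs. the monotonic "tonos" one
alpha_oxia = "\u1f71"   # GREEK SMALL LETTER ALPHA WITH OXIA
alpha_tonos = "\u03ac"  # GREEK SMALL LETTER ALPHA WITH TONOS

print(alpha_iota_precomposed == alpha_iota_decomposed)          # False
print(same_text(alpha_iota_precomposed, alpha_iota_decomposed))  # True
print(alpha_oxia == alpha_tonos)                                 # False
print(same_text(alpha_oxia, alpha_tonos))                        # True
```

A plain string comparison (or a search pattern typed in a different normalization form) can therefore miss matches that look identical on screen.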
The normalized feature is taken from the XML attribute normalized of the w (word) tag.
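As a rough sketch of where this value comes from, the snippet below reads the normalized attribute (together with unicode, lemma and xml:id) from w elements using Python's standard xml.etree.ElementTree. The sample fragment is hypothetical: it is modelled on the Matthew 1:2 example, but the wrapper element and the trimmed attribute list are assumptions, and this is not the conversion code used to build the dataset.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment modelled on the Matthew 1:2 example; the wrapper
# element and the reduced set of attributes are assumptions.
SAMPLE = (
    "<fragment>"
    '<w ref="MAT 1:2!3" xml:id="n40001002003" lemma="ὁ" '
    'normalized="τόν" unicode="τὸν">τὸν</w>'
    "</fragment>"
)

# ElementTree expands the reserved xml: prefix to its namespace URI
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def read_words(xml_text: str) -> list:
    """Return one dict per <w> element with the attributes of interest."""
    root = ET.fromstring(xml_text)
    return [
        {
            "xml:id": w.get(XML_ID),
            "unicode": w.get("unicode"),
            "normalized": w.get("normalized"),
            "lemma": w.get("lemma"),
        }
        for w in root.iter("w")
    ]


for word in read_words(SAMPLE):
    print(word["xml:id"], word["unicode"], word["normalized"], word["lemma"])
```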