N1904-TF

Text-Fabric dataset of the Greek New Testament, based on the Nestle 1904 (7th printing) edition.

About this dataset
Transcription
Featureset
Optional features
Viewtypes
Textformats
Syntaxtrees
Tutorial
Latest release

Nestle 1904 GNT - Textformats

Text-Fabric’s data design allows for flexible representation of the corpus text but requires at least one text format to be specified as its default (in this dataset: text-orig-full). During the creation of the dataset, additional formats relevant to this corpus were defined, which are basically based on a subset of the following surface text-related features:

The relation between these features in relation to the surface text is shown in the following image.

The text formats in this Text-Fabric database are identified by unique names that reflect their actual formats. These names follow a structured naming schema, consisting of a string of keywords separated by hyphens (-).

 `what`-`how`-`fullness`

In our database the following keywords are used:

keyword value meaning
what text words as they belong to the text
what lex lexemes of the words
how orig the original Greek script (all Unicode)
how unaccent the original Greek script without accents
how translit transliteration into latin alphabeth
fullness full complete text with text-critical markers
fullness plain complete text without text-critical markers

Not all possible combinations are defined or relevant. The following text-formatting options are defined:

Format Usage Template
lex-orig-plain Lexemes of the Greek surface text {lemma}{trailer}
lex-translit-plain Transliteration of the lexemes of the Greek surface text {lemmatranslit}{trailer}
text-orig-full (default) The Greek surface text in unicode including text-critical markers {before}{text}{after}
text-orig-plain The Greek surface text in unicode {text}{trailer}
text-translit-plain Transliteration of the Greek surface text {translit}{trailer}
text-unaccent-plain The Greek surface text in unicode without accents {unaccent}{trailer}

Each text format is implemented as a template that maps the format to individual features. This mapping can be easily checked using the following command: A.showFormats().

Example

This example illustrates how the different formats in this dataset affect the presentation of Mark 1:1.

# note: node 383782 is of type 'verse' and associated to Mark 1:1 
for formats in T.formats:
    print(f'fmt={formats}\t: {T.text(383782,formats)}')
fmt=lex-orig-plain       : ἀρχή ὁ εὐαγγέλιον Ἰησοῦς Χριστός υἱός θεός.
fmt=lex-translit-plain   : arkhe o euaggelion Iesous Khristos uios theos.
fmt=text-orig-full       : Ἀρχὴ τοῦ εὐαγγελίου Ἰησοῦ Χριστοῦ (Υἱοῦ Θεοῦ).
fmt=text-orig-plain      : Ἀρχὴ τοῦ εὐαγγελίου Ἰησοῦ Χριστοῦ Υἱοῦ Θεοῦ.
fmt=text-translit-plain  : Arkhe tou euaggeliou Iesou Khristou Uiou Theou.
fmt=text-unaccent-plain  : Αρχη του ευαγγελιου Ιησου Χριστου Υιου Θεου.

Character encoding

All Greek text in this Text-Fabric dataset is encoded in Unicode. However, there are specific aspects that may require attention when querying, particularly those involving polytonic accents and “pseudo-characters” like the iota subscript. For a detailed discussion on character encoding, please refer to the documentation here.