Text-Fabric dataset of the Greek New Testament, based on the Nestle 1904 (7th printing) edition.
| Feature group | Feature type | Data type | Available for node types | Used by viewtypes |
|---|---|---|---|---|
| Sectional | Node | String | word, subphrase, phrase | syntax-view, wg-view |
The ref feature provides a unique identifier for each individual word in the corpus.
This feature is also populated for phrase and subphrase nodes, but only if they consist of just one word node.
The value is a compound string indicating the book, chapter, verse and sequence number of the word inside the verse, formatted as follows:
MAT 1:2!11
This format consists of:
- A three-letter book abbreviation (e.g. MAT for Matthew), matching the value of feature bookshort for each book node.
- The chapter number (in this example 1).
- The verse number (in this example 2).
- The number after the ! symbol, which is the word sequence number within a verse (in this example 11), matching the value of feature num for each word node.

To extract all components of this feature using Python, the following code snippet can be used:
import re

ref = "MAT 1:2!11"  # example content
# Regular expression pattern to match the book, chapter, verse, and position of the word in the verse
pattern = r"(\w+)\s(\d+):(\d+)!(\d+)"
# Using re.match to extract the parts based on the pattern
match = re.match(pattern, ref)
# All four groups are returned as strings, e.g. chapter is "1", not 1
book, chapter, verse, positionInVerse = match.groups()
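
The snippet above leaves every component as a string and assumes the value always matches. As a minimal sketch of a slightly more defensive variant, the following converts the numeric parts to integers, rejects malformed values, and rebuilds the original string. The names `WordRef`, `parse_ref` and `format_ref` are hypothetical helpers introduced here for illustration; they are not part of the dataset or of Text-Fabric.

```python
import re
from typing import NamedTuple


class WordRef(NamedTuple):
    """Components of a ref value (hypothetical helper type)."""
    book: str
    chapter: int
    verse: int
    position: int


# Same pattern as in the snippet above, compiled once for reuse
REF_PATTERN = re.compile(r"(\w+)\s(\d+):(\d+)!(\d+)")


def parse_ref(ref: str) -> WordRef:
    """Split a ref value into book, chapter, verse and word position."""
    # fullmatch rejects values with leading or trailing extra characters
    match = REF_PATTERN.fullmatch(ref)
    if match is None:
        raise ValueError(f"not a valid ref value: {ref!r}")
    book, chapter, verse, position = match.groups()
    return WordRef(book, int(chapter), int(verse), int(position))


def format_ref(parts: WordRef) -> str:
    """Rebuild the ref string from its components."""
    return f"{parts.book} {parts.chapter}:{parts.verse}!{parts.position}"


parsed = parse_ref("MAT 1:2!11")
print(parsed)
print(format_ref(parsed))
```

Working with integers makes it straightforward to sort or compare words by chapter, verse and position within a book.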
The first three characters of this feature value are identical to the value of feature bookshort.
The feature id contains the same information as the feature ref, albeit in a different format.
The identifier is based on the XML attribute ref of the w (word) tag.
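
To illustrate where the value originates, the following sketch reads the ref attribute from w elements using Python's standard library. Only the w tag and its ref attribute are taken from the description above; the wrapper element, the second ref value and the word text are placeholders assumed for this example and do not reflect the actual source files.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML fragment: the <sentence> wrapper and the word text are
# placeholders; only the <w> tag and its ref attribute come from the text above.
xml_fragment = """
<sentence>
  <w ref="MAT 1:2!10">placeholder-a</w>
  <w ref="MAT 1:2!11">placeholder-b</w>
</sentence>
"""

root = ET.fromstring(xml_fragment)
# Collect the ref attribute of every <w> element, in document order
refs = [w.get("ref") for w in root.iter("w")]
print(refs)
```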