Text-Fabric dataset of the Greek New Testament, based on the Nestle 1904 (7th printing) edition.
Feature group | Feature type | Data type | Available for node types | Used by viewtypes |
---|---|---|---|---|
Sectional | Node | String | word, subphrase, phrase | syntax-view, wg-view |
The ref feature provides a unique identifier for each individual word in the corpus. This feature is also populated for phrase or subphrase nodes, but only if they consist of just one word node.
A compound string indicating book, chapter, verse and sequence number of the word inside the verse formatted as follows:
MAT 1:2!11
This format consists of:
- The book abbreviation (in this example MAT for Matthew), matching the value of feature bookshort for each book node.
- The chapter number (in this example 1).
- The verse number (in this example 2).
- The number after the ! symbol, which is the word sequence number within a verse (in this example 11), matching the value of feature num for each word node.

To extract all components of this feature using Python, the following code snippet can be used:
import re

ref = "MAT 1:2!11"  # example content
# Regular expression pattern to match the book, chapter, verse, and position of the word in the verse
pattern = r"(\w+)\s(\d+):(\d+)!(\d+)"
# Using re.match to extract the parts based on the pattern
match = re.match(pattern, ref)
# All four components are returned as strings
book, chapter, verse, positionInVerse = match.groups()
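The snippet above returns every component as a string and fails with an AttributeError if a value does not match the pattern. A slightly more defensive variant is sketched below. The names WordRef, parse_ref and format_ref are hypothetical helpers for illustration only; they are not part of Text-Fabric or of this dataset. The sketch also assumes that chapter, verse and word numbers are written without leading zeros, as in the example value.

```python
import re
from typing import NamedTuple


# Hypothetical container for the four parts of a ref value (not a dataset API).
class WordRef(NamedTuple):
    book: str     # book abbreviation, e.g. "MAT"
    chapter: int  # chapter number
    verse: int    # verse number
    num: int      # word sequence number within the verse


# Same structure as the pattern above, with named groups for readability.
REF_PATTERN = re.compile(
    r"(?P<book>\w+)\s(?P<chapter>\d+):(?P<verse>\d+)!(?P<num>\d+)"
)


def parse_ref(ref: str) -> WordRef:
    """Split a ref value such as "MAT 1:2!11" into typed components."""
    match = REF_PATTERN.fullmatch(ref)
    if match is None:
        # Fail loudly instead of raising an AttributeError on None.
        raise ValueError(f"unexpected ref format: {ref!r}")
    return WordRef(
        book=match["book"],
        chapter=int(match["chapter"]),
        verse=int(match["verse"]),
        num=int(match["num"]),
    )


def format_ref(parts: WordRef) -> str:
    """Rebuild the ref string from its components (assumes no leading zeros)."""
    return f"{parts.book} {parts.chapter}:{parts.verse}!{parts.num}"


parsed = parse_ref("MAT 1:2!11")
print(parsed.book, parsed.chapter, parsed.verse, parsed.num)
```

Converting the numeric parts to integers makes the result usable as a sort key within a book, which plain string comparison would not give (for example "10" sorts before "2" as text).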
The first three characters of this feature value are identical to the value of the feature bookshort.
The feature id contains the same information as the feature ref, albeit in a different format.
The identifier is based on the XML attribute ref of the w (word) tag.