Vesuvius Scroll Challenge

Written by:

https://scrollprize.org


Text

https://drive.google.com/drive/folders/1VDogTkf6hmtu_0EfC7UHBXl2T73eHTa9?usp=sharing

Compositing Images

https://drive.google.com/drive/folders/16WvPthwTIkz_lxsgyri9h51a2U0JBMkV?usp=drive_link

Reference Images

https://drive.google.com/drive/folders/12L1qD-I-bcZmjm8BsNndewY1qUhRzoxx?usp=sharing

Crackle Masks

https://drive.google.com/drive/folders/15NvKcMnvLKpOof26ztbSY-Hmtwag14y2?usp=sharing


I present 4 passages from the following main segments:

  1. 20231005123336 (first letters area)
  2. 20230702185753
  3. 20231012184423
  4. 20231106155351 + 20231022170901

Each one also has overlapping segments that are layered on top with various blend modes. The composite image is the one I used for character/word detection. Each composite is made from direct outputs generated programmatically from a modified version of the open sourced firs ink/letter code. I used the segmentation team’s released tif stacks which used the CT data inputs. No manual annotations of character or text were made. All manipulations were made with various blend modes between direct outputs. Mostly multiply, darken, screen, and soft output were used. More details in discussion. 

I’ve also included letters/words detected from additional segments. These appear to have less ink density and to my eye, less legible continuous plausible text. I’ve ordered them from highest confidence to lowest- thus the first 4 are the ones I believe have the highest accuracy but I’ll leave judgment to the papyrologists/review board. 

I’m still working on stuff so may resubmit if I achieve better results. At the very least, I expect to consolidate and send over additional supporting materials over the next few days. 

I’ve put together a list of links to relevant materials below. 

Let me know if you have any issues accessing or opening any of them. 


LINKS

Submission PDF: SUBMISSION_v2.pdf

-Submission DOC: Submission_v2.doc

https://docs.google.com/document/d/13pQKFqdL3WWA_4W8lp2wb3CCJxDTqTyGvXJsF7Nvg0g/edit?usp=sharing

-Github repo: inkception-3d

https://github.com/lucyellu/inkception-3d

-inference_notebook.ipynb

https://drive.google.com/file/d/1kxj5RU0BrZ1svhtvf2NqlwgLYy_TLpDQ/view?usp=drive_link

-inference_stride32.py

https://drive.google.com/file/d/1YHe3BUdu2DgyvZ9m0sncguNKNrZSjg2d/view?usp=drive_link

-letter_grid.ipynb

https://drive.google.com/file/d/1j7a8eN37JYm_RUcBrwgPZUqNfGX55ZY0/view?usp=drive_link

-Google drive downloads: 

-Data

-Scanned images

https://drive.google.com/drive/folders/1fBeRPUH9SE1OzCITMlNZkN3YeNRqNYOt?usp=drive_link

https://drive.google.com/drive/folders/14ejfQ5szzY2sjdXlk74RD4E3eZOp7MWS?usp=drive_link

-NOink

https://drive.google.com/drive/folders/1ADkFHVwC2wANggqMGLqQ81jqCEGtqsCm?usp=drive_link

-Paper

https://drive.google.com/drive/folders/19rH44JwR-PL_auYJbIukXjUellVrCRwB?usp=drive_link

-Greek OCR

https://drive.google.com/drive/folders/1WhBFwV_KCwNVtWDM_RLFhLvuVmLzewQd?usp=drive_link

-Finetuning masks: Training_segments.zip

-Youseff’s checkpoint

https://drive.google.com/file/d/1SAcCUi7tdAnI8tfuygh2s2u3hvfdZan6/view?usp=drive_link

-3D Files

-Blender

https://drive.google.com/drive/folders/14ct5MFSMTutGlv3o7xIlWoPro3LDLgLA?usp=sharing

-Maya

https://drive.google.com/drive/folders/1riLSxISfxWfToJFwAuNLhkIMGDdnnoLQ?usp=sharing

-FBX


CONTENTS

  1. Passages
    1. Detected text 
    2. Possible characters/letters
    3. Possible words/passages
  1. Segments
    1. Segmentation output: tif layers, average surface tif, obj mesh
    2. Public model’s output
    3. Composited Output
  1. Characters
  1. Compositing
    1. Overlapping separate segments
    2. Overlapping pages within segments
  1. Data 
    1. Depth separations (pages/wraps)
    2. Landmarks
    3. Crackle
  1. Machine Learning
    1. Architecture
    2. Inference
    3. Training/Finetuning
  1. Replication
    1. Github repo
    2. Segment outputs (.zip)
    3. 3D scenes/mesh (fbx)
    4. Supplementary photos/videos
  1. 3D Unwrapping, Spatial Segmentation
    1. Non-linear deformers
    2. Manual vertex manipulation
    3. Spiral generation
    4. Damage simulation
    5. 3D slicer revealing of segments
  1. 3D Ink Detection
    1. Manual replication of kaggle model 
    2. 65 tif stack image arrays
    3. Subsurface scattering 
    4. Volume shaders
    5. Mesh to volume
  1. References/Downloads
    1. CT Scans: 14k Slices (masked volume, axial crackle: khartes_chuck)
    2. Segmentation/Segments: 65 Slices (maya, blender, after effects
    3. SAD Scans: Blank Papyrus, Ink/NoInk, Replica fragments/scroll
    4. Airfryer Carbonized Scans

I present passages from the following segments *

-Segment 1: 20231005123336

-Segment 2: 20230702185753

-Segment 3: 20231012184423 (2 passages)

-Segment 4: 20231106155351 + 20231022170901 (20231022170900_superseded)

-Segment 5: 20231106155351

-Segment 6: 20231210121321 + 

-Segment 7: 20231031143852 + 20231022170901 (20231022170900_superseded)

oduction


Vesuvius was more than a machine learning problem to me- presenting challenges across diverse domains ranging from topology to Ancient Greek translations. 

My approach felt more akin to fossil excavation, where I was chipping or brushing away debris to reveal the underlying object- more so than putting things together.

Like digging up dinosaur bones. Gather as much signal as possible, then go in manually to remove noise. Adjusting the model to pick up more signal felt like excavating a whole square meter of dirt. Compositing then felt like blowing or brushing away the clutter. 

This scroll is basically a cylindrical stone tablet. It’s been fossilized from paper to rock. 


3D

Someone mentioned that at the heart of it this is a geometry problem. I instantly agreed, as intuitively I just wanted to get a good grasp of what we were dealing with on a physical/spatial level. 

How big is this thing? How does it look from every angle? I wanted to hold it up to the light and examine it from different views. So one of the first things I set about was to create a digital clone that I could play with in 3D space. 

This involved bringing in the obj meshes of each segment, downsampling it to 1% poly count, and then referencing them into a new scene. I only managed to get up to 10 or so segments to work within one scene. But it was still immensely helpful to see how the pieces link up to one another. 

There are definitely places where the segments overlap, with the geo straight up crashing. 


The vast majority of my work was with Scroll 1. I did do some initial tests with Scroll 1667. I’ve included hand drawn masks of crackle from scroll 4:

340: Layer 57 (mentioned by Luke)

640: Layer 60 (alpha mentioned by Rapushko)

My theory is that the crackles are microtears from the ink as you can see the direction of brush/pen strokes. I bought some reed pens and carbon ink to try writing on papyrus myself, and found myself repeating certain strokes to get certain characters. 

This became especially apparent with longer passages when I found myself seeking shortcuts. The alpha symbol for example tends to follow the star shape. 

You can distinguish certain shapes by follow the trails of the crackle. Similar to the trails left behind by old school plane sky writers. 

There’s a certain orientation to the pattern, and it’s stenciled rotation is a tell-tale sign of ink crackle as opposed to the larger crackle pattern. 

The larger pattern may be larger sheets fracturing. I think the ink may have fused with the paper and then cracked under distress. In any case, the scale of the textures are extremely relevant. 


I tried segmenting with the various tools but didn’t manage to get anything useful. Volume cartographer took forever to spin up and then the software itself was so laggy (perhaps due to xview)

Khartes was better, but I didn’t manage to get vc render to work quite right. 

I decided I may be better off separating out the overlapping segments as I knew segment 852 overlapped with 753. Seg 753 was curiously one of the only ones that had the normals facing outwards. Most were facing inwards, which makes sense as the ink should be on the inside of the scroll. 


I found that using the reverse output as a subtractive mask was helpful in eliminating noise

Youseff-output:

Youseff-output-reversed:

Composite mask:

Notice the artifacts that seems to be from paper bending are materially diminished as they cancel each other out.

Passages

Passage 1: Segment 20231005123336

μα αβϛμ

κμλνκαμκχ

δεικνυονται ημιν ο ε

πι τε πορφυρας και

των ομοιων

πξ πως ανκω

μπουφαν

κυνζο

ϛχρξγυμ οχκ

ψπϛξζωνπν

λαων γηϛκω

μξββνινοθγϛω ​

————————-

But a rite or but a custom

are shown to us those who…

[they] show us the

one on purple and

other similar

how do I recover (how am i doing)

Hunter

unknown

Of the people of the land

————————

are indicated by e

pi te purple and

of the same

how am i doing

jacket

hound

ϛhrxgym ohk

ψπϛξzonpn

people’s land

μξβνινοθγϛω ​

————————-


μα αβϛμ (ma absm): Could be “μα αβεσμός” (ma abesmos), meaning “but a rite” or “but a custom”.

κμλνκαμκχ (cmlnkamkh): Perhaps “κοινολνκαμκχ” (koinolnkamkh), though the meaning is unclear.

δεικνυονται ημιν ο ε (are indicated by e): Likely “δείκνυνται ημίν οι ε” (deiknyntai hemin hoi e), meaning “are shown to us those who…”.

πι τε πορφυρας και (pi te purple and): Appears correct, “πι τε πορφύρας και” (pi te porphyras kai), meaning “both purple and”.

των ομοιων (of the same): Appears correct, meaning “of the similar ones”.

πξ πως ανκω (how am i doing): Could be “πως ανακω” (pos anakō), meaning “how do I recover”.

μπουφαν (jacket): “μπουφάν” is modern Greek for “jacket”. It’s anachronistic for ancient Greek.

κυνζο (hound): Could be “κυνηγό” (kynēgo), meaning “hunter”.

ϛχρξγυμ οχκ (ϛhrxgym ohk): 

ψπϛξζωνπν (ψπϛξzonpn): 

λαων γηϛκω (people’s land): Could be “λαών γης κω” (laōn gēs kō), meaning “of the people of the land”.

μξββνινοθγϛω (μξβνινοθγϛω): 

Passage 2: Segment 20230702185753

λ..νπ.τωφ

.λτρϛρτπτλκτ

πλυνίαροαγ

.τ..πωνπρ

ν.ν.κρ.ο.τ

..υνγριπο

να….τϛ

κ..λρταιν

γ….μαρπλ

ϛ..ρλ.ο.υτ.ι

.πτυ.ποω.ακ

….λ..νρ

.π.ο.τίπ

..ρ….λυοψ

..ω.λαμυψρ

Old– γπλ μαλχπο

New- γπλ μαλχου

ντιπλιοαλ

.π..πγ.κνκ

————————

λ . . νπ . τωφ: “τωφ” could be a form of “τω φ”, meaning “to the”.

λτρ ϛρ τπτλϰτ: Hard to interpret. Could be a series of abbreviations or specialized terms.

π λ υϰία ρο αγ: “υϰία” (hykia) could mean “health” or “well-being”. “ρο αγ” is unclear.

τ π ϰ π ρ: Could be parts of words or abbreviations.

ν ν .ϰ ρ. ο . τ: ν ν” could be repetitions or part of a larger word.

υν γριπο: “υν” might be part of a word. “γριπο” (gripo) is not a recognizable Greek word but might be a name or specific term.

ν α . . . τϛ: “ν α” could be the start of a word.

κ λ ρ τ α ι ν: Could be initials or abbreviations. Hard to form a coherent word.

γ . . . μ ραπλ: “ραπλ” is not a standard Greek word but could be a part of a larger term or a specific name.

ϛ ρλ .ο. υ τ ι: “ρλ” is unclear. Could be part of a name or term.

πτυ ποω αϰ: “πτυ” could be a form of “πτύω”, meaning “to spit”. 

λ . . ν ρ:

π ο τ ί π: 

ρ . . λ υ οψ:

ω λ αμυ ψρ: “αμυ” could be part of “αμυνω” (amyno), meaning “defend”. The rest is unclear.

γ π λ μαλχ π ο: “μαλχ” could be a form of “μαλαχ”, a variant of “μαλακός” meaning “soft” or “gentle”.

ντι πλ ιο α λ: 

π γ κνκ:


Passage 3: Segment 20231012184423 (left)

 (1)      Λ     

Letters from the LEFT column:

Λ 

Ρ..ΕΙΑΒΡ.Μ.Ρ

ΗΥΠΕΡ.Η..ΖΝ.

.ΕΠΑΡΑΤ.ΑΡ.Π.    

..Ν.ΤΡΝ.ΗωΊΝΑΤΑ  

ΤΡ.Τ.ΥΓ..Τ. ΥΨ 

ϟΝ ιι.Τ.ΡΝ..ΤΡΠ

ΡΚΤΟΑΡ….Τ.ΑΜ

.Γ…Α.ΟΝΑΤΑ..Γ

ΤΕ…….Υ..Π….ϟ

ΙΤΟΥΙΓΤ ωΙ……..

..Χ…………  

..Π.ιι.Γ….ΙΗΝ    

ΓΥΜΗΕΕΙω..ΙΑ.Ρ……

.ΝΙΑωΙ……..Γ

ΔΙΗΙΜΑΡΙ.ιιΤ..

ΤΟω………….

ΤΟΥΡωΑ..ΑΥΓϟ

————————-

Spellchecked

Λ 

Ρ..ΕΙΑΒΡ.Μ.Ρ

ΗΥΠΕΡ.Η..ΖΝ.

.ΕΠΑΡΑΤ.ΑΡ.Π.    

..Ν.ΤΡΝ.Ή ΊΝΤΑ  

ΤΡ.Τ.ΥΓ..Τ. ΥΨ 

ϟΝ ιι.Τ.ΡΝ..ΤΡΠ

ΕΚΤΟΡ….Τ.ΑΜ

.Γ…Α.ΟΝΑΤΑ..Γ

ΤΕ…….Υ..Π….ϟ

ΡΙΤΟΥΙΤ ωΙ……..

..Χ…………  

..Π.ιι.Γ….ΙΗΝ    

ΓΥΜΗΕΕΙω..ΙΑ.Ρ……

.ΝΙΑωΙ……..Γ

ΔΙΗΙΜΑΡΙ.ιιΤ..

ΤΟω………….

ΤΟΥΡνΑ..ΑΥΓϟ


Passage 4: Segment 20231012184423 (right)

Letters from the RIGHT column:

Τ

ΤΜϚΥΡΡΚϚΜ

ΑΓΡCΤΛΜΑ

ΠΡΩΛCΠΔΝΜΛΝΞΛ

ΝΠΕΡΥΚΩΔΤ

ΡΩΜΖΟΓΥΤΥΖ

ΕΝΡΜΊΓΖΡΙΚΥΤϚΚΤ

ΚΝΑΓΛΑΝΜΡΙΤ

ΓΑΩΦΡΖΠΛ

ΠΤΛΥ

ΓΡΑΓΔΡΗϚΡ

ΧΚΜΛΙΡΡΤΗ

ΝTΤΝΤΠΡΚ

ΔΠCΑΤCΠΤΛ

ΡΖΩ

ΖΜΤΩΜΝΟΝΤΜ

ΡΣΣΜΤC

————————-

Λ [1/1]: 

ΤμϚΥΡΡκϚμ [9/9]: 

ΑΓΡCΤλμΑ [7/7]: 

  “ΑΓΡ” could be part of “ἀγρός” (agros, meaning field), but the presence of “C” is untypical in Greek.

ΠΡωλCΠδΝΜλΝξλ [11/11]:

  “ΠΡ” might suggest “πρός” (pros, meaning towards).ΝΠεΡΥϰωδΤ [9/9]: 

ΡωμζΟΓΥΤΥΖ [10/10]: 

εΝΡμΊΓζΡΙΚΥΤϚϰΤ [12/12]: 

ΚΝΑΓλΑΝμΡΙτ [12/12]: 

ΓΑωΦΡΖΠλ [8/8]:

ΠΤλΥ [4/4]: 

ΓΡΑΓΔΡΗϚΡ [7/7]: 

ΧϰμλΙΡΡΤΗ [7/7]: 

ΝtΤΝΤΠρκ [5/5]: 

ΔΠCΑΤCΠτλ [6/6]: 

Ρζω [3/3]: 

ζμτωμΝΟΝΤμ [8/8]:

ΡςςμΤC [6/6]:

————————-

[1/1]    

 (2)      ΤμϚΥΡΡκϚμ     [9/9]     

 (3)      ΑΓΡϚΤλμΑ     [7/7]    

 (4)      ΠΡωλϚΠδΝΜλΝξλ      [11/11]       

 (5)      ΝΠεΡΥϰωδΤ     [9/9]    

 (6)      ΡωμζΟΓΥΤΥΖ     [10/10]      

 (7)      εΝΡμΊΓζΡΙΚΥΤϚϰΤ      [12/12]       

 (8)      ΚΝΑΓλΑΝμΡΙτ     [12/12]         

 (9)      ΓΑωΦΡΖΠλ      [8/8]     

 (10)     ΠΤλΥ     [4/4]  

 (11)     ΓΡΑΓΔΡΗϚΡ     [7/7]      

 (12)     ΧϰμλΙΡΡΤΗ     [7/7]         

 (13)     ΝtΤΝΤΠρκ      [5/5]          

 (14)     ΔΠϚΑΤϚΠτλ     [6/6]

 (15)     Ρζω      [3/3]       

 (16)     ζμτωμΝΟΝΤμ     [8/8]      

 (17)     ΡςςμΤϚ     [6/6]         

 (18)     Ϛ   [1/1]      

156/180

156/140

λ – Could be an abbreviation or initial for a word like “λόγος” (logos, meaning “word” or “reason”).

τμϛυρρκϛμ – Perhaps a distorted form of “τεμάχισμα” (temachisma, meaning “fragment” or “piece”).

αγρcτλμα – Might be a version of “αγρότημα” (agrotima, meaning “farm” or “estate”), with ‘c’ being a corruption.

πρωλcπδνμλνξλ – Could hint at “προδιαλέξω” (prodialexo, meaning “to preselect” or “choose in advance”), with some letters being extraneous.

νπερυκωδτ – Might be a jumbled version of “υπέρκοσμος” (yperkosmos, meaning “supernatural” or “transcendental”).

ρωμζογυτυζ – Resembles a garbled form of “ζωγράφος” (zografos, meaning “painter” or “artist”), with extra letters.

ενρμίγζρικυτϛκτ – Could imply “μυστικιστής” (mystikistes, meaning “mystic” or “occultist”), with considerable distortion.

κναγλανμριτ – Resembles a corrupted form of “καλλιγραφία” (kalligrafia, meaning “calligraphy”).

γαωφρζπλ – Might suggest “φωγράπης” (fographes, a distorted form of “photographer”).

πτλυ – Could be a shortened or corrupted form of “πτυχίο” (ptychio, meaning “degree” or “diploma”).

γραγδρηϛρ – Resembles a distorted version of “γραφείο διαχείρισης” (grafeio diacheirisis, meaning “management office”).

χκμλιρρτη – Might hint at “χειρουργός” (cheirourgos, meaning “surgeon”), with some letters being extraneous.

νtτντπρκ – Could imply “ντροπή” (ntropi, meaning “shame” or “embarrassment”), with extra letters.

δπcατcπτλ – Resembles a corrupted form of “διπλωματικός” (diplomatikos, meaning “diplomatic”).

ρζω – Might suggest “ρίζα” (riza, meaning “root”).

ζμτωμνοντμ – Could be a jumbled version of “ζωντανός μύθος” (zontanos mythos, meaning “living myth”).

ρσσμτc – Might hint at “μυστήριο” (mysterio, meaning “mystery”), with considerable distortion.


Lowercase, spell checked

λ/τ

τμϛυρρκϛμ

αγροκτημα

προλ c πδ νμ λνξ λ

νπερυκωδτ

ρωμζογυτυζ

ενρμίγζρικυτϛκτ

κναγλανμριτ

γαωρ ζπλ

πτλυ

γραγδρηϛρ

χκμλιρρτη

ντντ πρκ

δπcατcπτλ

ρζω

ζυμωνοντας

ρσσ μτc

————————-

l

τμϛυρρκϛμ

farm

prol c pd nm lnx l

close-up

romzogytyz

enrmižrikytϛkt

knaglamrit

gaor zpl

pl

register

hkmlirti

dnd prk

dpcatcptl

I’m sorry

fermenting

rss mtc


lowercase, No spaces

λ/τ

τμϛυρρκϛμ

αγρcτλμα

πρωλcπδνμλνξλ

νπερυκωδτ

ρωμζογυτυζ

ενρμίγζρικυτϛκτ

κναγλανμριτ

γαωφρζπλ

πτλυ

γραγδρηϛρ

χκμλιρρτη

νtτντπρκ

δπcατcπτλ

ρζω

ζμτωμνοντμ

ρσσμτc

————————-

l

τμϛυρρκϛμ

farm

prolcpdnmlnxl

close-up

romzogytyz

enrmižrikytϛkt

knaglamrit

geofrzpl

pl

register

hkmlirti

tttdprk

dpcatcptl

I’m sorry

zmtomnondm

rssmtc

Alternate reading based on guesses from community (discord)

Λ/Τ

ΤΜϚΥΡΡΚϚΜ

ΑΓΡCΤΛΜΑ

ΠΡΩΛCΠΔΝΜΛΝΞΛ

ΝΠΕΡΥΚΩΔΤ

ΡΩΜΖΟΓΥΤΥΖ

ΕΝΡΜΊΓΖΡΙΚΥΤϚΚΤ

ΚΝΑΓΛΑΝΜΡΙΤ

ΓΑΩΦΡΖΠΛ

ΠΤΛΥ

ΓΡΑΓΔΡΗϚΡ

ΧΚΜΛΙΡΡΤΗ

ΝTΤΝΤΠΡΚ

ΔΠCΑΤCΠΤΛ

ΡΖΩ

ΖΜΤΩΜΝΟΝΤΜ

ΡΣΣΜΤC

C


Uppercase, No spaces

Λ

ΤΜϚΥΡΡΚϚΜ

ΑΓΡCΤΛΜΑ

ΠΡΩΛCΠΔΝΜΛΝΞΛ

ΝΠΕΡΥΚΩΔΤ

ΡΩΜΖΟΓΥΤΥΖ

ΕΝΡΜΊΓΖΡΙΚΥΤϚΚΤ

ΚΝΑΓΛΑΝΜΡΙΤ

ΓΑΩΦΡΖΠΛ

ΠΤΛΥ

ΓΡΑΓΔΡΗϚΡ

ΧΚΜΛΙΡΡΤΗ

ΝTΤΝΤΠΡΚ

ΔΠCΑΤCΠΤΛ

ΡΖΩ

ΖΜΤΩΜΝΟΝΤΜ

ΡΣΣΜΤC

C

Λ [1/1]: 

ΤμϚΥΡΡκϚμ [9/9]: “Ϛ” is an archaic letter (stigma), might be a “σ” (sigma) in some cases.

ΑΓΡCΤλμΑ [7/7]: “C” could be a “Γ” (gamma) or “Σ” (sigma).

ΠΡωλCΠδΝΜλΝξλ [11/11]: “C” could be “Σ”, “δΝ” might be a misreading, possibly “ΔΝ” (Delta-Nu) or “ΑΝ” (Alpha-Nu).

ΝΠεΡΥϰωδΤ [9/9]: “ϰ” is a kappa, “ωδ” might be a diphthong or two separate vowels “ο” and “δ”.

ΡωμζΟΓΥΤΥΖ [10/10]: “ζ” is not commonly found in the middle of Greek words; could be a “Σ”.

εΝΡμΊΓζΡΙΚΥΤϚϰΤ [12/12]: Several unusual combinations here, “ζ” could be “Σ”, “Ϛ” might be “Σ”, “ϰ” could be “Κ”.

ΚΝΑΓλΑΝμΡΙτ [12/12]: “τ” is not a Greek letter, it could be a misreading of “Τ” (Tau).

ΓΑωΦΡΖΠλ [8/8]: “ΦΡΖ” is an unusual sequence; “Ζ” might be a “Σ”.

ΠΤλΥ [4/4]: Appears legible, though “Υ” at the end of words is uncommon.

ΓΡΑΓΔΡΗϚΡ [7/7]: “ΔΡΗϚ” is an unusual sequence, “Ϛ” might be “Σ”.

ΧϰμλΙΡΡΤΗ [7/7]: “ϰ” is a kappa, “μλ” could be a misreading if it’s meant to be a single letter.

ΝtΤΝΤΠρκ [5/5]: “t” is not Greek; it could be a misreading of “Τ”.

ΔΠCΑΤCΠτλ [6/6]: “C” could be “Σ”, “τ” might be “Τ”.

Ρζω [3/3]: “ζ” is zeta; its placement is unusual in this sequence.

ζμτωμΝΟΝΤμ [8/8]: “ζ” could be “Σ”, “τω” could be a diphthong.

ΡςςμΤC [6/6]: “ς” is a final sigma; it should not appear in the middle of a word, “C” could be “Σ”.

C [1/1]: “C” is not Greek; could be a “Σ”.

————————-

Passage 5: Segment 20231106155351_inference_youssef-test

(1)      . . . . Α.  .  .β . .  .Γ ρ        [4/11]    

(2)      . . . .λ .   βΔ. . . . .ΓΟ        [5/11]    

(3)      . ε . . . λ.  .  . . . .ς             [3/9]    

(4)      ϰ  . . . . .  ορ  .  . .  .          [3/9]    

(5)      Μ  . . . . .  .  .  . . .             [1/9]    

(6)      . . .ς . .  .  .  . . . .              [1/10]    

(7)      . . . . .  .  .  .  .                   [0/9]    

(8)      . ε. ψ. .  .ψω  .  .ω            [5/10]    

(9)      λ   πωΙ ϰ ρ ϰ Γ . . .           [8/11]    

(10)     . . . . . ω   .  . .                  [1/8]    

(11)     . . . . .  .  ω  . .                   [1/8]    

(12)     . ϰ.ω Αζ χ . ω .  . .           [6/10]    

(13)     Ψω  β Ν π Α π .              [7/8]    

(14)     Φωυφζ    Δ Τω. .            [8/9]    

(15)     . .ϰ  ςΑΜΨ . .ψ               [6/10]    

(16)     .π ρ.ω  ω  .βΜ  χ             [7/10]    

(17)     π Φ β   . .   ρ  .  .             [4/9]    

(18)     π . . . . π .  .  .                  [2/9]   

180/140

180/216                                                                              

————————-

Scroll 4 – 1667

Crackle 

Characters

Lowercase letters from the Sostratos’s dataset

ι – ι (Iota)

ω – ω (Omega)

α – α (Alpha)

υ – υ (Upsilon)

lunate sigma (ϲ) (Sigma) 

koppa (Ϙ) [not commonly used in modern Greek phonetics]

Α – α (Alpha)

Β – β (Beta)

Γ – γ (Gamma)

Δ – δ (Delta)

Ε – ε (Epsilon)

Ζ – ζ (Zeta)

Η – η (Eta)

Θ – θ (Theta)

Ι – ι (Iota)

Κ – κ (Kappa)

Λ – λ (Lambda)

Μ – μ (Mu)

Ν – ν (Nu)

Ξ – ξ (Xi)

Ο – ο (Omicron)

Π – π (Pi)

Ρ – ρ (Rho)

Σ – σ/ς (Sigma)

Τ – τ (Tau)

Υ – υ (Upsilon)

Φ – φ (Phi)

Χ – χ (Chi)

Ψ – ψ (Psi)

Ω – ω (Omega)

Frequency of characters of the txt file of Greek text:

93201 ; Α 

88976 ; Ι 

87344 ; Ε 

82162 ; Ο 

73875 ; Ν 

67967 ; Τ 

57473 ; Σ 

40552 ; Υ 

32029 ; Η 

28600 ; Ρ 

27995 ; Κ 

27121 ; Ω 

26666 ; Π 

25352 ; Μ 

23997 ; Λ 

21515 ; Δ 

14917 ; Γ 

11154 ; Θ 

7107 ; Χ 

6460 ; Φ 

3410 ; Β 

2656 ; Ξ 

1494 ; Ζ 

994 ; Ψ 

543 ;  

31 ; ι 

16 ; „ 

6 ; ω 

3 ; α 

3 ; υ 

3 ; / 

2 ; ⋯ 

2 ; \ 

1 ;  

1 ; † 

If the symbol resembles a backwards gamma (Γ), it could possibly be:

A Tau (Τ): Depending on the handwriting, the tau can sometimes look quite different from its standard form, especially in cursive or quicker scripts where the horizontal bar does not extend fully across.

An Iota with an accent: Iotas can be adorned with various accents in Greek; however, they are usually quite distinct in appearance from gammas and typically would not be mistaken for a tau.

A Stigma (Ϛ): This letter, which represents the combination of sigma (Σ) and tau (Τ), can sometimes appear similar to a backwards gamma, depending on the script.

A Koppa (Ϙ): An archaic Greek letter that was used in some regions and can sometimes appear similar to a backwards gamma.

Changing from miniscule to majuscule (lowercase to uppercase) seems to change the definition given by google translate slightly. It seems the lowercase is best for communicating the letter, but some characters are better represented by the uppercase. This is mostly noticeable for UPSILON (Y in uppercase, U in lowercase) and NU (N in uppercase, v in lowercase)

δεικνυονται ημιν ο ε 

πι τε πορφυρας και

των ομοιων

are indicated by e

pi te purple and

of the same

ΔΕΙΚΝΥΟΝΤΑΙ ΗΜΙΝ Ο Ε

ΠΙ ΤΕ ΠΟΡΦΥΡΑΣ ΚΑΙ

ΤΩΝ ΟΜΟΙΩΝ

SHOWING E

PIT TE PORPHYRAS AND

OF S

Characters from paper (link)

Azure custom vision model labeling     Drawn characters 

Compositing

Segment 1: 20231005123336




Segment 2: 20230702185753


Segment 3: 20231012184423

Segment 4: 20231106155351 + 20231022170901 (20231022170900_superseded)

Replication Instructions:

# Vesuvius Scroll Challenge – Ink Detection

https://github.com/lucyellu/inkception-3d

This repository contains training and inference code based on an I3D architecture to detect ink from image stacks.

The easiest way is to clone the repo and open the jupyter notebooks in a colab environment. Then simply run the cells of interest. 

“`

git clone https://github.com/lucyellu/inkception-3d.git

“`

Otherwise- to install and run locally:

### Set up a python environment. Tested with python 3.10

“`

conda create –name python310 python=3.10
conda activate python310git clone https://github.com/lucyellu/inkception-3d.git
cd inkception-3d
pip install -r requirements.txt

“`

note the following packages need to be installed

“`

pytorch-lightning   

typed-argument-parser   

segmentation_models_pytorch   

albumentations   

warmup_scheduler   

“`

#if you run into cudaa or other compatibility issues, try installing with no version requirements

“`

pip install -r requirements_noversions.txt

“`

## Download Data

Download the segments that you want to inference or train with. ([paths](http://dl.ash2txt.org/full-scrolls/Scroll1.volpkg/paths/)).   

Submission segments include: 

    20230702185753 

    20231012184423 

    20231005123336 

    20231106155351

    20231022170901

Place the downloaded segment folders in the inkception-3d folder:

    Place inference segments in /eval_segments

    Place training segments in /training_segments

Make sure each {segmentid}_mask.png and {segmentid}_inklabel.png (if training) file is in its appropriate segment folder.

### Inference

Place the segment data you want to inference into ./eval_segments or adjust accordingly.

The inference_stride32 script runs a checkpoint model based on inception-3d. It saves progress images in a folder designated in outputs path at 2% intervals. Adjust as needed. 

Arguments found within the InferenceArgumentParser class.

Download public model checkpoint [here](https://drive.google.com/file/d/1fAGZbVPHW6q1hNiI2E2NKzf6TyELzOC4/view?usp=sharing

“`

python inference_stride32.py –segment_id 20230702185753 –model_path ‘/content/drive/MyDrive/inkception-3d/models/valid_20230827161847_0_fr_i3depoch=7.ckpt’

“`

###

Some example outputs found at [Segment Browser](https://vesuvius.virtual-void.net/

### Training

Adjust the CFG class in 64x64_256stride_i3d.py

These are the parameters you can adjust per inference or training run. 

I found the default backbone setting to be different for the inference (efficientnet0) and training files (set to resnet) so make sure you select the one you want. 

“`

python 64x64_256stride_i3d.py


Letter Grid

This tool is meant to assist in letter or character detection.

User can input characters into each cell which then updates dynamically for each instance of that symbol.So if you decide that an epsilon should actually be a theta, you can update all instances of that symbol. 

Click on ‘Copy’ to print the row or entire grid to output which you can then paste into the Greek word detector.

Currently the grid is pre-populated with the characters from the first ink/letters section of Scroll 1.

δεικνΥοντΑι ημινοε 

πι τε πορφΥρ Ας κΑι π 

των ομ οιων κΑ

“`

#Scroll diagram

Ortho views of each segment

##in progress

#Coordinates

UV and XYZ of segments


Set 753

Letters start at 

UV

2339, 11917

XYZ

3285, 5203, 13433

Ends at

14824, 1868

4555, 2217, 354

Middle 

8297, 6700 

4203, 3701, 7251


Seg 423

Middle

11357, 7652

2956, 3521, 6854

Passage 1 starts at

14841, 831

3558, 4957, 12941

Passage 1 ends at

21891, 14441, 

3131, 2511, 924


Seg 336

Middle

15377, 5420

2774, 3908, 8463

Passage starts at

15130, 1530

235, 4944, 11759

Ends at

23086, 9882

2383, 2350, 5118


Seg 901

Middle

17440, 3830

2860, 4252, 10184

Passage starts at

16595, 1046

3272, 5084, 12665

Pages

20 to 26 middle top clear lambda (upside down v)

30 to 46 ink

, 30, 44, 


Seg 351

Middle 

8166, 5170

3244, 2157, 4977

Passage starts at

5154, 462

4037, 3346, 9015

Data

Some sample images:

Discussion

I tried fine-tuning but found the public/open sourced model (youseff’s valid7.ckpt) to be quite effective at detecting crackle (which was the only signal I wanted to feed it).

My intuition is that drawing masks is dangerous and needs to be done with great care. I thought risk of human hallucination of letters and thus injection of false data outweighed the incremental gains. 

I’m really grateful for the community and all the open source tools that have been built. Rapushko’s contribution of chartres and segment masks was really helpful. More so as a point of reference than actually training data as I didn’t feel confident enough in all the masks to  keep grinding/adjusting training parameters. 

I agreed with Stephen’s changes to mask 20230929220924, 

https://discord.com/channels/1079907749569237093/1108134343295127592/1181702053479841873

which led me to believe that more masks could use a closer look. The time I spent staring at the layers and masks was outweighing any increase of accuracy for me- diminishing returns so decided to put finetuning on hold to pursue other ideas. 

I actually found more success with simple data manipulation and adjusting the inference code directly using the public checkpoint. 

I found it interesting that setting the reverse = 0 line within the inference script had such drastic effects. You might expect backwards or upside down letters, but the output would be completely different. 

This led me to believe that the model is using wider context of nearby layers. The papyrus patterns are usually 10 to 15 layers, while the crackle would usually be heaviest in 3 to 6 layers. 

It also holds depth value within the greyscale values. You can tell the scale of certain artifacts are adjusting with how close it is to the scanner. 

I spent a fair amount of time trying to get accurate dimensions of the scroll in relation to the data. We’re told that scroll 1 was scanned at a 8um resolution. So each tif slice is presumably 8um. I assumed that the 65 tif stacks were different from the 14.5k volumes- because each segment may have different dimensions in surface area and thickness- yet all have 65 tifs exported. This implies that the thickness of each individual tif has an elastic thickness depending on the particular segment from volume cartographer. 

Discussions within the discord seemed to hint that the reverse parameter was flipping or transforming the images individually

*insert discord link

As a sanity check, I wrote some janky code to manually reverse the tif layers myself. 

*insert colab link

This produced the same result as the youseff-output-test-reversed as hosted by jrrudolph’s Segment Browser. 

The model is using the start.idx and end.idx to construct a 3D array and the order that they are positioned really matters. When reversed, the depth map is backwards with each individual tif flipped. 

Imagine an image where there are two dots. One is larger than the other. This implies that the larger one is closer to the observer. When these images are positioned in a reverse order, that scale information is essentially scrambled. 

Architecture

I used the kaggle and open sourced model as a starting point, which mostly uses efficientnet-b0 as a backbone. It’s architecture is based on inception-3d; a CNN architecture  that is especially effective at video processing and action/movement detection. 

I’ve tried various training runs but results have been mixed so I focused on data manipulation. Each 65 tif stack seems to have multiple papyrus pages, and thus potential for multiple pages of ink. 

You can see the horizontal fibers disappear as you scrub through the layers. Each page seems to be roughly 10-16 pages. The ink seems to mostly lie within 3-6 layers on top of that. 

Altering the start and end idx seems to yield different letters. Reversing the order of images also has huge impact. 

This implies that there is depth information within each tif. 

Reverse order

Confirmed that the reverse is referring to the order of images and not some other form of manipulation or transform to the data. Also generated negative NO INK training data with the fragments (ground truth for ink/no ink available)

DRAFT

Scroll 1

Dimensions of unwrapped scroll likely to be

30 ft to 60 ft

due to

  • radius of scroll (cylinder)
  • thickness of papyrus
  • max 360 segments in horizontal slices (14k tif) (based on page thickness and diameter)
  • Historically, scrolls do not exceed 30 pages (900cm)

Methodology

Data Collection

Took backlight photographs of papyrus with ink/noink and befre/after carbonization in an airfryer

Processed frag1 segments with lightroom and python (reverse image sequence)

Created OBJ models of scroll through 3D slicer and maya

Downloaded data from servers and community resources

-segment browser

Cropping segments to only text areas (masked out margins)

Resetting segments to have center focus line by line

peripheral inference seems to suffer (noted by kaggle winner 5 found improvement with throwing away edges

Data Preprocessing

I inferenced various segments with different layer combinations including:

-01 to 64 (entire stack)

-32 to 34 (middle)

-20 to 50 (segment browser)

-26 to 36

-01 to 21

-49 to 65

Isolating papyrus pages from one another seems to help differentiate ink. 

Early experiments include taking kaggle dataset (mostly Fragment 1) to match the histogram min/max levels of the segment tif stacks (64 tif). The 53kev segments created by Hari Seldon were curiously not picking up ink with public model. Likely due to training data limited to scroll 1. Differences in processing scan data/calibration.


Reduced polygon mesh of segment obj exports from volume cartographer

90% twice

400k to 4k polygon/triangle count

Modeled estimated dimension flat unrolled scroll

Applied deformers to roll it, then set keyframes to animate it

Then copied those keyframes in reverse order to the rolled scroll

Rolled scroll segments needed to be mesh edited to have break in overlapping pages

Model Training

google colab with t4 and a100

1050 ti local card

Did finetuning with youseff’s stride script modified to accept *new data 

Then used various stride, size, batchsize to get various outputs. 

Hallucination Mitigation

Some basics: 

Testing model on unseen data

Breaking up grids to below letter/character resolution- no full letters

Rotation/cropping of images

Due to the inaccessibility of the scroll and inability to get ground truth- seems to me that hallucination is unavoidable. It’s just where in the pipeline you want to introduce it. As I was trying to decipher the outputs, I found myself hallucinating greek characters all around me- my marble tile, carpet patterns- I started seeing upsilons and alphas everywhere. 

I tried to get a sense of what plausible text might look like and tried to establish some ground truth with the exposed fragments or existing manuscripts. Even given perfect ink detection (these are visible letters/characters on exposed papyrus) – it can be hard to pick out words/definitions, let alone whole sentences and passages from deformed and digitally unrolled, then black box machine learning visualized. 


One can often assess the accuracy needed with the number of operations done. Considering that this scroll was first created over 2000 years ago, the number of input/output cycles this object has to go through in order to be recovered is quite astounding. A cylinder of carbon that’s arranged just so- contains intelligence from our ancestors. Information from a different time. Communication that’s somehow survived the thread of existence for millenia. A message from the past. This scroll has time traveled further than anything like it (so far). 

Letters from people just like us, from a world so distant and mysterious, we can only imagine

  • different xray scanning parameters
  • different xray image processing parameters
  • different file formats
  • different resolutions
  • different sequencing (clockwise, reverse stacks)
  • different interpretations of characters/words

a lot of operations

A lot of opportunities for errors to accumulate

Decreasing the amount of operations seemed critical. 

CITATIONS

-Papers

-stephen’s

-seales

-marzia

-thalia (ocr)

-Greek characters:

https://kyle-p-johnson.com/assets/most-common-greek-words.txt

https://lme.tf.fau.de/competitions/2023-competition-on-detection-and-recognition-of-greek-letters-on-papyri

-Scanner:

https://hackaday.io/project/163791-x-ray-ct-scanner

-Machine learning:

https://github.com/JamesDarby345/segment-anything-vesuvius

https://github.com/WongKinYiu/yolov7

-Models: 

https://cloud.google.com/tpu/docs/inception-v3-advanced

https://github.com/tensorflow/tpu/tree/master/models/experimental/inception

-Characters:

http://www.riedlberger.de/titivillus_instructions.pdf

https://sourceforge.net/projects/gimagereader

https://github.com/manisandro/gImageReader

https://www.i2ocr.com/free-online-greek-ancient-ocr

https://docs.google.com/drawings/d/13bNuYv3XiQPZIBk7D-QcfMxLIJiWfAYdTF1foSJ9LbY/edit

-Simulating Infrared Imaging

-ithaca

@article{asssome2022restoring,

  title={Restoring and attributing ancient texts with deep neural networks},

  author={Assael*, Yannis and Sommerschield*, Thea and Shillingford, Brendan and Bordbar, Mahyar and Pavlopoulos, John and Chatzipanagiotou, Marita and Androutsopoulos, Ion and Prag, Jonathan and de Freitas, Nando},

  journal={Nature},

  year={2022}

}

-pythia

@inproceedings{assael2019restoring,

  title={Restoring ancient text using deep learning: a case study on {Greek} epigraphy},

  author={Assael, Yannis and Sommerschield, Thea and Prag, Jonathan},

  booktitle={Empirical Methods in Natural Language Processing},

  pages={6369–6376},

  year={2019}

}

THV

https://www.sciencedirect.com/science/article/abs/pii/S1350449516302456

https://github.com/blirs/BLIRS

-Inception 3D

-Unet

-Community

luke’s repo/write up

casey’s blog post

youseff’s model

sam masked volumes

segment browser outputs

discord:

-reversing (clockwise/anticlockwise)

-letter sizing

-papyrus margins, order, thickness

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.