Hobyte

Migrating My Obsidian Docs

I really like the concept of Obsidian and use it to store my ideas and notes. It makes it easy to connect notes and tag them so I can keep track of everything. You can add tags and links to other notes anywhere in a note, either in the text itself or in a header that stores additional metadata. In the past, I used this to give my notes a header of my own that contains the tags and links for that note. It could look a bit like this:

#code [[linux]] #python #architecture
***

This renders nicely: the *** is shown as a separating line, so the tags/links are clearly separated from the content of the note. But Obsidian has now introduced Bases, database-like tables that can be used to view, sort and filter files by their properties. However, Bases can only read the properties of a note if they are stored as YAML frontmatter. That is basically a YAML block at the top of the note, enclosed by --- lines:

---
title: Websockets with envoy
tags:
  - networks
  - software
  - distributed systems
source: https://www.envoyproxy.io/docs/envoy/latest/start/sandboxes/websocket
links:
  - https://slack.engineering/migrating-millions-of-concurrent-websockets-to-envoy/
---

This is not the same format as I had, but it is more flexible and can hold much more information, such as links and the date of creation. So I wanted to migrate all my notes to this format. But how? I could let an AI tool convert the notes directly, but that isn't always reliable and can mess things up. So instead, I used AI to help me write a Python script that performs the conversion:

  1. Extract the header from the note

  2. Find and extract all tags

  3. Find and extract all links

  4. Create the new YAML frontmatter with the links and tags

  5. replace the old header with the frontmatter

And this is what I came up with:

import re
import pathlib

# The old header is the line directly above a "***" separator line
header_regex = re.compile(r'.*\n\*\*\*')
# Tags such as "#python"
tag_regex = re.compile(r'#\S+')
# Links such as "[[linux]]"; non-greedy, so two links in one header stay separate
link_regex = re.compile(r'\[\[.*?\]\]')

source_folder = pathlib.Path('./permanent')
target_folder = pathlib.Path('../notes-new/permanent')


def replace_header(match: re.Match, title: str) -> str:
    # extract links and tags from the old header
    header = match.group(0)
    tags = tag_regex.findall(header)
    links = link_regex.findall(header)

    # format them as YAML lists; links are quoted, because an unquoted
    # "[[...]]" would be parsed by YAML as a nested list
    tags_list = "\n".join(f"  - {tag.lstrip('#')}" for tag in tags)
    links_list = "\n".join(f"  - \"{link}\"" for link in links)

    # create the new frontmatter
    return ("---\n"
            f"title: {title}\n"
            f"tags:\n{tags_list}\n"
            f"links:\n{links_list}\n"
            "---\n")


# parents=True also creates ../notes-new if it does not exist yet
target_folder.mkdir(parents=True, exist_ok=True)

for note in source_folder.glob('**/*.md'):
    if note.is_file():
        # read the old note
        content = note.read_text(encoding='utf-8')
        # replace only the first header; the file name without ".md" is the title
        new_content = header_regex.sub(
            lambda match: replace_header(match, note.stem), content, count=1)
        # write the converted note to the separate target folder
        (target_folder / note.name).write_text(new_content, encoding='utf-8')

The script can be divided into two parts: the replace_header function and the main part that handles the files. The main part opens every markdown file in my notes folder and reads its content. The header regex finds the old header in that content, and the match is passed to replace_header, which builds the replacement. The converted note is then written to a file with the same name as the original note.

The replace_header function holds the main logic. It uses regex to extract the tags and links from the old header, formats them as YAML lists and inserts them into a predefined template for the new frontmatter. The regex substitution of Python's re module then replaces the old header with this frontmatter. To prevent a failing script from ruining my notes, I wrote the converted notes to a different folder. This allows me to double-check the conversion and fix any errors before copying the new files back to my notes folder, replacing the old notes.
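To see what the conversion does to a single note without touching any files, the same regex approach can be run on a string. This is a stripped-down sketch, not the script itself: convert_note is a hypothetical helper, the title is passed in by hand instead of coming from the file name, and the link pattern is non-greedy (.*?) so that two links in one header end up as two list entries.

```python
import re

# As in the script: the header is the line directly above "***"
header_regex = re.compile(r'.*\n\*\*\*')
tag_regex = re.compile(r'#\S+')
# Non-greedy, so "[[a]] [[b]]" yields two links instead of one
link_regex = re.compile(r'\[\[.*?\]\]')


def convert_note(content: str, title: str) -> str:
    """Hypothetical helper: replace the first old-style header with YAML frontmatter."""

    def replace_header(match: re.Match) -> str:
        header = match.group(0)
        tags = "\n".join(
            f"  - {tag.lstrip('#')}" for tag in tag_regex.findall(header))
        links = "\n".join(
            f'  - "{link}"' for link in link_regex.findall(header))
        return f"---\ntitle: {title}\ntags:\n{tags}\nlinks:\n{links}\n---\n"

    # count=1: only the first match is replaced, later "***" lines stay as they are
    return header_regex.sub(replace_header, content, count=1)


old_note = "#code [[linux]] #python #architecture\n***\nNotes about packaging.\n"
print(convert_note(old_note, "Python packaging"))
```

The printed note starts with the new frontmatter (title "Python packaging", the tags code, python and architecture, and the link "[[linux]]"), followed by a blank line and the unchanged body. Passing count=1 means a *** used as a normal separator further down the note is not turned into a second frontmatter block.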

This script does most of the work, but it fails for some notes that have slightly different formatting or where the header contains extra information, such as additional links. I had to clean those up by hand, but only a few notes needed it. So now I can use the new Bases feature of Obsidian, and since the script handled most of the conversion, the migration was quick and easy.
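Finding the notes that need cleanup by hand can also be scripted. A minimal sketch, assuming that every successfully converted note starts with a --- line and that the converted notes sit in one folder; find_unconverted is a hypothetical helper, not part of the original script. It flags notes where the header was not the first line or spanned several lines, because some text then stays above the frontmatter, but it will not notice every formatting problem.

```python
import pathlib
import tempfile


def find_unconverted(folder: pathlib.Path) -> list[pathlib.Path]:
    """Hypothetical helper: list notes that do not start with YAML frontmatter."""
    return sorted(
        note for note in folder.glob('**/*.md')
        if note.is_file()
        and not note.read_text(encoding='utf-8').startswith('---\n')
    )


# Small demo with two fake converted notes in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    folder = pathlib.Path(tmp)
    # converted correctly: frontmatter is the first thing in the file
    (folder / 'good.md').write_text('---\ntitle: good\n---\nText\n', encoding='utf-8')
    # a two-line header where only the second line was converted
    (folder / 'bad.md').write_text('#code\n---\ntitle: bad\n---\nText\n', encoding='utf-8')
    for note in find_unconverted(folder):
        print(note.name)  # prints "bad.md"
```

Running it against the target folder before copying the notes back gives a short list of files to check by hand.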