> DATE: Sun 12 Jan 2025 15:51 By: konsthol@pm.me

# Simple way to extend yt-dlp

Lots of people use yt-dlp, either directly or indirectly through 
mpv. It's a powerful tool that acts as a website scraper, and it 
supports thousands of websites. The website it's mostly used for 
is, as the name suggests, YouTube. Now, YouTube is a great 
resource, but using it through the website is quite unpleasant, so 
many people opt to use alternative frontends like Invidious or 
Piped. Often you just want to stream a YouTube video in mpv by 
providing the link, like:

> mpv https://youtube.com/watch?v=[VideoID]

That works like a charm, but what happens when you provide a link 
from an alternative frontend? Well, yt-dlp translates it to the 
aforementioned format in order to work. But there are so many 
instances of Invidious and Piped, so how does it know what to do? 
That was my question as well, since I use a self-hosted Piped 
instance and yt-dlp does not recognize its domain. Obviously.

Thankfully, yt-dlp is an open source project so you can actually see 
what goes on behind the scenes. In my case, I installed it with the 
Arch Linux package manager and it resides at:

> /usr/lib/python3.13/site-packages/yt_dlp/

The way yt-dlp works is that there is a folder called "extractor" 
at that path, and inside it there is a Python file for each 
supported website. In YouTube's case it's youtube.py. I opened it 
and saw this:


class YoutubeBaseInfoExtractor(InfoExtractor):
    """Provide base functions for Youtube extractors"""

    _RESERVED_NAMES = (
        r'channel|c|user|playlist|watch|w|v|embed|e|live|watch_popup|clip|'
        r'shorts|movies|results|search|shared|hashtag|trending|explore|feed|feeds|'
        r'browse|oembed|get_video_info|iframe_api|s/player|source|'
        r'storefront|oops|index|account|t/terms|about|upload|signin|logout')

    _PLAYLIST_ID_RE = r'(?:(?:PL|LL|EC|UU|FL|RD|UL|TL|PU|OLAK5uy_)[0-9A-Za-z-_]{10,}|RDMM|WL|LL|LM)'

    # _NETRC_MACHINE = 'youtube'

    # If True it will raise an error if no login info is provided
    _LOGIN_REQUIRED = False

    _INVIDIOUS_SITES = (
        # invidious-redirect websites
        r'(?:www\.)?redirect\.invidious\.io',
        r'(?:(?:www|dev)\.)?invidio\.us',
        # Invidious instances taken from https://github.com/iv-org/documentation/blob/master/docs/instances.md
        r'(?:www\.)?invidious\.pussthecat\.org',
        r'(?:www\.)?invidious\.zee\.li',
        [more instances here]
    )
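To see how patterns like these behave, here is a small standalone 
sketch using two of the patterns above with Python's `re` module 
(the real tuple in yt_dlp/extractor/youtube.py is much longer):

```python
import re

# Two patterns in the same style as yt-dlp's _INVIDIOUS_SITES tuple
sites = (
    r'(?:www\.)?redirect\.invidious\.io',
    r'(?:(?:www|dev)\.)?invidio\.us',
)

# Join them into one alternation that must match the whole hostname
host_re = re.compile(r'^(?:%s)$' % '|'.join(sites))

print(bool(host_re.match('invidio.us')))      # True
print(bool(host_re.match('dev.invidio.us')))  # True
print(bool(host_re.match('example.com')))     # False
```

Any hostname that matches one of the alternatives is treated as a 
known frontend; anything else falls through to other extractors.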


There is a class called YoutubeBaseInfoExtractor with a tuple 
called _INVIDIOUS_SITES that holds one regular expression per 
recognized domain. Now, on yt-dlp's GitHub page I saw a lot of 
people asking the maintainers to add more instances to this list. 
In theory you could also just edit the file and add your domain so 
that it gets recognized too. But, in my personal opinion, it's 
never a good idea to edit upstream files, because your changes 
will be overwritten as the program updates. So I found another way 
to deal with this.

You see, yt-dlp is not just a command line utility. You can use it 
as a library and write your own extractors for websites by 
creating plugins. In my case, I didn't actually want to make a new 
extractor, but rather extend a tuple inside an existing one. Not 
every extractor keeps its domains in a class attribute like this, 
but since YouTube's does, it would work. So I made this file at 
this location:

> ~/.config/yt-dlp/plugins/piped/yt_dlp_plugins/extractor/piped.py
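For yt-dlp to pick the file up, the whole yt_dlp_plugins/extractor 
package path has to exist under the plugin directory (the "piped" 
directory name is arbitrary; this is just a sketch of creating the 
layout):

```shell
# Create the plugin package layout under the user config directory
mkdir -p ~/.config/yt-dlp/plugins/piped/yt_dlp_plugins/extractor

# The extractor module itself then goes at:
#   ~/.config/yt-dlp/plugins/piped/yt_dlp_plugins/extractor/piped.py
```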

The contents are simple:


from yt_dlp.extractor.youtube import YoutubeBaseInfoExtractor, YoutubeIE


# Extend the tuple of recognized frontend domains with our own
class CustomYoutubeBaseInfoExtractor(YoutubeBaseInfoExtractor):
    _INVIDIOUS_SITES = YoutubeBaseInfoExtractor._INVIDIOUS_SITES + (
        r'(?:www\.)?piped\.konsthol\.eu',
    )


# Combine the real YouTube extractor with the extended base class
class PipedKonstholYoutubeIE(YoutubeIE, CustomYoutubeBaseInfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?piped\.konsthol\.eu/watch\?v=(?P<id>[0-9A-Za-z_-]{11})'
    IE_NAME = 'piped.konsthol.eu'



We import the class that contains the tuple we need, along with 
the YouTube extractor. We define a new class that inherits from 
the base class and extends the tuple with a regex for our domain. 
Then we define the extractor class itself, inheriting from both 
the YouTube extractor and the class we just created, and tell it 
via _VALID_URL which URLs it handles. This way, the pseudo 
extractor is activated whenever we provide a matching URL; it 
extends the actual YouTube extractor and activates that one, only 
this time it works for our domain too.
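The _VALID_URL pattern does the URL matching and captures the 
video id in the named group; it can be tried standalone with 
Python's re module (dQw4w9WgXcQ is just a placeholder video id):

```python
import re

# The plugin's _VALID_URL pattern, applied outside of yt-dlp
valid_url = (r'https?://(?:www\.)?piped\.konsthol\.eu'
             r'/watch\?v=(?P<id>[0-9A-Za-z_-]{11})')

m = re.match(valid_url, 'https://piped.konsthol.eu/watch?v=dQw4w9WgXcQ')
print(m.group('id'))  # dQw4w9WgXcQ

# A URL from a domain not in the pattern simply doesn't match
print(re.match(valid_url, 'https://example.com/watch?v=dQw4w9WgXcQ'))  # None
```

yt-dlp uses the captured id the same way it would for a regular 
youtube.com link.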

It's amazing what you can do with open source software just by 
observing how a program works. Now, whenever someone needs a new 
alternative-frontend domain added, instead of asking the 
developers to do it, they can just add it to a plugin with this 
simple solution.

yt-dlp GitHub page
https://github.com/yt-dlp/yt-dlp/



