<!DOCTYPE html>
<html>
<head>
	<title>Konsthol</title>
	<meta charset="utf-8">
	<meta name="robots" content="noindex">
	<link rel="alternate" type="application/atom+xml" title="RSS Feed" href="/rss.xml">
	<link rel="stylesheet" href="/css/style.css" >
	<link rel="shortcut icon" type="image/png" sizes="32x32" href="/images/favicon-32x32.png">

</head>
<body>
<section>
<blockquote>
  <p>DATE: Sun 12 Jan 2025 15:51 By: konsthol@pm.me</p>
</blockquote>
<h1 id="simple-way-to-extend-yt-dlp">Simple way to extend yt-dlp</h1>
<p>
  Lots of people use yt-dlp, either directly or indirectly through mpv. It’s a
  powerful tool that acts as a website scraper and supports thousands of
  websites. The website it’s mostly used for is, as the name suggests, YouTube.
  Now, YouTube is a great resource, but using it through the website is quite
  unpleasant, so lots of people opt for alternative frontends like
  Invidious or Piped. Often you just want to use mpv to stream a YouTube
  video by providing the link, like:
</p>
<blockquote>
  <p>mpv https://youtube.com/watch?v=[VideoID]</p>
</blockquote>
<p>
  That works like a charm, but what happens when you provide a link to an
  alternative frontend? Well, yt-dlp translates it to the aforementioned format
  in order to work. But there are so many instances of Invidious and Piped, so
  how does it know what to do? That was my question as well, since I use a
  self-hosted Piped instance and, obviously, yt-dlp does not recognize its
  domain.
</p>
<p>
  Thankfully, yt-dlp is an open source project so you can actually see what goes
  on behind the scenes. In my case, I installed it with the Arch Linux package
  manager and it resides at:
</p>
<blockquote>
  <p>/usr/lib/python3.13/site-packages/yt_dlp/</p>
</blockquote>
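<p>
  If you installed yt-dlp some other way, the path will differ. A minimal
  sketch for finding it from Python with only the standard library could look
  like this (the helper name package_dir is made up for this example):
</p>
<pre><code>import importlib.util
from pathlib import Path


def package_dir(name):
    """Return the directory of an importable package, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    # origin points at the package's __init__.py; its parent is the package folder.
    return Path(spec.origin).parent


if __name__ == "__main__":
    # Prints None when yt_dlp is not importable by this interpreter.
    print(package_dir("yt_dlp"))
</code></pre>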
<p>
  The way yt-dlp works is that it has a folder called “extractor” in that path,
  and in that folder there is a Python file for each supported website. In
  YouTube’s case it’s youtube.py. I opened it and saw this:
</p>
<pre><code>class YoutubeBaseInfoExtractor(InfoExtractor):
    &quot;&quot;&quot;Provide base functions for Youtube extractors&quot;&quot;&quot;

    _RESERVED_NAMES = (
        r&#39;channel|c|user|playlist|watch|w|v|embed|e|live|watch_popup|clip|&#39;
        r&#39;shorts|movies|results|search|shared|hashtag|trending|explore|feed|feeds|&#39;
        r&#39;browse|oembed|get_video_info|iframe_api|s/player|source|&#39;
        r&#39;storefront|oops|index|account|t/terms|about|upload|signin|logout&#39;)

    _PLAYLIST_ID_RE = r&#39;(?:(?:PL|LL|EC|UU|FL|RD|UL|TL|PU|OLAK5uy_)[0-9A-Za-z-_]{10,}|RDMM|WL|LL|LM)&#39;

    # _NETRC_MACHINE = &#39;youtube&#39;

    # If True it will raise an error if no login info is provided
    _LOGIN_REQUIRED = False

    _INVIDIOUS_SITES = (
        # invidious-redirect websites
        r&#39;(?:www\.)?redirect\.invidious\.io&#39;,
        r&#39;(?:(?:www|dev)\.)?invidio\.us&#39;,
        # Invidious instances taken from https://github.com/iv-org/documentation/blob/master/docs/instances.md
        r&#39;(?:www\.)?invidious\.pussthecat\.org&#39;,
        r&#39;(?:www\.)?invidious\.zee\.li&#39;,
        [more instances here]
    )</code></pre>
<p>
  There is a class called YoutubeBaseInfoExtractor that has a tuple called
  _INVIDIOUS_SITES. Each entry is a regex that matches the domain of one known
  instance. Now, I saw on the GitHub page of yt-dlp a lot of people asking the
  maintainers to add more instances to this list. Theoretically you can also
  just edit the file and add a domain so that it recognizes that one too. But
  in my personal opinion it’s never a good idea to edit upstream files, because
  your changes will be overwritten whenever the program updates. So I found
  another way to deal with this.
</p>
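<p>
  To see why one extra entry is enough for a domain to be recognized, here is a
  simplified sketch of how a tuple of regex fragments can be joined into a
  single host pattern. It is an illustration and not yt-dlp’s actual code:
  KNOWN_SITES, host_pattern and is_known_host are names invented here, and
  piped.example.org is a placeholder domain.
</p>
<pre><code>import re

# Stand-in for _INVIDIOUS_SITES: one regex fragment per known host.
KNOWN_SITES = (
    r"(?:www\.)?redirect\.invidious\.io",
    r"(?:(?:www|dev)\.)?invidio\.us",
)


def host_pattern(sites):
    """Join the fragments into one alternation."""
    return re.compile("(?:" + "|".join(sites) + ")")


def is_known_host(host, sites=KNOWN_SITES):
    """True if the whole host name matches one of the fragments."""
    return host_pattern(sites).fullmatch(host) is not None


if __name__ == "__main__":
    print(is_known_host("www.invidio.us"))      # a host from the tuple
    print(is_known_host("piped.example.org"))   # not in the tuple yet
    # Adding one fragment (hypothetical domain) is all it takes.
    extended = KNOWN_SITES + (r"(?:www\.)?piped\.example\.org",)
    print(is_known_host("piped.example.org", extended))
</code></pre>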
<p>
  You see, yt-dlp is not just a command line utility. You can use it as a
  library to make your own extractors for websites. The way you do that is by
  creating your own plugins. In my case, I didn’t actually want to make a new
  extractor but to extend a tuple of an already existing one. Not every
  extractor keeps its domains in a class attribute like this, but since the
  YouTube one does, the approach works. So I made this file at this location:
</p>
<blockquote>
  <p>~/.config/yt-dlp/plugins/piped/yt_dlp_plugins/extractor/piped.py</p>
</blockquote>
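<p>
  If you would rather build that path from a script, a small sketch follows.
  The directory layout is copied from the path above; the helper name
  plugin_file and its config_home parameter are made up for this example.
</p>
<pre><code>from pathlib import Path


def plugin_file(package, module, config_home=None):
    """Build the path of an extractor plugin module under the config directory."""
    root = Path(config_home) if config_home else Path.home() / ".config"
    return (root / "yt-dlp" / "plugins" / package
            / "yt_dlp_plugins" / "extractor" / (module + ".py"))


if __name__ == "__main__":
    path = plugin_file("piped", "piped")
    print(path)
    # Uncomment to create the directories and an empty file:
    # path.parent.mkdir(parents=True, exist_ok=True)
    # path.touch()
</code></pre>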
<p>The contents are simple:</p>
<pre><code># Reuse yt-dlp&#39;s own YouTube extractor classes.
from yt_dlp.extractor.youtube import YoutubeBaseInfoExtractor, YoutubeIE

class CustomYoutubeBaseInfoExtractor(YoutubeBaseInfoExtractor):
    # Extend the upstream tuple of instance regexes with the self-hosted domain.
    _INVIDIOUS_SITES = YoutubeBaseInfoExtractor._INVIDIOUS_SITES + (
        r&#39;(?:www\.)?piped\.konsthol\.eu&#39;,
    )

class PipedKonstholYoutubeIE(YoutubeIE, CustomYoutubeBaseInfoExtractor):
    # Claim watch URLs on the self-hosted instance; the rest is inherited from YoutubeIE.
    _VALID_URL = r&#39;https?://(?:www\.)?piped\.konsthol\.eu/watch\?v=(?P&lt;id&gt;[0-9A-Za-z_-]{11})&#39;
    IE_NAME = &#39;piped.konsthol.eu&#39;
</code></pre>
<p>
  We import the base class that contains the tuple we need, along with the
  YouTube extractor. We define a subclass of the base class in which we take
  the existing tuple and append a regex for our domain. Then we define the
  extractor class itself, which inherits from both the YouTube extractor and
  the class we just created, and we tell it to handle URLs that match the
  pattern we provided. That way, this pseudo-extractor is selected whenever we
  provide a URL that looks like this, and because it extends the actual YouTube
  extractor, the normal YouTube extraction logic runs, only this time it works
  for our domain too.
</p>
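<p>
  The inheritance at play can be tried out without yt-dlp installed. The
  sketch below uses stand-in classes (BaseExtractor, VideoExtractor, CustomBase
  and PipedExtractor are invented names, and piped.example.org is a placeholder
  domain). It assumes the parent’s URL pattern is built from the tuple once,
  when the class is defined, in which case extending the tuple alone would not
  change that pattern and the subclass needs its own _VALID_URL.
</p>
<pre><code>import re


class BaseExtractor:
    # Stand-in for YoutubeBaseInfoExtractor.
    _SITES = (r"(?:www\.)?invidio\.us",)


class VideoExtractor(BaseExtractor):
    # Stand-in for YoutubeIE; the pattern is fixed when the class is defined.
    _VALID_URL = (r"https?://(?:%s)/watch\?v=[0-9A-Za-z_-]{11}"
                  % "|".join(BaseExtractor._SITES))

    @classmethod
    def suitable(cls, url):
        """True if this extractor class claims the URL."""
        return re.match(cls._VALID_URL, url) is not None


class CustomBase(BaseExtractor):
    # Concatenation creates a new tuple; BaseExtractor._SITES is left untouched.
    _SITES = BaseExtractor._SITES + (r"(?:www\.)?piped\.example\.org",)


class PipedExtractor(VideoExtractor, CustomBase):
    # Own pattern for the extra domain; suitable() is inherited.
    _VALID_URL = r"https?://(?:www\.)?piped\.example\.org/watch\?v=[0-9A-Za-z_-]{11}"


if __name__ == "__main__":
    url = "https://piped.example.org/watch?v=abcdefghijk"
    print(VideoExtractor.suitable(url))   # parent pattern does not know the domain
    print(PipedExtractor.suitable(url))   # subclass pattern does
    print(len(BaseExtractor._SITES), len(PipedExtractor._SITES))
</code></pre>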
<p>
  It’s amazing what you can do with open source software just by observing how a
  program works. Now, whenever someone needs a new domain added for an
  alternative YouTube frontend, they can add it to the plugin with this simple
  solution instead of asking the developers to do it.
</p>
<p><a href="https://github.com/yt-dlp/yt-dlp/">yt-dlp GitHub page</a><br /></p>
<footer>
	<a id="gemyo" href="gemini://konsthol.eu/"><img src="/images/best_viewed_on_gemini.png" /><br /></a>
</footer>

</section>
</body>
</html>