Derp. Fix missing constant post rename

2025-04-19 20:48:19 +03:00
parent 0c05420c3b
commit 094466ec5a
11 changed files with 516 additions and 19 deletions
--- a/rss.xml
+++ b/rss.xml
@@ -6,6 +6,125 @@
 <link>https://konsthol.eu/rss.xml</link>
 <atom:link href="https://konsthol.eu/rss.xml" rel="self" type="application/rss+xml"/>

+<item>
+<title>Simple way to extend yt-dlp</title>
+<link>https://konsthol.eu/log/simple_way_to_extend_yt_dlp-12-01-2025.html</link>
+<pubDate>Sun, 12 Jan 2025</pubDate>
+<description><![CDATA[<blockquote>
+  <p>DATE: Sun 12 Jan 2025 15:51 By: konsthol@pm.me</p>
+</blockquote>
+<h1 id="simple-way-to-extend-yt-dlp">Simple way to extend yt-dlp</h1>
+<p>
+  Lots of people use yt-dlp either directly or indirectly through mpv. It’s a
+  powerful tool that acts as a website scraper and it supports thousands of
+  websites. The website it’s mostly used for is, as the name suggests, YouTube.
+  Now, YouTube is a great resource but using it through the website is quite
+  unpleasant, so lots of people opt to use alternative frontends like
+  Invidious or Piped. Lots of times you just want to use mpv to stream a YouTube
+  video by providing the link like:
+</p>
+<blockquote>
+  <p>mpv https://youtube.com/watch?v=[VideoID]</p>
+</blockquote>
+<p>
+  That works like a charm, but what happens when you provide a link to an
+  alternative frontend? Well, yt-dlp translates it to the aforementioned format
+  in order to work. But there are so many instances of Invidious and Piped, so
+  how does it know what to do? That was my question as well, since I use a
+  self-hosted Piped instance and it does not recognize the domain. Obviously.
+</p>
+<p>
+  Thankfully, yt-dlp is an open source project so you can actually see what goes
+  on behind the scenes. In my case, I installed it with the Arch Linux package
+  manager and it resides at:
+</p>
+<blockquote>
+  <p>/usr/lib/python3.13/site-packages/yt_dlp/</p>
+</blockquote>
+<p>
+  The way yt-dlp works is that it has a folder called “extractor” in that path
+  and in that folder there is a Python file for each supported website. In
+  YouTube’s case it’s youtube.py. I opened it and I saw this:
+</p>
+<pre><code>class YoutubeBaseInfoExtractor(InfoExtractor):
+    &quot;&quot;&quot;Provide base functions for Youtube extractors&quot;&quot;&quot;
+
+    _RESERVED_NAMES = (
+        r&#39;channel|c|user|playlist|watch|w|v|embed|e|live|watch_popup|clip|&#39;
+        r&#39;shorts|movies|results|search|shared|hashtag|trending|explore|feed|feeds|&#39;
+        r&#39;browse|oembed|get_video_info|iframe_api|s/player|source|&#39;
+        r&#39;storefront|oops|index|account|t/terms|about|upload|signin|logout&#39;)
+
+    _PLAYLIST_ID_RE = r&#39;(?:(?:PL|LL|EC|UU|FL|RD|UL|TL|PU|OLAK5uy_)[0-9A-Za-z-_]{10,}|RDMM|WL|LL|LM)&#39;
+
+    # _NETRC_MACHINE = &#39;youtube&#39;
+
+    # If True it will raise an error if no login info is provided
+    _LOGIN_REQUIRED = False
+
+    _INVIDIOUS_SITES = (
+        # invidious-redirect websites
+        r&#39;(?:www\.)?redirect\.invidious\.io&#39;,
+        r&#39;(?:(?:www|dev)\.)?invidio\.us&#39;,
+        # Invidious instances taken from https://github.com/iv-org/documentation/blob/master/docs/instances.md
+        r&#39;(?:www\.)?invidious\.pussthecat\.org&#39;,
+        r&#39;(?:www\.)?invidious\.zee\.li&#39;,
+        [more instances here]
+    )</code></pre>
+<p>
+  There is a class called YoutubeBaseInfoExtractor that has a tuple called
+  _INVIDIOUS_SITES, which holds one regex for each known instance domain. Now,
+  I saw on the GitHub page of yt-dlp a lot of people asking the maintainers to
+  add more instances to this list. Theoretically you can also just edit the
+  file and add a domain so that it recognizes that one too. But, in my personal
+  opinion, it’s never a good idea to edit upstream files, because as the
+  program updates your changes will be overwritten. So I found another way to
+  deal with this.
+</p>
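+<p>
+  To make the idea concrete, here is a minimal sketch of how a tuple of regex
+  fragments like this one could be joined into a single pattern and checked
+  against a hostname. It is not taken from yt-dlp’s source: the names
+  INVIDIOUS_SITES, HOST_RE and is_known_host are hypothetical, and only the
+  entries quoted above are included.
+</p>

```python
import re

# Hypothetical stand-in for the tuple quoted above (only the entries shown
# in the post; the real list is longer).
INVIDIOUS_SITES = (
    r'(?:www\.)?redirect\.invidious\.io',
    r'(?:(?:www|dev)\.)?invidio\.us',
    r'(?:www\.)?invidious\.pussthecat\.org',
    r'(?:www\.)?invidious\.zee\.li',
)

# Assumption for illustration: join the fragments into one alternation.
HOST_RE = re.compile(r'(?:%s)' % '|'.join(INVIDIOUS_SITES))


def is_known_host(host: str) -> bool:
    """Return True if the whole hostname matches one of the fragments."""
    return HOST_RE.fullmatch(host) is not None


print(is_known_host('invidious.zee.li'))
print(is_known_host('piped.konsthol.eu'))
```

+<p>
+  A domain that is missing from the tuple simply never matches, which is why
+  a self hosted instance is not recognized until a regex for it is added.
+</p>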
+<p>
+  You see, yt-dlp is not just a command line utility. You can use it as a
+  library to make your own extractors for websites. The way you do that is by
+  creating your own plugins. In my case, I didn’t actually want to make a new
+  extractor but somehow extend a tuple of an already existing one. Not all
+  extractors use this method but since YouTube does, it would work. So I made
+  this file at this location:
+</p>
+<blockquote>
+  <p>~/.config/yt-dlp/plugins/piped/yt_dlp_plugins/extractor/piped.py</p>
+</blockquote>
+<p>The contents are simple:</p>
+<pre><code>from yt_dlp.extractor.youtube import YoutubeBaseInfoExtractor, YoutubeIE
+
+# Extend the upstream tuple of instance regexes with our own domain.
+class CustomYoutubeBaseInfoExtractor(YoutubeBaseInfoExtractor):
+    _INVIDIOUS_SITES = YoutubeBaseInfoExtractor._INVIDIOUS_SITES + (
+        r&#39;(?:www\.)?piped\.konsthol\.eu&#39;,
+    )
+
+# Reuse the YouTube extractor for watch URLs on our domain.
+class PipedKonstholYoutubeIE(YoutubeIE, CustomYoutubeBaseInfoExtractor):
+    _VALID_URL = r&#39;https?://(?:www\.)?piped\.konsthol\.eu/watch\?v=(?P&lt;id&gt;[0-9A-Za-z_-]{11})&#39;
+    IE_NAME = &#39;piped.konsthol.eu&#39;
+</code></pre>
+<p>
+  We import the class that contains the tuple we need and the YouTube extractor.
+  We make a new class that inherits from the one that has the tuple, and in it
+  we extend the tuple with a new regex for our domain. Then we make a new class
+  for the extractor that inherits from both the YouTube extractor class and the
+  one we just created, and we tell it to work for URLs that look like the one
+  we provided. That way, this pseudo extractor is activated when we provide a
+  URL that looks like this; it extends the actual YouTube extractor, so that
+  one does the real work, only this time it works for our domain too.
+</p>
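+<p>
+  The inheritance trick can be tried without yt-dlp installed. The sketch
+  below uses hypothetical stand-in classes (BaseExtractor, MainExtractor,
+  CustomBase, PipedExtractor) and a made-up helper video_id, so it only
+  illustrates the Python mechanics, not yt-dlp’s own behaviour. The _VALID_URL
+  pattern is the one from the plugin above.
+</p>

```python
import re


# Hypothetical stand-ins for YoutubeBaseInfoExtractor and YoutubeIE.
class BaseExtractor:
    _SITES = (r'(?:www\.)?invidious\.zee\.li',)


class MainExtractor(BaseExtractor):
    pass


# Subclass that extends the inherited tuple instead of editing it in place.
class CustomBase(BaseExtractor):
    _SITES = BaseExtractor._SITES + (r'(?:www\.)?piped\.konsthol\.eu',)


# MainExtractor does not define _SITES itself, so attribute lookup on this
# class falls through to CustomBase, which comes next in the MRO.
class PipedExtractor(MainExtractor, CustomBase):
    _VALID_URL = (r'https?://(?:www\.)?piped\.konsthol\.eu/watch\?v='
                  r'(?P<id>[0-9A-Za-z_-]{11})')


def video_id(url):
    """Return the 11-character ID if the URL matches _VALID_URL, else None."""
    match = re.match(PipedExtractor._VALID_URL, url)
    return match.group('id') if match else None


print(len(MainExtractor._SITES), len(PipedExtractor._SITES))
print(video_id('https://piped.konsthol.eu/watch?v=abcdefghijk'))
```

+<p>
+  The upstream class keeps its original tuple while the new class sees the
+  extended one, so nothing in the installed package has to be touched.
+</p>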
+<p>
+  It’s amazing what you can do with open source software just by observing how a
+  program works. Now every time someone needs a new domain for an alternative
+  YouTube frontend added, instead of asking the developers to do that, they can
+  just add it to the plugin using this simple solution.
+</p>
+<p><a href="https://github.com/yt-dlp/yt-dlp/">yt-dlp GitHub page</a><br /></p>
+<p><a href="..">..</a></p>]]></description>
+</item>
+
+
 <item>
 <title>The magic of Wake-On-LAN</title>
 <link>https://konsthol.eu/log/the_magic_of_wake_on_lan-19-12-2024.html</link>