152 lines
6.4 KiB
HTML
152 lines
6.4 KiB
HTML
<!DOCTYPE html>
|
||
<html>
|
||
<head>
|
||
<title>Konsthol</title>
|
||
<meta charset="utf-8">
|
||
<meta name="robots" content="noindex">
|
||
<link rel="alternate" type="application/atom+xml" title="RSS Feed" href="/rss.xml">
|
||
<link rel="stylesheet" href="/css/style.css" >
|
||
<link rel="shortcut icon" type="image/png" sizes="32x32" href="/images/favicon-32x32.png">
|
||
|
||
<script>
|
||
(function(d,t) {
|
||
var BASE_URL="https://chat.konsthol.eu";
|
||
var g=d.createElement(t),s=d.getElementsByTagName(t)[0];
|
||
g.src=BASE_URL+"/packs/js/sdk.js";
|
||
g.defer = true;
|
||
g.async = true;
|
||
s.parentNode.insertBefore(g,s);
|
||
window.chatwootSettings = {
|
||
darkMode: "dark"
|
||
};
|
||
g.onload=function(){
|
||
window.chatwootSDK.run({
|
||
websiteToken: 'rYqPF7TtnospKkLhtjf5LkPy',
|
||
baseUrl: BASE_URL
|
||
})
|
||
}
|
||
})(document,"script");
|
||
</script>
|
||
</head>
|
||
<body>
|
||
<section>
|
||
<blockquote>
|
||
<p>DATE: Sun 12 Jan 2025 15:51 By: konsthol@pm.me</p>
|
||
</blockquote>
|
||
<h1 id="simple-way-to-extend-yt-dlp">Simple way to extend yt-dlp</h1>
|
||
<p>
|
||
Lots of people use yt-dlp either directly or indirectly through mpv. It’s a
|
||
powerful tool that acts as a website scraper and it supports thousands of
|
||
websites. The website its mostly used for is like the name suggests YouTube.
|
||
Now, YouTube is a great resource but usage through the website is quite
|
||
unpleasant so lots of people opt out to use alternative frontends like
|
||
Invidious or Piped. Lots of times you just want to use mpv to stream a YouTube
|
||
video by providing the link like:
|
||
</p>
|
||
<blockquote>
|
||
<p>mpv https://youtube.com/watch?v=[VideoID]</p>
|
||
</blockquote>
|
||
<p>
|
||
That works like a charm, but what happens when you provide a link of an
|
||
alternative frontend? Well, it translates it to the aforementioned format in
|
||
order to work. But there are so many instances of Invidious and Piped, so how
|
||
does it know what to do? That was my question as well since I use a self
|
||
hosted Piped instance and it does not recognize the domain. Obviously.
|
||
</p>
|
||
<p>
|
||
Thankfully, yt-dlp is an open source project so you can actually see what goes
|
||
on behind the scenes. In my case, I installed it with the Arch Linux package
|
||
manager and it resides at:
|
||
</p>
|
||
<blockquote>
|
||
<p>/usr/lib/python3.13/site-packages/yt_dlp/</p>
|
||
</blockquote>
|
||
<p>
|
||
The way yt-dlp works is that it has a folder called “extractor” in that path
|
||
and in that folder there is a python file for each supported website. In
|
||
YouTube’s case it’s youtube.py. I opened it and I saw this:
|
||
</p>
|
||
<pre><code>class YoutubeBaseInfoExtractor(InfoExtractor):
|
||
"""Provide base functions for Youtube extractors"""
|
||
|
||
_RESERVED_NAMES = (
|
||
r'channel|c|user|playlist|watch|w|v|embed|e|live|watch_popup|clip|'
|
||
r'shorts|movies|results|search|shared|hashtag|trending|explore|feed|feeds|'
|
||
r'browse|oembed|get_video_info|iframe_api|s/player|source|'
|
||
r'storefront|oops|index|account|t/terms|about|upload|signin|logout')
|
||
|
||
_PLAYLIST_ID_RE = r'(?:(?:PL|LL|EC|UU|FL|RD|UL|TL|PU|OLAK5uy_)[0-9A-Za-z-_]{10,}|RDMM|WL|LL|LM)'
|
||
|
||
# _NETRC_MACHINE = 'youtube'
|
||
|
||
# If True it will raise an error if no login info is provided
|
||
_LOGIN_REQUIRED = False
|
||
|
||
_INVIDIOUS_SITES = (
|
||
# invidious-redirect websites
|
||
r'(?:www\.)?redirect\.invidious\.io',
|
||
r'(?:(?:www|dev)\.)?invidio\.us',
|
||
# Invidious instances taken from https://github.com/iv-org/documentation/blob/master/docs/instances.md
|
||
r'(?:www\.)?invidious\.pussthecat\.org',
|
||
r'(?:www\.)?invidious\.zee\.li',
|
||
[more instances here]
|
||
)</code></pre>
|
||
<p>
|
||
There is a class called YoutubeBaseInfoExtractor that has an array of
|
||
instances called _INVIDIOUS_SITES that uses a regex to catch every domain
|
||
there. Now, I saw at the GitHub page of yt-dlp a lot of people asking the
|
||
maintainers to add more instances on this list. Theoretically you also can
|
||
just edit the file and add a domain so that it recognizes that one too. But,
|
||
in my personal opinion it’s never a good idea to edit upstream files because
|
||
as the program updates your changes will be overwritten. So I found another
|
||
way to deal with this.
|
||
</p>
|
||
<p>
|
||
You see, yt-dlp is not just a command line utility. You can use it as a
|
||
library to make your own extractors for websites. The way you do that is by
|
||
creating your own plugins. In my case, I didn’t actually want to make a new
|
||
extractor but somehow extend an array of an already existing one. Not all
|
||
extractors use this method but since YouTube does, it would work. So I made
|
||
this file at this location:
|
||
</p>
|
||
<blockquote>
|
||
<p>~/.config/yt-dlp/plugins/piped/yt_dlp_plugins/extractor/piped.py</p>
|
||
</blockquote>
|
||
<p>The contents are simple:</p>
|
||
<pre><code>from yt_dlp.extractor.youtube import YoutubeBaseInfoExtractor, YoutubeIE
|
||
|
||
class CustomYoutubeBaseInfoExtractor(YoutubeBaseInfoExtractor):
|
||
_INVIDIOUS_SITES = YoutubeBaseInfoExtractor._INVIDIOUS_SITES + (
|
||
r'(?:www\.)?piped\.konsthol\.eu',
|
||
)
|
||
|
||
class PipedKonstholYoutubeIE(YoutubeIE, CustomYoutubeBaseInfoExtractor):
|
||
_VALID_URL = r'https?://(?:www\.)?piped\.konsthol\.eu/watch\?v=(?P<id>[0-9A-Za-z_-]{11})'
|
||
IE_NAME = 'piped.konsthol.eu'
|
||
</code></pre>
|
||
<p>
|
||
We import the class that contains the array we need and the youtube extractor.
|
||
We make a new class in which we provide the one that has the array. We access
|
||
the array and add a new regex for our domain. Then we make a new class for the
|
||
extractor, provide the one we just created and the YouTube extractor class and
|
||
we tell it to work for urls that look like the one we provided. In that way,
|
||
this pseudo extractor is being activated when we provide a url that looks like
|
||
this, it extends the actual YouTube extractor and activates that one, only
|
||
this time it works for our domain too.
|
||
</p>
|
||
<p>
|
||
It’s amazing what you can do with open source software just by observing how a
|
||
program works. Now every time someone needs a new domain for an alternative
|
||
YouTube frontend added, instead of asking the developers to do that, using
|
||
this simple solution he/she can just add it to the plugin.
|
||
</p>
|
||
<p><a href="https://github.com/yt-dlp/yt-dlp/">yt-dlp GitHub page</a><br /></p>
|
||
<p><a href="..">..</a></p>
|
||
<footer>
|
||
<a id="gemyo" href="gemini://konsthol.eu/"><img src="/images/best_viewed_on_gemini.png" /><br /></a>
|
||
</footer>
|
||
|
||
</section>
|
||
</body>
|
||
</html>
|