Compare commits

...

9 Commits

Author SHA1 Message Date
pukkandan
08d30158ec
[cleanup, docs] Misc cleanup
Closes #2828, closes #2734, closes #2802, closes #2937
2022-03-08 22:38:06 +05:30
Ha Tien Loi
c89bec262c
[xinpianchang] Add extractor (#2963)
Authored by: hatienl0i261299
2022-03-08 08:55:40 -08:00
Ha Tien Loi
151f8f1c02
[fptplay] Add extractor (#2949)
Closes #2857
Authored by: hatienl0i261299
2022-03-08 08:52:51 -08:00
Max Mehl
a35155be17
[peertube] Add media.fsfe.org (#2986)
Authored by: mxmehl
2022-03-08 08:48:35 -08:00
nyuszika7h
e66662b1e0
[ccma] Fix timestamp parsing (#2989)
Authored by: nyuszika7h
2022-03-08 08:45:23 -08:00
coletdev
4390d5ec12
Add brotli content-encoding support (#2433)
Authored by: coletdjnz
2022-03-08 08:44:05 -08:00
CplPwnies
9e0e6adb2d
[adobepass] Add Suddenlink MSO (#2977)
Closes #2704
Authored by: CplPwnies
2022-03-08 08:18:52 -08:00
Lesmiscore
b637c4e22e
[mildom] Fix linter
2022-03-08 23:56:30 +09:00
Lesmiscore (Naoya Ozaki)
fb6e3f4389
[mildom] Rework extractors (#2940)
Authored by: Lesmiscore
2022-03-08 23:49:10 +09:00
30 changed files with 537 additions and 259 deletions

.gitignore (2 changed lines)

@@ -24,6 +24,7 @@ cookies
*.3gp
*.ape
*.ass
*.avi
*.desktop
*.flac
@@ -106,6 +107,7 @@ yt-dlp.zip
*.iml
.vscode
*.sublime-*
*.code-workspace
# Lazy extractors
*/extractor/lazy_extractors.py

@@ -11,6 +11,7 @@
- [Is anyone going to need the feature?](#is-anyone-going-to-need-the-feature)
- [Is your question about yt-dlp?](#is-your-question-about-yt-dlp)
- [Are you willing to share account details if needed?](#are-you-willing-to-share-account-details-if-needed)
- [Is the website primarily used for piracy](#is-the-website-primarily-used-for-piracy)
- [DEVELOPER INSTRUCTIONS](#developer-instructions)
- [Adding new feature or making overarching changes](#adding-new-feature-or-making-overarching-changes)
- [Adding support for a new site](#adding-support-for-a-new-site)
@@ -24,6 +25,7 @@
- [Collapse fallbacks](#collapse-fallbacks)
- [Trailing parentheses](#trailing-parentheses)
- [Use convenience conversion and parsing functions](#use-convenience-conversion-and-parsing-functions)
- [My pull request is labeled pending-fixes](#my-pull-request-is-labeled-pending-fixes)
- [EMBEDDING YT-DLP](README.md#embedding-yt-dlp)
@@ -123,6 +125,10 @@ While these steps won't necessarily ensure that no misuse of the account takes p
- Change the password before sharing the account to something random (use [this](https://passwordsgenerator.net/) if you don't have a random password generator).
- Change the password after receiving the account back.
### Is the website primarily used for piracy?
We follow [youtube-dl's policy](https://github.com/ytdl-org/youtube-dl#can-you-add-support-for-this-anime-video-site-or-site-which-shows-current-movies-for-free) to not support services that is primarily used for infringing copyright. Additionally, it has been decided to not to support porn sites that specialize in deep fake. We also cannot support any service that serves only [DRM protected content](https://en.wikipedia.org/wiki/Digital_rights_management).
@@ -210,7 +216,7 @@ After you have ensured this site is distributing its content legally, you can fo
}
```
1. Add an import in [`yt_dlp/extractor/extractors.py`](yt_dlp/extractor/extractors.py).
1. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, the tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with `only_matching` key in test's dict are not counted in. You can also run all the tests in one go with `TestDownload.test_YourExtractor_all`
1. Run `python test/test_download.py TestDownload.test_YourExtractor` (note that `YourExtractor` doesn't end with `IE`). This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, the tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with `only_matching` key in test's dict are not counted in. You can also run all the tests in one go with `TestDownload.test_YourExtractor_all`
1. Make sure you have atleast one test for your extractor. Even if all videos covered by the extractor are expected to be inaccessible for automated testing, tests should still be added with a `skip` parameter indicating why the particular test is disabled from running.
1. Have a look at [`yt_dlp/extractor/common.py`](yt_dlp/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](yt_dlp/extractor/common.py#L91-L426). Add tests and code for as many as you want.
1. Make sure your code follows [yt-dlp coding conventions](#yt-dlp-coding-conventions) and check the code with [flake8](https://flake8.pycqa.org/en/latest/index.html#quickstart):
@@ -658,6 +664,10 @@ duration = float_or_none(video.get('durationMs'), scale=1000)
view_count = int_or_none(video.get('views'))
```
# My pull request is labeled pending-fixes
The `pending-fixes` label is added when there are changes requested to a PR. When the necessary changes are made, the label should be removed. However, despite our best efforts, it may sometimes happen that the maintainer did not see the changes or forgot to remove the label. If your PR is still marked as `pending-fixes` a few days after all requested changes have been made, feel free to ping the maintainer who labeled your issue and ask them to re-review and remove the label.

@@ -146,7 +146,7 @@ chio0hai
cntrl-s
Deer-Spangle
DEvmIb
Grabien
Grabien/MaximVol
j54vc1bk
mpeter50
mrpapersonic
@@ -160,7 +160,7 @@ PilzAdam
zmousm
iw0nderhow
unit193
TwoThousandHedgehogs
TwoThousandHedgehogs/KathrynElrod
Jertzukka
cypheron
Hyeeji

@@ -16,7 +16,7 @@ pypi-files: AUTHORS Changelog.md LICENSE README.md README.txt supportedsites com
clean-test:
rm -rf test/testdata/sigs/player-*.js tmp/ *.annotations.xml *.aria2 *.description *.dump *.frag \
*.frag.aria2 *.frag.urls *.info.json *.live_chat.json *.meta *.part* *.tmp *.temp *.unknown_video *.ytdl \
*.3gp *.ape *.avi *.desktop *.flac *.flv *.jpeg *.jpg *.m4a *.m4v *.mhtml *.mkv *.mov *.mp3 \
*.3gp *.ape *.ass *.avi *.desktop *.flac *.flv *.jpeg *.jpg *.m4a *.m4v *.mhtml *.mkv *.mov *.mp3 \
*.mp4 *.ogg *.opus *.png *.sbv *.srt *.swf *.swp *.ttml *.url *.vtt *.wav *.webloc *.webm *.webp
clean-dist:
rm -rf yt-dlp.1.temp.md yt-dlp.1 README.txt MANIFEST build/ dist/ .coverage cover/ yt-dlp.tar.gz completions/ \

@@ -112,7 +112,7 @@ yt-dlp is a [youtube-dl](https://github.com/ytdl-org/youtube-dl) fork based on t
* **Other new options**: Many new options have been added such as `--concat-playlist`, `--print`, `--wait-for-video`, `--sleep-requests`, `--convert-thumbnails`, `--write-link`, `--force-download-archive`, `--force-overwrites`, `--break-on-reject` etc
* **Improvements**: Regex and other operators in `--match-filter`, multiple `--postprocessor-args` and `--downloader-args`, faster archive checking, more [format selection options](#format-selection), merge multi-video/audio, multiple `--config-locations`, `--exec` at different stages, etc
* **Improvements**: Regex and other operators in `--format`/`--match-filter`, multiple `--postprocessor-args` and `--downloader-args`, faster archive checking, more [format selection options](#format-selection), merge multi-video/audio, multiple `--config-locations`, `--exec` at different stages, etc
* **Plugins**: Extractors and PostProcessors can be loaded from an external file. See [plugins](#plugins) for details
@@ -130,7 +130,7 @@ Some of yt-dlp's default options are different from that of youtube-dl and youtu
* The default [format sorting](#sorting-formats) is different from youtube-dl and prefers higher resolution and better codecs rather than higher bitrates. You can use the `--format-sort` option to change this to any order you prefer, or use `--compat-options format-sort` to use youtube-dl's sorting order
* The default format selector is `bv*+ba/b`. This means that if a combined video + audio format that is better than the best video-only format is found, the former will be preferred. Use `-f bv+ba/b` or `--compat-options format-spec` to revert this
* Unlike youtube-dlc, yt-dlp does not allow merging multiple audio/video streams into one file by default (since this conflicts with the use of `-f bv*+ba`). If needed, this feature must be enabled using `--audio-multistreams` and `--video-multistreams`. You can also use `--compat-options multistreams` to enable both
* `--ignore-errors` is enabled by default. Use `--abort-on-error` or `--compat-options abort-on-error` to abort on errors instead
* `--no-abort-on-error` is enabled by default. Use `--abort-on-error` or `--compat-options abort-on-error` to abort on errors instead
* When writing metadata files such as thumbnails, description or infojson, the same information (if available) is also written for playlists. Use `--no-write-playlist-metafiles` or `--compat-options no-playlist-metafiles` to not write these files
* `--add-metadata` attaches the `infojson` to `mkv` files in addition to writing the metadata when used with `--write-info-json`. Use `--no-embed-info-json` or `--compat-options no-attach-info-json` to revert this
* Some metadata are embedded into different fields when using `--add-metadata` as compared to youtube-dl. Most notably, `comment` field contains the `webpage_url` and `synopsis` contains the `description`. You can [use `--parse-metadata`](#modifying-metadata) to modify this to your liking or use `--compat-options embed-metadata` to revert this
@@ -267,7 +267,8 @@ While all the other dependencies are optional, `ffmpeg` and `ffprobe` are highly
* [**pycryptodomex**](https://github.com/Legrandin/pycryptodome) - For decrypting AES-128 HLS streams and various other data. Licensed under [BSD2](https://github.com/Legrandin/pycryptodome/blob/master/LICENSE.rst)
* [**websockets**](https://github.com/aaugustin/websockets) - For downloading over websocket. Licensed under [BSD3](https://github.com/aaugustin/websockets/blob/main/LICENSE)
* [**secretstorage**](https://github.com/mitya57/secretstorage) - For accessing the Gnome keyring while decrypting cookies of Chromium-based browsers on Linux. Licensed under [BSD](https://github.com/mitya57/secretstorage/blob/master/LICENSE)
* [**AtomicParsley**](https://github.com/wez/atomicparsley) - For embedding thumbnail in mp4/m4a if mutagen is not present. Licensed under [GPLv2+](https://github.com/wez/atomicparsley/blob/master/COPYING)
* [**AtomicParsley**](https://github.com/wez/atomicparsley) - For embedding thumbnail in mp4/m4a if mutagen/ffmpeg cannot. Licensed under [GPLv2+](https://github.com/wez/atomicparsley/blob/master/COPYING)
* [**brotli**](https://github.com/google/brotli) or [**brotlicffi**](https://github.com/python-hyper/brotlicffi) - [Brotli](https://en.wikipedia.org/wiki/Brotli) content encoding support. Both licensed under MIT <sup>[1](https://github.com/google/brotli/blob/master/LICENSE) [2](https://github.com/python-hyper/brotlicffi/blob/master/LICENSE) </sup>
* [**rtmpdump**](http://rtmpdump.mplayerhq.hu) - For downloading `rtmp` streams. ffmpeg will be used as a fallback. Licensed under [GPLv2+](http://rtmpdump.mplayerhq.hu)
* [**mplayer**](http://mplayerhq.hu/design7/info.html) or [**mpv**](https://mpv.io) - For downloading `rstp` streams. ffmpeg will be used as a fallback. Licensed under [GPLv2+](https://github.com/mpv-player/mpv/blob/master/Copyright)
* [**phantomjs**](https://github.com/ariya/phantomjs) - Used in extractors where javascript needs to be run. Licensed under [BSD3](https://github.com/ariya/phantomjs/blob/master/LICENSE.BSD)
@@ -278,13 +279,14 @@ To use or redistribute the dependencies, you must agree to their respective lice
The Windows and MacOS standalone release binaries are already built with the python interpreter, mutagen, pycryptodomex and websockets included.
<!-- TODO: ffmpeg has merged this patch. Remove this note once there is new release -->
**Note**: There are some regressions in newer ffmpeg versions that causes various issues when used alongside yt-dlp. Since ffmpeg is such an important dependency, we provide [custom builds](https://github.com/yt-dlp/FFmpeg-Builds#ffmpeg-static-auto-builds) with patches for these issues at [yt-dlp/FFmpeg-Builds](https://github.com/yt-dlp/FFmpeg-Builds). See [the readme](https://github.com/yt-dlp/FFmpeg-Builds#patches-applied) for details on the specific issues solved by these builds
## COMPILE
**For Windows**:
To build the Windows executable, you must have pyinstaller (and optionally mutagen, pycryptodomex, websockets). Once you have all the necessary dependencies installed, (optionally) build lazy extractors using `devscripts/make_lazy_extractors.py`, and then just run `pyinst.py`. The executable will be built for the same architecture (32/64 bit) as the python used to build it.
To build the Windows executable, you must have pyinstaller (and any of yt-dlp's optional dependencies if needed). Once you have all the necessary dependencies installed, (optionally) build lazy extractors using `devscripts/make_lazy_extractors.py`, and then just run `pyinst.py`. The executable will be built for the same architecture (32/64 bit) as the python used to build it.
py -m pip install -U pyinstaller -r requirements.txt
py devscripts/make_lazy_extractors.py
@@ -605,11 +607,11 @@ You can also fork the project on github and run your fork's [build workflow](.gi
--write-description etc. (default)
--no-write-playlist-metafiles Do not write playlist metadata when using
--write-info-json, --write-description etc.
--clean-infojson Remove some private fields such as
--clean-info-json Remove some private fields such as
filenames from the infojson. Note that it
could still contain some personal
information (default)
--no-clean-infojson Write all fields to the infojson
--no-clean-info-json Write all fields to the infojson
--write-comments Retrieve video comments to be placed in the
infojson. The comments are fetched even
without this option if the extraction is
@@ -1598,25 +1600,28 @@ This option also has a few special uses:
* You can download an additional URL based on the metadata of the currently downloaded video. To do this, set the field `additional_urls` to the URL that you want to download. Eg: `--parse-metadata "description:(?P<additional_urls>https?://www\.vimeo\.com/\d+)` will download the first vimeo video found in the description
* You can use this to change the metadata that is embedded in the media file. To do this, set the value of the corresponding field with a `meta_` prefix. For example, any value you set to `meta_description` field will be added to the `description` field in the file. For example, you can use this to set a different "description" and "synopsis". To modify the metadata of individual streams, use the `meta<n>_` prefix (Eg: `meta1_language`). Any value set to the `meta_` field will overwrite all default values.
**Note**: Metadata modification happens before format selection, post-extraction and other post-processing operations. Some fields may be added or changed during these steps, overriding your changes.
For reference, these are the fields yt-dlp adds by default to the file metadata:
Metadata fields|From
:---|:---
`title`|`track` or `title`
`date`|`upload_date`
`description`, `synopsis`|`description`
`purl`, `comment`|`webpage_url`
`track`|`track_number`
`artist`|`artist`, `creator`, `uploader` or `uploader_id`
`genre`|`genre`
`album`|`album`
`album_artist`|`album_artist`
`disc`|`disc_number`
`show`|`series`
`season_number`|`season_number`
`episode_id`|`episode` or `episode_id`
`episode_sort`|`episode_number`
`language` of each stream|From the format's `language`
Metadata fields | From
:--------------------------|:------------------------------------------------
`title` | `track` or `title`
`date` | `upload_date`
`description`, `synopsis` | `description`
`purl`, `comment` | `webpage_url`
`track` | `track_number`
`artist` | `artist`, `creator`, `uploader` or `uploader_id`
`genre` | `genre`
`album` | `album`
`album_artist` | `album_artist`
`disc` | `disc_number`
`show` | `series`
`season_number` | `season_number`
`episode_id` | `episode` or `episode_id`
`episode_sort` | `episode_number`
`language` of each stream | the format's `language`
**Note**: The file format may not support some of these fields
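One way to read the table above is as an ordered fallback: each file metadata field takes the first listed info-dict field that has a value. The minimal sketch below assumes that reading, plus the `meta_` override described earlier in this hunk (values set with a `meta_` prefix overwrite the defaults). `METADATA_SOURCES` and `build_metadata` are hypothetical names, not yt-dlp's post-processor code, and per-stream `meta<n>_` fields are ignored.

```python
# Hypothetical mapping built from the table: file metadata field -> info-dict
# fields, tried in the listed order ("`track` or `title`" means track first).
METADATA_SOURCES = {
    'title': ('track', 'title'),
    'date': ('upload_date',),
    'description': ('description',),
    'synopsis': ('description',),
    'purl': ('webpage_url',),
    'comment': ('webpage_url',),
    'track': ('track_number',),
    'artist': ('artist', 'creator', 'uploader', 'uploader_id'),
    'genre': ('genre',),
    'album': ('album',),
    'album_artist': ('album_artist',),
    'disc': ('disc_number',),
    'show': ('series',),
    'season_number': ('season_number',),
    'episode_id': ('episode', 'episode_id'),
    'episode_sort': ('episode_number',),
}


def build_metadata(info):
    """Return the container-level metadata a file would get from `info` (sketch)."""
    metadata = {}
    for target, sources in METADATA_SOURCES.items():
        for key in sources:
            value = info.get(key)
            if value not in (None, ''):
                metadata[target] = str(value)
                break
    # Fields set with a meta_ prefix (e.g. via --parse-metadata) overwrite the defaults.
    # Per-stream meta<n>_ fields do not start with 'meta_' and are skipped here.
    for key, value in info.items():
        if key.startswith('meta_') and value not in (None, ''):
            metadata[key[len('meta_'):]] = str(value)
    return metadata


info = {
    'title': 'Live at the Park',
    'track': 'Opening Song',
    'uploader': 'SomeChannel',
    'upload_date': '20220308',
    'webpage_url': 'https://example.com/watch/1',
    'track_number': 1,
    'meta_description': 'Custom description',
}
metadata = build_metadata(info)
```

Here `title` comes from `track`, `artist` falls through to `uploader`, and `synopsis` is absent because the info dict has no `description`.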
@@ -1815,12 +1820,11 @@ ydl_opts = {
}],
'logger': MyLogger(),
'progress_hooks': [my_hook],
# Add custom headers
'http_headers': {'Referer': 'https://www.google.com'}
}
# Add custom headers
yt_dlp.utils.std_headers.update({'Referer': 'https://www.google.com'})
# See the public functions in yt_dlp.YoutubeDL for for other available functions.
# Eg: "ydl.download", "ydl.download_with_info_file"
with yt_dlp.YoutubeDL(ydl_opts) as ydl:

@@ -75,7 +75,11 @@ def filter_options(readme):
section = re.search(r'(?sm)^# USAGE AND OPTIONS\n.+?(?=^# )', readme).group(0)
options = '# OPTIONS\n'
for line in section.split('\n')[1:]:
mobj = re.fullmatch(r'\s{4}(?P<opt>-(?:,\s|[^\s])+)(?:\s(?P<meta>([^\s]|\s(?!\s))+))?(\s{2,}(?P<desc>.+))?', line)
mobj = re.fullmatch(r'''(?x)
\s{4}(?P<opt>-(?:,\s|[^\s])+)
(?:\s(?P<meta>(?:[^\s]|\s(?!\s))+))?
(\s{2,}(?P<desc>.+))?
''', line)
if not mobj:
options += f'{line.lstrip()}\n'
continue
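The rewrite of the option-line pattern is mostly layout: verbose mode `(?x)` spreads it over three lines, and the inner group of `meta` becomes non-capturing (`(?:...)`). The sketch below copies the new pattern and applies it to sample lines modelled on the README's option table, to show which parts land in `opt`, `meta` and `desc`. The `OPTION_LINE` name and the second sample's description text are illustrative.

```python
import re

# Pattern copied from the updated filter_options().
OPTION_LINE = re.compile(r'''(?x)
    \s{4}(?P<opt>-(?:,\s|[^\s])+)
    (?:\s(?P<meta>(?:[^\s]|\s(?!\s))+))?
    (\s{2,}(?P<desc>.+))?
''')

samples = [
    '    --clean-info-json                Remove some private fields such as',
    '    -o, --output [TYPES:]TEMPLATE    Output filename template',
    '                                     filenames from the infojson. Note that it',
]
for line in samples:
    mobj = OPTION_LINE.fullmatch(line)
    # Continuation lines do not match; filter_options() appends them left-stripped.
    print(mobj and mobj.group('opt', 'meta', 'desc'))
```

A single space separates the option from its metavariable, while two or more spaces start the description; that distinction is what `\s(?!\s)` inside `meta` enforces.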

@@ -74,7 +74,7 @@ def version_to_list(version):
def dependency_options():
dependencies = [pycryptodome_module(), 'mutagen'] + collect_submodules('websockets')
dependencies = [pycryptodome_module(), 'mutagen', 'brotli'] + collect_submodules('websockets')
excluded_modules = ['test', 'ytdlp_plugins', 'youtube-dl', 'youtube-dlc']
yield from (f'--hidden-import={module}' for module in dependencies)

@@ -1,3 +1,5 @@
mutagen
pycryptodomex
websockets
brotli; platform_python_implementation=='CPython'
brotlicffi; platform_python_implementation!='CPython'
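The two brotli lines use PEP 508 environment markers, so an installer evaluating them picks `brotli` on CPython and `brotlicffi` on other interpreters. Assuming setup.py passes the `REQUIREMENTS` list it now reads from this file to setuptools as `install_requires`, pip does that evaluation at install time. The toy sketch below shows the selection logic. `applicable_requirements` is hypothetical and understands only the `platform_python_implementation` markers used here; a general-purpose evaluator would be something like `packaging.markers.Marker`.

```python
import platform
import re

# Lines in the same shape as the updated requirements.txt.
REQUIREMENT_LINES = [
    'mutagen',
    'pycryptodomex',
    'websockets',
    "brotli; platform_python_implementation=='CPython'",
    "brotlicffi; platform_python_implementation!='CPython'",
]

# Only `platform_python_implementation ==/!= '<name>'` markers are supported.
_MARKER = re.compile(r"platform_python_implementation\s*(==|!=)\s*['\"]([^'\"]+)['\"]")


def applicable_requirements(lines, implementation=None):
    """Return the requirement names whose marker (if any) holds for `implementation`."""
    implementation = implementation or platform.python_implementation()
    selected = []
    for line in lines:
        name, _, marker = line.partition(';')
        marker = marker.strip()
        if marker:
            mobj = _MARKER.fullmatch(marker)
            if not mobj:
                raise ValueError(f'unsupported marker: {marker}')
            op, value = mobj.groups()
            # '==' keeps the line on a match, '!=' keeps it on a mismatch.
            if (implementation == value) != (op == '=='):
                continue
        selected.append(name.strip())
    return selected


print(applicable_requirements(REQUIREMENT_LINES))
```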

@@ -21,9 +21,9 @@ DESCRIPTION = 'A youtube-dl fork with additional features and patches'
LONG_DESCRIPTION = '\n\n'.join((
'Official repository: <https://github.com/yt-dlp/yt-dlp>',
'**PS**: Some links in this document will not work since this is a copy of the README.md from Github',
open('README.md', 'r', encoding='utf-8').read()))
open('README.md').read()))
REQUIREMENTS = ['mutagen', 'pycryptodomex', 'websockets']
REQUIREMENTS = open('requirements.txt').read().splitlines()
if sys.argv[1:2] == ['py2exe']:

@@ -32,6 +32,7 @@ from string import ascii_letters
from .compat import (
compat_basestring,
compat_brotli,
compat_get_terminal_size,
compat_kwargs,
compat_numeric_types,
@@ -234,6 +235,8 @@ class YoutubeDL(object):
See "Sorting Formats" for more details.
format_sort_force: Force the given format_sort. see "Sorting Formats"
for more details.
prefer_free_formats: Whether to prefer video formats with free containers
over non-free ones of same quality.
allow_multiple_video_streams: Allow multiple video streams to be merged
into a single file
allow_multiple_audio_streams: Allow multiple audio streams to be merged
@@ -3675,6 +3678,7 @@ class YoutubeDL(object):
from .cookies import SQLITE_AVAILABLE, SECRETSTORAGE_AVAILABLE
lib_str = join_nonempty(
compat_brotli and compat_brotli.__name__,
compat_pycrypto_AES and compat_pycrypto_AES.__name__.split('.')[0],
SECRETSTORAGE_AVAILABLE and 'secretstorage',
has_mutagen and 'mutagen',
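The new `lib_str` entry relies on `join_nonempty` skipping falsy arguments: `compat_brotli and compat_brotli.__name__` evaluates to `None` when neither brotli module is installed, so nothing is printed for it. The sketch below reimplements that behaviour under the assumption that the helper keeps truthy values, stringifies them and joins them with `delim`. It is not copied from `yt_dlp.utils`, and the `'-'` default and the `', '` delimiter in the usage are illustrative choices.

```python
def join_nonempty(*values, delim='-'):
    """Join the truthy values with `delim`, skipping None, False, '' and 0 (sketch)."""
    return delim.join(str(value) for value in values if value)


compat_brotli = None             # e.g. neither brotlicffi nor brotli is installed
secretstorage_available = True
has_mutagen = True

lib_str = join_nonempty(
    compat_brotli and compat_brotli.__name__,
    secretstorage_available and 'secretstorage',
    has_mutagen and 'mutagen',
    delim=', ')
print(lib_str)  # secretstorage, mutagen
```

The same pattern appears in FptplayIE further down, where the episode number is appended to the title only when the URL has one.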

@@ -170,6 +170,13 @@ except ImportError:
except ImportError:
compat_pycrypto_AES = None
try:
import brotlicffi as compat_brotli
except ImportError:
try:
import brotli as compat_brotli
except ImportError:
compat_brotli = None
WINDOWS_VT_MODE = False if compat_os_name == 'nt' else None
@@ -258,6 +265,7 @@ __all__ = [
'compat_asyncio_run',
'compat_b64decode',
'compat_basestring',
'compat_brotli',
'compat_chr',
'compat_collections_abc',
'compat_cookiejar',
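The compat shim prefers `brotlicffi`, falls back to `brotli`, and leaves `compat_brotli = None` when neither is installed. The standalone sketch below decodes a response body according to its `Content-Encoding` header using the same import fallback. `decode_body` is a hypothetical helper, not yt-dlp's networking code, and it relies only on `decompress()`, which both brotli packages provide.

```python
import gzip
import zlib

# Same fallback order as the compat shim: brotlicffi first, then brotli.
try:
    import brotlicffi as brotli
except ImportError:
    try:
        import brotli
    except ImportError:
        brotli = None


def decode_body(data, content_encoding):
    """Undo every coding named in a Content-Encoding header value (sketch)."""
    codings = [c.strip().lower() for c in (content_encoding or '').split(',') if c.strip()]
    # Codings are listed in the order they were applied, so undo them in reverse.
    for coding in reversed(codings):
        if coding == 'gzip':
            data = gzip.decompress(data)
        elif coding == 'deflate':
            try:
                data = zlib.decompress(data)  # zlib-wrapped deflate (RFC 1950)
            except zlib.error:
                data = zlib.decompress(data, -zlib.MAX_WBITS)  # raw deflate stream
        elif coding == 'br':
            if brotli is None:
                raise ValueError('br content-encoding needs brotli or brotlicffi installed')
            data = brotli.decompress(data)
        elif coding != 'identity':
            raise ValueError(f'unsupported content-encoding: {coding}')
    return data


body = zlib.compress(gzip.compress(b'hello'))  # gzip applied first, then deflate
print(decode_body(body, 'gzip, deflate'))  # b'hello'
```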

@@ -22,6 +22,9 @@ class YoutubeLiveChatFD(FragmentFD):
def real_download(self, filename, info_dict):
video_id = info_dict['video_id']
self.to_screen('[%s] Downloading live chat' % self.FD_NAME)
if not self.params.get('skip_download'):
self.report_warning('Live chat download runs until the livestream ends. '
'If you wish to download the video simultaneously, run a separate yt-dlp instance')
fragment_retries = self.params.get('fragment_retries', 0)
test = self.params.get('test', False)

@@ -8,10 +8,6 @@ import struct
from base64 import urlsafe_b64encode
from binascii import unhexlify
import typing
if typing.TYPE_CHECKING:
from ..YoutubeDL import YoutubeDL
from .common import InfoExtractor
from ..aes import aes_ecb_decrypt
from ..compat import (
@@ -36,15 +32,15 @@ from ..utils import (
# NOTE: network handler related code is temporary thing until network stack overhaul PRs are merged (#2861/#2862)
def add_opener(self: 'YoutubeDL', handler):
def add_opener(ydl, handler):
''' Add a handler for opening URLs, like _download_webpage '''
# https://github.com/python/cpython/blob/main/Lib/urllib/request.py#L426
# https://github.com/python/cpython/blob/main/Lib/urllib/request.py#L605
assert isinstance(self._opener, compat_urllib_request.OpenerDirector)
self._opener.add_handler(handler)
assert isinstance(ydl._opener, compat_urllib_request.OpenerDirector)
ydl._opener.add_handler(handler)
def remove_opener(self: 'YoutubeDL', handler):
def remove_opener(ydl, handler):
'''
Remove handler(s) for opening URLs
@param handler Either handler object itself or handler type.
@@ -52,8 +48,8 @@ def remove_opener(self: 'YoutubeDL', handler):
'''
# https://github.com/python/cpython/blob/main/Lib/urllib/request.py#L426
# https://github.com/python/cpython/blob/main/Lib/urllib/request.py#L605
opener = self._opener
assert isinstance(self._opener, compat_urllib_request.OpenerDirector)
opener = ydl._opener
assert isinstance(ydl._opener, compat_urllib_request.OpenerDirector)
if isinstance(handler, (type, tuple)):
find_cp = lambda x: isinstance(x, handler)
else:
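The `add_opener`/`remove_opener` helpers above operate on urllib's `OpenerDirector`, which dispatches each URL to the registered handler that defines a `<scheme>_open` method. The standalone sketch below uses a made-up `echo:` scheme and a hypothetical `EchoHandler`. `OpenerDirector` has no public removal method, so `remove_opener` here edits the dispatch tables that `add_handler()` fills in (`handlers`, `handle_open`, `handle_error`, `process_request`, `process_response`). Like the CPython lines linked in the diff, this depends on implementation details.

```python
import io
import urllib.error
import urllib.request
import urllib.response
from email.message import Message


class EchoHandler(urllib.request.BaseHandler):
    """Hypothetical handler for a made-up 'echo:' URL scheme."""

    def echo_open(self, req):
        # add_handler() registers this for the 'echo' scheme because its
        # name has the form '<scheme>_open'.
        headers = Message()
        headers['Content-Type'] = 'text/plain'
        body = io.BytesIO(req.full_url.encode())
        return urllib.response.addinfourl(body, headers, req.full_url, code=200)


def add_opener(opener, handler):
    assert isinstance(opener, urllib.request.OpenerDirector)
    opener.add_handler(handler)


def remove_opener(opener, handler_type):
    """Drop every handler of `handler_type` from the opener's dispatch tables."""
    assert isinstance(opener, urllib.request.OpenerDirector)

    def keep(chain):
        return [h for h in chain if not isinstance(h, handler_type)]

    opener.handlers = keep(opener.handlers)
    for table in (opener.handle_open, opener.process_request, opener.process_response):
        for key in table:
            table[key] = keep(table[key])
    for lookup in opener.handle_error.values():
        for kind in lookup:
            lookup[kind] = keep(lookup[kind])


opener = urllib.request.build_opener()
add_opener(opener, EchoHandler())
print(opener.open('echo://hello').read())  # b'echo://hello'
remove_opener(opener, EchoHandler)  # 'echo:' URLs now fall through to UnknownHandler
```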

@@ -1345,6 +1345,11 @@ MSO_INFO = {
'username_field': 'username',
'password_field': 'password',
},
'Suddenlink': {
'name': 'Suddenlink',
'username_field': 'username',
'password_field': 'password',
},
}
@@ -1635,6 +1640,52 @@ class AdobePassIE(InfoExtractor):
urlh.geturl(), video_id, 'Sending final bookend',
query=hidden_data)
post_form(mvpd_confirm_page_res, 'Confirming Login')
elif mso_id == 'Suddenlink':
# Suddenlink is similar to SlingTV in using a tab history count and a meta refresh,
# but they also do a dynmaic redirect using javascript that has to be followed as well
first_bookend_page, urlh = post_form(
provider_redirect_page_res, 'Pressing Continue...')
hidden_data = self._hidden_inputs(first_bookend_page)
hidden_data['history_val'] = 1
provider_login_redirect_page = self._download_webpage(
urlh.geturl(), video_id, 'Sending First Bookend',
query=hidden_data)
provider_tryauth_url = self._html_search_regex(
r'url:\s*[\'"]([^\'"]+)', provider_login_redirect_page, 'ajaxurl')
provider_tryauth_page = self._download_webpage(
provider_tryauth_url, video_id, 'Submitting TryAuth',
query=hidden_data)
provider_login_page_res = self._download_webpage_handle(
f'https://authorize.suddenlink.net/saml/module.php/authSynacor/login.php?AuthState={provider_tryauth_page}',
video_id, 'Getting Login Page',
query=hidden_data)
provider_association_redirect, urlh = post_form(
provider_login_page_res, 'Logging in', {
mso_info['username_field']: username,
mso_info['password_field']: password
})
provider_refresh_redirect_url = extract_redirect_url(
provider_association_redirect, url=urlh.geturl())
last_bookend_page, urlh = self._download_webpage_handle(
provider_refresh_redirect_url, video_id,
'Downloading Auth Association Redirect Page')
hidden_data = self._hidden_inputs(last_bookend_page)
hidden_data['history_val'] = 3
mvpd_confirm_page_res = self._download_webpage_handle(
urlh.geturl(), video_id, 'Sending Final Bookend',
query=hidden_data)
post_form(mvpd_confirm_page_res, 'Confirming Login')
else:
# Some providers (e.g. DIRECTV NOW) have another meta refresh
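Each "bookend" request in the Suddenlink branch re-sends the page's hidden form fields as the query string, with `history_val` set to 1 for the first bookend and 3 for the last. The standard-library sketch below shows that step. `hidden_inputs` only approximates `InfoExtractor._hidden_inputs`, `bookend_query` is a hypothetical helper, and the form markup and field names are invented for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class _HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, but not attribute values.
        if tag != 'input':
            return
        attrs = dict(attrs)
        if (attrs.get('type') or '').lower() == 'hidden' and attrs.get('name'):
            self.fields[attrs['name']] = attrs.get('value') or ''


def hidden_inputs(html):
    collector = _HiddenInputCollector()
    collector.feed(html)
    collector.close()
    return collector.fields


def bookend_query(html, history_val):
    """Hidden fields of `html` plus history_val, encoded as a query string."""
    fields = hidden_inputs(html)
    fields['history_val'] = history_val
    return urlencode(fields)


page = '''<form method="post">
  <input type="hidden" name="SAMLRequest" value="abc123">
  <input type="text" name="username">
  <INPUT TYPE="HIDDEN" NAME="RelayState" VALUE="x y"/>
</form>'''
print(bookend_query(page, 1))  # SAMLRequest=abc123&RelayState=x+y&history_val=1
```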

@@ -97,8 +97,8 @@ class Ant1NewsGrArticleIE(Ant1NewsGrBaseIE):
embed_urls = list(Ant1NewsGrEmbedIE._extract_urls(webpage))
if not embed_urls:
raise ExtractorError('no videos found for %s' % video_id, expected=True)
return self.url_result_or_playlist_from_matches(
embed_urls, video_id, info['title'], ie=Ant1NewsGrEmbedIE.ie_key(),
return self.playlist_from_matches(
embed_urls, video_id, info.get('title'), ie=Ant1NewsGrEmbedIE.ie_key(),
video_kwargs={'url_transparent': True, 'timestamp': info.get('timestamp')})

@@ -1,17 +1,14 @@
# coding: utf-8
from __future__ import unicode_literals
import calendar
import datetime
from .common import InfoExtractor
from ..utils import (
clean_html,
extract_timezone,
int_or_none,
parse_duration,
parse_resolution,
try_get,
unified_timestamp,
url_or_none,
)
@@ -95,14 +92,8 @@ class CCMAIE(InfoExtractor):
duration = int_or_none(durada.get('milisegons'), 1000) or parse_duration(durada.get('text'))
tematica = try_get(informacio, lambda x: x['tematica']['text'])
timestamp = None
data_utc = try_get(informacio, lambda x: x['data_emissio']['utc'])
try:
timezone, data_utc = extract_timezone(data_utc)
timestamp = calendar.timegm((datetime.datetime.strptime(
data_utc, '%Y-%d-%mT%H:%M:%S') - timezone).timetuple())
except TypeError:
pass
timestamp = unified_timestamp(data_utc)
subtitles = {}
subtitols = media.get('subtitols') or []
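The removed CCMA code parsed the UTC string with `'%Y-%d-%mT%H:%M:%S'`, which reads the second field as the day and the third as the month, and it caught only `TypeError`. If the API sends year-month-day, any date whose day exceeds 12 raised an uncaught `ValueError`, and other dates were silently misread. The fix hands the string to `unified_timestamp`. The sketch below shows a standard-library equivalent. It assumes ISO 8601 input in year-month-day order with an optional UTC offset, because the API's exact format is not visible in the diff. `iso_to_timestamp` is a hypothetical name and does not reflect how `unified_timestamp` is implemented.

```python
from datetime import datetime, timezone

# Tried in order: with a UTC offset ('+01:00', '+0100' or 'Z'), then without.
_FORMATS = ('%Y-%m-%dT%H:%M:%S%z', '%Y-%m-%dT%H:%M:%S')


def iso_to_timestamp(value):
    """Unix timestamp for a 'YYYY-MM-DDTHH:MM:SS[offset]' string, or None."""
    if not value:
        return None
    for fmt in _FORMATS:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue
        if parsed.tzinfo is None:
            # Assumption: a value without an offset is already UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        return int(parsed.timestamp())
    return None


# The removed format string swaps day and month:
swapped = datetime.strptime('2022-03-08T08:45:23', '%Y-%d-%mT%H:%M:%S')
print(swapped.month, swapped.day)  # 8 3
```

Python 3.7 or later is needed for `%z` to accept a colon in the offset or a literal `Z`.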

@@ -226,6 +226,7 @@ class InfoExtractor(object):
The following fields are optional:
direct: True if a direct video file was given (must only be set by GenericIE)
alt_title: A secondary title of the video.
display_id An alternative identifier for the video, not necessarily
unique, but available before title. Typically, id is
@@ -274,7 +275,7 @@ class InfoExtractor(object):
* "url": A URL pointing to the subtitles file
It can optionally also have:
* "name": Name or description of the subtitles
* http_headers: A dictionary of additional HTTP headers
* "http_headers": A dictionary of additional HTTP headers
to add to the request.
"ext" will be calculated from URL if missing
automatic_captions: Like 'subtitles'; contains automatically generated
@@ -425,8 +426,8 @@ class InfoExtractor(object):
title, description etc.
Subclasses of this one should re-define the _real_initialize() and
_real_extract() methods and define a _VALID_URL regexp.
Subclasses of this should define a _VALID_URL regexp and, re-define the
_real_extract() and (optionally) _real_initialize() methods.
Probably, they should also be added to the list of extractors.
Subclasses may also override suitable() if necessary, but ensure the function
@@ -661,7 +662,7 @@ class InfoExtractor(object):
return False
def set_downloader(self, downloader):
"""Sets the downloader for this IE."""
"""Sets a YoutubeDL instance as the downloader for this IE."""
self._downloader = downloader
def _real_initialize(self):
@@ -670,7 +671,7 @@ class InfoExtractor(object):
def _real_extract(self, url):
"""Real extraction process. Redefine in subclasses."""
pass
raise NotImplementedError('This method must be implemented by subclasses')
@classmethod
def ie_key(cls):
@@ -1661,31 +1662,31 @@ class InfoExtractor(object):
'format_id': {'type': 'alias', 'field': 'id'},
'preference': {'type': 'alias', 'field': 'ie_pref'},
'language_preference': {'type': 'alias', 'field': 'lang'},
'source_preference': {'type': 'alias', 'field': 'source'},
'protocol': {'type': 'alias', 'field': 'proto'},
'filesize_approx': {'type': 'alias', 'field': 'fs_approx'},
# Deprecated
'dimension': {'type': 'alias', 'field': 'res'},
'resolution': {'type': 'alias', 'field': 'res'},
'extension': {'type': 'alias', 'field': 'ext'},
'bitrate': {'type': 'alias', 'field': 'br'},
'total_bitrate': {'type': 'alias', 'field': 'tbr'},
'video_bitrate': {'type': 'alias', 'field': 'vbr'},
'audio_bitrate': {'type': 'alias', 'field': 'abr'},
'framerate': {'type': 'alias', 'field': 'fps'},
'protocol': {'type': 'alias', 'field': 'proto'},
'source_preference': {'type': 'alias', 'field': 'source'},
'filesize_approx': {'type': 'alias', 'field': 'fs_approx'},
'filesize_estimate': {'type': 'alias', 'field': 'size'},
'samplerate': {'type': 'alias', 'field': 'asr'},
'video_ext': {'type': 'alias', 'field': 'vext'},
'audio_ext': {'type': 'alias', 'field': 'aext'},
'video_codec': {'type': 'alias', 'field': 'vcodec'},
'audio_codec': {'type': 'alias', 'field': 'acodec'},
'video': {'type': 'alias', 'field': 'hasvid'},
'has_video': {'type': 'alias', 'field': 'hasvid'},
'audio': {'type': 'alias', 'field': 'hasaud'},
'has_audio': {'type': 'alias', 'field': 'hasaud'},
'extractor': {'type': 'alias', 'field': 'ie_pref'},
'extractor_preference': {'type': 'alias', 'field': 'ie_pref'},
'dimension': {'type': 'alias', 'field': 'res', 'deprecated': True},
'resolution': {'type': 'alias', 'field': 'res', 'deprecated': True},
'extension': {'type': 'alias', 'field': 'ext', 'deprecated': True},
'bitrate': {'type': 'alias', 'field': 'br', 'deprecated': True},
'total_bitrate': {'type': 'alias', 'field': 'tbr', 'deprecated': True},
'video_bitrate': {'type': 'alias', 'field': 'vbr', 'deprecated': True},
'audio_bitrate': {'type': 'alias', 'field': 'abr', 'deprecated': True},
'framerate': {'type': 'alias', 'field': 'fps', 'deprecated': True},
'filesize_estimate': {'type': 'alias', 'field': 'size', 'deprecated': True},
'samplerate': {'type': 'alias', 'field': 'asr', 'deprecated': True},
'video_ext': {'type': 'alias', 'field': 'vext', 'deprecated': True},
'audio_ext': {'type': 'alias', 'field': 'aext', 'deprecated': True},
'video_codec': {'type': 'alias', 'field': 'vcodec', 'deprecated': True},
'audio_codec': {'type': 'alias', 'field': 'acodec', 'deprecated': True},
'video': {'type': 'alias', 'field': 'hasvid', 'deprecated': True},
'has_video': {'type': 'alias', 'field': 'hasvid', 'deprecated': True},
'audio': {'type': 'alias', 'field': 'hasaud', 'deprecated': True},
'has_audio': {'type': 'alias', 'field': 'hasaud', 'deprecated': True},
'extractor': {'type': 'alias', 'field': 'ie_pref', 'deprecated': True},
'extractor_preference': {'type': 'alias', 'field': 'ie_pref', 'deprecated': True},
}
def __init__(self, ie, field_preference):
@@ -1785,7 +1786,7 @@ class InfoExtractor(object):
continue
if self._get_field_setting(field, 'type') == 'alias':
alias, field = field, self._get_field_setting(field, 'field')
if alias not in ('format_id', 'preference', 'language_preference'):
if self._get_field_setting(alias, 'deprecated'):
self.ydl.deprecation_warning(
f'Format sorting alias {alias} is deprecated '
f'and may be removed in a future version. Please use {field} instead')
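With this change, the deprecation warning is driven by a `'deprecated': True` flag in the alias table instead of a hard-coded exclusion list. As a result, `source_preference`, `protocol` and `filesize_approx` no longer warn, while the aliases under `# Deprecated` still do. The reduced sketch below uses a few rows copied from the table. `resolve_sort_field` is hypothetical, and it uses the standard `warnings` module in place of `self.ydl.deprecation_warning`.

```python
import warnings

# A few rows copied from the updated alias table.
FIELD_SETTINGS = {
    'format_id': {'type': 'alias', 'field': 'id'},
    'source_preference': {'type': 'alias', 'field': 'source'},
    'protocol': {'type': 'alias', 'field': 'proto'},
    'resolution': {'type': 'alias', 'field': 'res', 'deprecated': True},
    'framerate': {'type': 'alias', 'field': 'fps', 'deprecated': True},
}


def resolve_sort_field(name):
    """Map a format-sort field to its canonical name, warning on deprecated aliases."""
    setting = FIELD_SETTINGS.get(name, {})
    if setting.get('type') != 'alias':
        return name  # already a canonical field (or unknown to this sketch)
    field = setting['field']
    if setting.get('deprecated'):
        warnings.warn(
            f'Format sorting alias {name} is deprecated '
            f'and may be removed in a future version. Please use {field} instead',
            DeprecationWarning, stacklevel=2)
    return field


print(resolve_sort_field('protocol'))   # proto, no warning
print(resolve_sort_field('framerate'))  # fps, with a DeprecationWarning
```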

@@ -520,6 +520,7 @@ from .foxnews import (
FoxNewsArticleIE,
)
from .foxsports import FoxSportsIE
from .fptplay import FptplayIE
from .franceculture import FranceCultureIE
from .franceinter import FranceInterIE
from .francetv import (
@@ -848,6 +849,7 @@ from .microsoftvirtualacademy import (
from .mildom import (
MildomIE,
MildomVodIE,
MildomClipIE,
MildomUserVodIE,
)
from .minds import (
@@ -2010,6 +2012,7 @@ from .ximalaya import (
XimalayaIE,
XimalayaAlbumIE
)
from .xinpianchang import XinpianchangIE
from .xminus import XMinusIE
from .xnxx import XNXXIE
from .xstream import XstreamIE
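FptplayIE, added in the next file, takes the video ID and optional episode from `_VALID_URL` and signs its API request with an `st` token. An upper-case MD5 hex digest (of a fixed string, a timestamp 10800 s, i.e. 3 hours, ahead and the API path) is parsed back into 16 bytes. The visible part of `convert()` then packs those bytes three at a time into four 6-bit indices of the Base64 alphabet `r`. The sketch below checks both pieces against the standard library. `md5_bytes` and `encode_like_convert` are hypothetical names, the hashed message is a placeholder, and because `convert()` is cut off in this view, the `=` padding (and anything `convert()` does after its loop) is an assumption.

```python
import base64
import hashlib
import re

# _VALID_URL copied from FptplayIE.
VALID_URL = r'https?://fptplay\.vn/(?P<type>xem-video)/[^/]+\-(?P<id>\w+)(?:/tap-(?P<episode>[^/]+)?/?(?:[?#]|$)|)'

# Same alphabet as `r` in get_api_with_st_token().
ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'


def md5_bytes(message):
    """Upper-case MD5 hex digest parsed back into bytes, two hex digits at a time."""
    digest = hashlib.md5(message.encode()).hexdigest().upper()
    return bytes(int(digest[2 * o:2 * o + 2], 16) for o in range(len(digest) // 2))


def encode_like_convert(data):
    """Pack 3 bytes into 4 six-bit indices with convert()'s masks; '=' padding is assumed."""
    out = []
    for start in range(0, len(data), 3):
        chunk = data[start:start + 3]
        b0, b1, b2 = (list(chunk) + [0, 0])[:3]
        sextets = (
            (252 & b0) >> 2,
            ((3 & b0) << 4) + ((240 & b1) >> 4),
            ((15 & b1) << 2) + ((192 & b2) >> 6),
            63 & b2,
        )
        # n input bytes carry n + 1 significant six-bit groups; the rest is padding.
        out.append(''.join(ALPHABET[s] for s in sextets[:len(chunk) + 1]) + '=' * (3 - len(chunk)))
    return ''.join(out)


mobj = re.match(VALID_URL, 'https://fptplay.vn/xem-video/ma-toi-la-dai-gia-61f3aa8a6b3b1d2e73c60eb5/tap-3')
print(mobj.group('type', 'id', 'episode'))  # ('xem-video', '61f3aa8a6b3b1d2e73c60eb5', '3')

token_bytes = md5_bytes('placeholder message')  # the real input is not reproduced here
print(encode_like_convert(token_bytes) == base64.b64encode(token_bytes).decode())  # True
```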

yt_dlp/extractor/fptplay.py (new file, 102 lines)

@@ -0,0 +1,102 @@
# coding: utf-8
from __future__ import unicode_literals
import hashlib
import time
import urllib.parse
from .common import InfoExtractor
from ..utils import (
join_nonempty,
)
class FptplayIE(InfoExtractor):
_VALID_URL = r'https?://fptplay\.vn/(?P<type>xem-video)/[^/]+\-(?P<id>\w+)(?:/tap-(?P<episode>[^/]+)?/?(?:[?#]|$)|)'
_GEO_COUNTRIES = ['VN']
IE_NAME = 'fptplay'
IE_DESC = 'fptplay.vn'
_TESTS = [{
'url': 'https://fptplay.vn/xem-video/nhan-duyen-dai-nhan-xin-dung-buoc-621a123016f369ebbde55945',
'md5': 'ca0ee9bc63446c0c3e9a90186f7d6b33',
'info_dict': {
'id': '621a123016f369ebbde55945',
'ext': 'mp4',
'title': 'Nhân Duyên Đại Nhân Xin Dừng Bước - Ms. Cupid In Love',
'description': 'md5:23cf7d1ce0ade8e21e76ae482e6a8c6c',
},
}, {
'url': 'https://fptplay.vn/xem-video/ma-toi-la-dai-gia-61f3aa8a6b3b1d2e73c60eb5/tap-3',
'md5': 'b35be968c909b3e4e1e20ca45dd261b1',
'info_dict': {
'id': '61f3aa8a6b3b1d2e73c60eb5',
'ext': 'mp4',
'title': 'Má Tôi Là Đại Gia - 3',
'description': 'md5:ff8ba62fb6e98ef8875c42edff641d1c',
},
}, {
'url': 'https://fptplay.vn/xem-video/nha-co-chuyen-hi-alls-well-ends-well-1997-6218995f6af792ee370459f0',
'only_matching': True,
}]
def _real_extract(self, url):
type_url, video_id, episode = self._match_valid_url(url).group('type', 'id', 'episode')
webpage = self._download_webpage(url, video_id=video_id, fatal=False)
info = self._download_json(self.get_api_with_st_token(video_id, episode or 0), video_id)
formats, subtitles = self._extract_m3u8_formats_and_subtitles(info['data']['url'], video_id, 'mp4')
self._sort_formats(formats)
return {
'id': video_id,
'title': join_nonempty(
self._html_search_meta(('og:title', 'twitter:title'), webpage), episode, delim=' - '),
'description': self._html_search_meta(['og:description', 'twitter:description'], webpage),
'formats': formats,
'subtitles': subtitles,
}
def get_api_with_st_token(self, video_id, episode):
path = f'/api/v6.2_w/stream/vod/{video_id}/{episode}/auto_vip'
timestamp = int(time.time()) + 10800
t = hashlib.md5(f'WEBv6Dkdsad90dasdjlALDDDS{timestamp}{path}'.encode()).hexdigest().upper()
r = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
n = [int(f'0x{t[2 * o: 2 * o + 2]}', 16) for o in range(len(t) // 2)]
def convert(e):
t = ''
n = 0
i = [0, 0, 0]
a = [0, 0, 0, 0]
s = len(e)
c = 0
for z in range(s, 0, -1):
if n <= 3:
i[n] = e[c]
n += 1
c += 1
if 3 == n:
a[0] = (252 & i[0]) >> 2
a[1] = ((3 & i[0]) << 4) + ((240 & i[1]) >> 4)
a[2] = ((15 & i[1]) << 2) + ((192 & i[2]) >> 6)
a[3] = (63 & i[2])
for v in range(4):
t += r[a[v]]
n = 0
if n:
for o in range(n, 3):
i[o] = 0
for o in range(n + 1):
a[0] = (252 & i[0]) >> 2
a[1] = ((3 & i[0]) << 4) + ((240 & i[1]) >> 4)
a[2] = ((15 & i[1]) << 2) + ((192 & i[2]) >> 6)
a[3] = (63 & i[2])
t += r[a[o]]
n += 1
while n < 3:
t += ''
n += 1
return t
st_token = convert(n).replace('+', '-').replace('/', '_').replace('=', '')
return f'https://api.fptplay.net{path}?{urllib.parse.urlencode({"st": st_token, "e": timestamp})}'
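The `convert` helper in `get_api_with_st_token` appears to be a hand-rolled Base64 encoder applied to the 16 MD5 digest bytes (the upper-cased hex digest is parsed back into bytes pair by pair). The final `replace` chain then turns its output into unpadded URL-safe Base64. Under that reading, the token can be sketched with the standard library alone. `st_token` and `api_url` are hypothetical names; the secret prefix, path layout and 10800-second offset are copied from the extractor above.

```python
import base64
import hashlib
import time
import urllib.parse

# Values copied from FptplayIE.get_api_with_st_token above
SECRET_PREFIX = 'WEBv6Dkdsad90dasdjlALDDDS'
API_BASE = 'https://api.fptplay.net'


def st_token(path, timestamp):
    """Hypothetical helper: unpadded URL-safe Base64 of MD5(prefix + timestamp + path)."""
    digest = hashlib.md5(f'{SECRET_PREFIX}{timestamp}{path}'.encode()).digest()
    # urlsafe_b64encode already maps '+' -> '-' and '/' -> '_'; strip the '=' padding
    return base64.urlsafe_b64encode(digest).decode().rstrip('=')


def api_url(video_id, episode=0, timestamp=None):
    """Hypothetical helper: build the signed VOD stream API URL."""
    path = f'/api/v6.2_w/stream/vod/{video_id}/{episode}/auto_vip'
    if timestamp is None:
        # The extractor signs with "now + 10800 s" (three hours ahead)
        timestamp = int(time.time()) + 10800
    query = urllib.parse.urlencode({'st': st_token(path, timestamp), 'e': timestamp})
    return f'{API_BASE}{path}?{query}'
```

For example, `api_url('621a123016f369ebbde55945', 0)` yields a URL of the form `https://api.fptplay.net/api/v6.2_w/stream/vod/621a123016f369ebbde55945/0/auto_vip?st=...&e=...`.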


@ -252,9 +252,9 @@ class FrontendMastersCourseIE(FrontendMastersPageBaseIE):
entries = []
for lesson in lessons:
lesson_name = lesson.get('slug')
if not lesson_name:
continue
lesson_id = lesson.get('hash') or lesson.get('statsId')
if not lesson_id or not lesson_name:
continue
entries.append(self._extract_lesson(chapters, lesson_id, lesson))
title = course.get('title')


@ -621,7 +621,7 @@ class IqIE(InfoExtractor):
preview_time = traverse_obj(
initial_format_data, ('boss_ts', (None, 'data'), ('previewTime', 'rtime')), expected_type=float_or_none, get_all=False)
if traverse_obj(initial_format_data, ('boss_ts', 'data', 'prv'), expected_type=int_or_none):
self.report_warning('This preview video is limited%s' % format_field(preview_time, template='to %s seconds'))
self.report_warning('This preview video is limited%s' % format_field(preview_time, template=' to %s seconds'))
# TODO: Extract audio-only formats
for bid in set(traverse_obj(initial_format_data, ('program', 'video', ..., 'bid'), expected_type=str_or_none, default=[])):


@ -1,102 +1,42 @@
# coding: utf-8
from __future__ import unicode_literals
import base64
from datetime import datetime
import itertools
import functools
import json
from .common import InfoExtractor
from ..utils import (
update_url_query,
random_uuidv4,
try_get,
determine_ext,
dict_get,
ExtractorError,
float_or_none,
dict_get
)
from ..compat import (
compat_str,
OnDemandPagedList,
random_uuidv4,
traverse_obj,
)
class MildomBaseIE(InfoExtractor):
_GUEST_ID = None
_DISPATCHER_CONFIG = None
def _call_api(self, url, video_id, query=None, note='Downloading JSON metadata', init=False):
query = query or {}
if query:
query['__platform'] = 'web'
url = update_url_query(url, self._common_queries(query, init=init))
content = self._download_json(url, video_id, note=note)
if content['code'] == 0:
return content['body']
else:
self.raise_no_formats(
f'Video not found or premium content. {content["code"]} - {content["message"]}',
def _call_api(self, url, video_id, query=None, note='Downloading JSON metadata', body=None):
if not self._GUEST_ID:
self._GUEST_ID = f'pc-gp-{random_uuidv4()}'
content = self._download_json(
url, video_id, note=note, data=json.dumps(body).encode() if body else None,
headers={'Content-Type': 'application/json'} if body else {},
query={
'__guest_id': self._GUEST_ID,
'__platform': 'web',
**(query or {}),
})
if content['code'] != 0:
raise ExtractorError(
f'Mildom says: {content["message"]} (code {content["code"]})',
expected=True)
def _common_queries(self, query={}, init=False):
dc = self._fetch_dispatcher_config()
r = {
'timestamp': self.iso_timestamp(),
'__guest_id': '' if init else self.guest_id(),
'__location': dc['location'],
'__country': dc['country'],
'__cluster': dc['cluster'],
'__platform': 'web',
'__la': self.lang_code(),
'__pcv': 'v2.9.44',
'sfr': 'pc',
'accessToken': '',
}
r.update(query)
return r
def _fetch_dispatcher_config(self):
if not self._DISPATCHER_CONFIG:
tmp = self._download_json(
'https://disp.mildom.com/serverListV2', 'initialization',
note='Downloading dispatcher_config', data=json.dumps({
'protover': 0,
'data': base64.b64encode(json.dumps({
'fr': 'web',
'sfr': 'pc',
'devi': 'Windows',
'la': 'ja',
'gid': None,
'loc': '',
'clu': '',
'wh': '1919*810',
'rtm': self.iso_timestamp(),
'ua': self.get_param('http_headers')['User-Agent'],
}).encode('utf8')).decode('utf8').replace('\n', ''),
}).encode('utf8'))
self._DISPATCHER_CONFIG = self._parse_json(base64.b64decode(tmp['data']), 'initialization')
return self._DISPATCHER_CONFIG
@staticmethod
def iso_timestamp():
'new Date().toISOString()'
return datetime.utcnow().isoformat()[0:-3] + 'Z'
def guest_id(self):
'getGuestId'
if self._GUEST_ID:
return self._GUEST_ID
self._GUEST_ID = try_get(
self, (
lambda x: x._call_api(
'https://cloudac.mildom.com/nonolive/gappserv/guest/h5init', 'initialization',
note='Downloading guest token', init=True)['guest_id'] or None,
lambda x: x._get_cookies('https://www.mildom.com').get('gid').value,
lambda x: x._get_cookies('https://m.mildom.com').get('gid').value,
), compat_str) or ''
return self._GUEST_ID
def lang_code(self):
'getCurrentLangCode'
return 'ja'
return content['body']
class MildomIE(MildomBaseIE):
@ -106,31 +46,13 @@ class MildomIE(MildomBaseIE):
def _real_extract(self, url):
video_id = self._match_id(url)
url = 'https://www.mildom.com/%s' % video_id
webpage = self._download_webpage(url, video_id)
webpage = self._download_webpage(f'https://www.mildom.com/{video_id}', video_id)
enterstudio = self._call_api(
'https://cloudac.mildom.com/nonolive/gappserv/live/enterstudio', video_id,
note='Downloading live metadata', query={'user_id': video_id})
result_video_id = enterstudio.get('log_id', video_id)
title = try_get(
enterstudio, (
lambda x: self._html_search_meta('twitter:description', webpage),
lambda x: x['anchor_intro'],
), compat_str)
description = try_get(
enterstudio, (
lambda x: x['intro'],
lambda x: x['live_intro'],
), compat_str)
uploader = try_get(
enterstudio, (
lambda x: self._html_search_meta('twitter:title', webpage),
lambda x: x['loginname'],
), compat_str)
servers = self._call_api(
'https://cloudac.mildom.com/nonolive/gappserv/live/liveserver', result_video_id,
note='Downloading live server list', query={
@ -138,17 +60,20 @@ class MildomIE(MildomBaseIE):
'live_server_type': 'hls',
})
stream_query = self._common_queries({
'streamReqId': random_uuidv4(),
'is_lhls': '0',
})
m3u8_url = update_url_query(servers['stream_server'] + '/%s_master.m3u8' % video_id, stream_query)
formats = self._extract_m3u8_formats(m3u8_url, result_video_id, 'mp4', headers={
'Referer': 'https://www.mildom.com/',
'Origin': 'https://www.mildom.com',
}, note='Downloading m3u8 information')
playback_token = self._call_api(
'https://cloudac.mildom.com/nonolive/gappserv/live/token', result_video_id,
note='Obtaining live playback token', body={'host_id': video_id, 'type': 'hls'})
playback_token = traverse_obj(playback_token, ('data', ..., 'token'), get_all=False)
if not playback_token:
raise ExtractorError('Failed to obtain live playback token')
formats = self._extract_m3u8_formats(
f'{servers["stream_server"]}/{video_id}_master.m3u8?{playback_token}',
result_video_id, 'mp4', headers={
'Referer': 'https://www.mildom.com/',
'Origin': 'https://www.mildom.com',
})
del stream_query['streamReqId'], stream_query['timestamp']
for fmt in formats:
fmt.setdefault('http_headers', {})['Referer'] = 'https://www.mildom.com/'
@ -156,10 +81,10 @@ class MildomIE(MildomBaseIE):
return {
'id': result_video_id,
'title': title,
'description': description,
'title': self._html_search_meta('twitter:description', webpage, default=None) or traverse_obj(enterstudio, 'anchor_intro'),
'description': traverse_obj(enterstudio, 'intro', 'live_intro', expected_type=str),
'timestamp': float_or_none(enterstudio.get('live_start_ms'), scale=1000),
'uploader': uploader,
'uploader': self._html_search_meta('twitter:title', webpage, default=None) or traverse_obj(enterstudio, 'loginname'),
'uploader_id': video_id,
'formats': formats,
'is_live': True,
@ -168,7 +93,7 @@ class MildomIE(MildomBaseIE):
class MildomVodIE(MildomBaseIE):
IE_NAME = 'mildom:vod'
IE_DESC = 'Download a VOD in Mildom'
IE_DESC = 'VOD in Mildom'
_VALID_URL = r'https?://(?:(?:www|m)\.)mildom\.com/playback/(?P<user_id>\d+)/(?P<id>(?P=user_id)-[a-zA-Z0-9]+-?[0-9]*)'
_TESTS = [{
'url': 'https://www.mildom.com/playback/10882672/10882672-1597662269',
@ -215,11 +140,8 @@ class MildomVodIE(MildomBaseIE):
}]
def _real_extract(self, url):
m = self._match_valid_url(url)
user_id, video_id = m.group('user_id'), m.group('id')
url = 'https://www.mildom.com/playback/%s/%s' % (user_id, video_id)
webpage = self._download_webpage(url, video_id)
user_id, video_id = self._match_valid_url(url).group('user_id', 'id')
webpage = self._download_webpage(f'https://www.mildom.com/playback/{user_id}/{video_id}', video_id)
autoplay = self._call_api(
'https://cloudac.mildom.com/nonolive/videocontent/playback/getPlaybackDetail', video_id,
@ -227,20 +149,6 @@ class MildomVodIE(MildomBaseIE):
'v_id': video_id,
})['playback']
title = try_get(
autoplay, (
lambda x: self._html_search_meta('og:description', webpage),
lambda x: x['title'],
), compat_str)
description = try_get(
autoplay, (
lambda x: x['video_intro'],
), compat_str)
uploader = try_get(
autoplay, (
lambda x: x['author_info']['login_name'],
), compat_str)
formats = [{
'url': autoplay['audio_url'],
'format_id': 'audio',
@ -265,17 +173,81 @@ class MildomVodIE(MildomBaseIE):
return {
'id': video_id,
'title': title,
'description': description,
'timestamp': float_or_none(autoplay['publish_time'], scale=1000),
'duration': float_or_none(autoplay['video_length'], scale=1000),
'title': self._html_search_meta(('og:description', 'description'), webpage, default=None) or autoplay.get('title'),
'description': traverse_obj(autoplay, 'video_intro'),
'timestamp': float_or_none(autoplay.get('publish_time'), scale=1000),
'duration': float_or_none(autoplay.get('video_length'), scale=1000),
'thumbnail': dict_get(autoplay, ('upload_pic', 'video_pic')),
'uploader': uploader,
'uploader': traverse_obj(autoplay, ('author_info', 'login_name')),
'uploader_id': user_id,
'formats': formats,
}
class MildomClipIE(MildomBaseIE):
IE_NAME = 'mildom:clip'
IE_DESC = 'Clip in Mildom'
_VALID_URL = r'https?://(?:(?:www|m)\.)mildom\.com/clip/(?P<id>(?P<user_id>\d+)-[a-zA-Z0-9]+)'
_TESTS = [{
'url': 'https://www.mildom.com/clip/10042245-63921673e7b147ebb0806d42b5ba5ce9',
'info_dict': {
'id': '10042245-63921673e7b147ebb0806d42b5ba5ce9',
'title': '全然違ったよ',
'timestamp': 1619181890,
'duration': 59,
'thumbnail': r're:https?://.+',
'uploader': 'ざきんぽ',
'uploader_id': '10042245',
},
}, {
'url': 'https://www.mildom.com/clip/10111524-ebf4036e5aa8411c99fb3a1ae0902864',
'info_dict': {
'id': '10111524-ebf4036e5aa8411c99fb3a1ae0902864',
'title': 'かっこいい',
'timestamp': 1621094003,
'duration': 59,
'thumbnail': r're:https?://.+',
'uploader': '(ルーキー',
'uploader_id': '10111524',
},
}, {
'url': 'https://www.mildom.com/clip/10660174-2c539e6e277c4aaeb4b1fbe8d22cb902',
'info_dict': {
'id': '10660174-2c539e6e277c4aaeb4b1fbe8d22cb902',
'title': '',
'timestamp': 1614769431,
'duration': 31,
'thumbnail': r're:https?://.+',
'uploader': 'ドルゴルスレンギーン=ダグワドルジ',
'uploader_id': '10660174',
},
}]
def _real_extract(self, url):
user_id, video_id = self._match_valid_url(url).group('user_id', 'id')
webpage = self._download_webpage(f'https://www.mildom.com/clip/{video_id}', video_id)
clip_detail = self._call_api(
'https://cloudac-cf-jp.mildom.com/nonolive/videocontent/clip/detail', video_id,
note='Downloading playback metadata', query={
'clip_id': video_id,
})
return {
'id': video_id,
'title': self._html_search_meta(
('og:description', 'description'), webpage, default=None) or clip_detail.get('title'),
'timestamp': float_or_none(clip_detail.get('create_time')),
'duration': float_or_none(clip_detail.get('length')),
'thumbnail': clip_detail.get('cover'),
'uploader': traverse_obj(clip_detail, ('user_info', 'loginname')),
'uploader_id': user_id,
'url': clip_detail['url'],
'ext': determine_ext(clip_detail.get('url'), 'mp4'),
}
class MildomUserVodIE(MildomBaseIE):
IE_NAME = 'mildom:user:vod'
IE_DESC = 'Download all VODs from specific user in Mildom'
@ -286,29 +258,32 @@ class MildomUserVodIE(MildomBaseIE):
'id': '10093333',
'title': 'Uploads from ねこばたけ',
},
'playlist_mincount': 351,
'playlist_mincount': 732,
}, {
'url': 'https://www.mildom.com/profile/10882672',
'info_dict': {
'id': '10882672',
'title': 'Uploads from kson組長(けいそん)',
},
'playlist_mincount': 191,
'playlist_mincount': 201,
}]
def _entries(self, user_id):
for page in itertools.count(1):
reply = self._call_api(
'https://cloudac.mildom.com/nonolive/videocontent/profile/playbackList',
user_id, note='Downloading page %d' % page, query={
'user_id': user_id,
'page': page,
'limit': '30',
})
if not reply:
break
for x in reply:
yield self.url_result('https://www.mildom.com/playback/%s/%s' % (user_id, x['v_id']))
def _fetch_page(self, user_id, page):
page += 1
reply = self._call_api(
'https://cloudac.mildom.com/nonolive/videocontent/profile/playbackList',
user_id, note=f'Downloading page {page}', query={
'user_id': user_id,
'page': page,
'limit': '30',
})
if not reply:
return
for x in reply:
v_id = x.get('v_id')
if not v_id:
continue
yield self.url_result(f'https://www.mildom.com/playback/{user_id}/{v_id}')
def _real_extract(self, url):
user_id = self._match_id(url)
@ -319,4 +294,5 @@ class MildomUserVodIE(MildomBaseIE):
query={'user_id': user_id}, note='Downloading user profile')['user_info']
return self.playlist_result(
self._entries(user_id), user_id, 'Uploads from %s' % profile['loginname'])
OnDemandPagedList(functools.partial(self._fetch_page, user_id), 30),
user_id, f'Uploads from {profile["loginname"]}')
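`MildomUserVodIE` now hands a zero-based page function to `OnDemandPagedList` instead of looping with `itertools.count(1)`, so playlist pages are only requested when entries on them are needed; `_fetch_page` converts the index to the API's one-based `page` parameter. The sketch below illustrates that lazy pagination pattern in isolation. `LazyPagedList` is a hypothetical, simplified stand-in for yt-dlp's `OnDemandPagedList` (not its actual implementation), and `fake_api` replaces the `playbackList` endpoint.

```python
import functools
import itertools


class LazyPagedList:
    """Hypothetical, simplified stand-in for yt-dlp's OnDemandPagedList."""

    def __init__(self, pagefunc, pagesize):
        self._pagefunc = pagefunc  # called with a zero-based page number
        self._pagesize = pagesize
        self._cache = {}

    def _page(self, pagenum):
        # Each page is fetched at most once
        if pagenum not in self._cache:
            self._cache[pagenum] = list(self._pagefunc(pagenum))
        return self._cache[pagenum]

    def getslice(self, start=0, end=None):
        results = []
        for pagenum in itertools.count(start // self._pagesize):
            first = pagenum * self._pagesize
            if end is not None and first >= end:
                break  # requested window exhausted; later pages are never fetched
            page = self._page(pagenum)
            lo = max(start - first, 0)
            hi = len(page) if end is None else min(end - first, len(page))
            results.extend(page[lo:hi])
            if len(page) < self._pagesize:
                break  # a short page is treated as the last one
        return results


def fetch_page(api, user_id, page):
    """Mirrors the shape of MildomUserVodIE._fetch_page with an injected API callable."""
    page += 1  # pager index is zero-based, the playbackList API is one-based
    for x in api(user_id=user_id, page=page, limit=30) or []:
        v_id = x.get('v_id')
        if not v_id:
            continue
        yield f'https://www.mildom.com/playback/{user_id}/{v_id}'


# Usage with a fake 65-item listing in place of the real playbackList endpoint
calls = []


def fake_api(user_id, page, limit):
    calls.append(page)
    start = (page - 1) * limit
    return [{'v_id': f'{user_id}-{n}'} for n in range(start, min(start + limit, 65))]


entries = LazyPagedList(functools.partial(fetch_page, fake_api, '10093333'), 30)
first_five = entries.getslice(0, 5)  # only API page 1 is requested
```

In this sketch a page shorter than `pagesize` ends iteration, so entries dropped inside the page function (such as those without a `v_id`) can make a full API page look like the last one.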


@ -87,6 +87,7 @@ class PeerTubeIE(InfoExtractor):
maindreieck-tv\.de|
mani\.tube|
manicphase\.me|
media\.fsfe\.org|
media\.gzevd\.de|
media\.inno3\.cricket|
media\.kaitaia\.life|


@ -33,7 +33,7 @@ class PeriscopeBaseIE(InfoExtractor):
return {
'id': broadcast.get('id') or video_id,
'title': self._live_title(title) if is_live else title,
'title': title,
'timestamp': parse_iso8601(broadcast.get('created_at')),
'uploader': uploader,
'uploader_id': broadcast.get('user_id') or broadcast.get('username'),


@ -59,8 +59,16 @@ class SoundcloudEmbedIE(InfoExtractor):
class SoundcloudBaseIE(InfoExtractor):
_NETRC_MACHINE = 'soundcloud'
_API_V2_BASE = 'https://api-v2.soundcloud.com/'
_BASE_URL = 'https://soundcloud.com/'
_USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36'
_API_AUTH_QUERY_TEMPLATE = '?client_id=%s'
_API_AUTH_URL_PW = 'https://api-auth.soundcloud.com/web-auth/sign-in/password%s'
_API_VERIFY_AUTH_TOKEN = 'https://api-auth.soundcloud.com/connect/session%s'
_access_token = None
_HEADERS = {}
def _store_client_id(self, client_id):
self._downloader.cache.store('soundcloud', 'client_id', client_id)
@ -103,14 +111,6 @@ class SoundcloudBaseIE(InfoExtractor):
self._CLIENT_ID = self._downloader.cache.load('soundcloud', 'client_id') or 'a3e059563d7fd3372b49b37f00a00bcf'
self._login()
_USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36'
_API_AUTH_QUERY_TEMPLATE = '?client_id=%s'
_API_AUTH_URL_PW = 'https://api-auth.soundcloud.com/web-auth/sign-in/password%s'
_API_VERIFY_AUTH_TOKEN = 'https://api-auth.soundcloud.com/connect/session%s'
_access_token = None
_HEADERS = {}
_NETRC_MACHINE = 'soundcloud'
def _login(self):
username, password = self._get_login_info()
if username is None:


@ -67,6 +67,7 @@ class SovietsClosetIE(SovietsClosetBaseIE):
'series': 'The Witcher',
'season': 'Misc',
'episode_number': 13,
'episode': 'Episode 13',
},
},
{
@ -92,6 +93,7 @@ class SovietsClosetIE(SovietsClosetBaseIE):
'series': 'Arma 3',
'season': 'Zeus Games',
'episode_number': 3,
'episode': 'Episode 3',
},
},
]


@ -0,0 +1,95 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
try_get,
update_url_query,
url_or_none,
)
class XinpianchangIE(InfoExtractor):
_VALID_URL = r'https?://www\.xinpianchang\.com/(?P<id>[^/]+?)(?:\D|$)'
IE_NAME = 'xinpianchang'
IE_DESC = 'xinpianchang.com'
_TESTS = [{
'url': 'https://www.xinpianchang.com/a11766551',
'info_dict': {
'id': 'a11766551',
'ext': 'mp4',
'title': '北京2022冬奥会闭幕式再见短片-冰墩墩下班了',
'description': 'md5:4a730c10639a82190fabe921c0fa4b87',
'duration': 151,
'thumbnail': r're:^https?://oss-xpc0\.xpccdn\.com.+/assets/',
'uploader': '正时文创',
'uploader_id': 10357277,
'categories': ['宣传片', '国家城市', '广告', '其他'],
'keywords': ['北京冬奥会', '冰墩墩', '再见', '告别', '冰墩墩哭了', '感动', '闭幕式', '熄火']
},
}, {
'url': 'https://www.xinpianchang.com/a11762904',
'info_dict': {
'id': 'a11762904',
'ext': 'mp4',
'title': '冬奥会决胜时刻《法国派出三只鸡?》',
'description': 'md5:55cb139ef8f48f0c877932d1f196df8b',
'duration': 136,
'thumbnail': r're:^https?://oss-xpc0\.xpccdn\.com.+/assets/',
'uploader': '精品动画',
'uploader_id': 10858927,
'categories': ['动画', '三维CG'],
'keywords': ['France Télévisions', '法国3台', '蠢萌', '冬奥会']
},
}, {
'url': 'https://www.xinpianchang.com/a11779743?from=IndexPick&part=%E7%BC%96%E8%BE%91%E7%B2%BE%E9%80%89&index=2',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id=video_id)
domain = self.find_value_with_regex(var='requireNewDomain', webpage=webpage)
vid = self.find_value_with_regex(var='vid', webpage=webpage)
app_key = self.find_value_with_regex(var='modeServerAppKey', webpage=webpage)
api = update_url_query(f'{domain}/mod/api/v2/media/{vid}', {'appKey': app_key})
data = self._download_json(api, video_id=video_id)['data']
formats, subtitles = [], {}
for k, v in data.get('resource').items():
if k in ('dash', 'hls'):
v_url = v.get('url')
if not v_url:
continue
if k == 'dash':
fmts, subs = self._extract_mpd_formats_and_subtitles(v_url, video_id=video_id)
elif k == 'hls':
fmts, subs = self._extract_m3u8_formats_and_subtitles(v_url, video_id=video_id)
formats.extend(fmts)
subtitles = self._merge_subtitles(subtitles, subs)
elif k == 'progressive':
formats.extend([{
'url': url_or_none(prog.get('url')),
'width': int_or_none(prog.get('width')),
'height': int_or_none(prog.get('height')),
'ext': 'mp4',
} for prog in v if prog.get('url') or []])
self._sort_formats(formats)
return {
'id': video_id,
'title': data.get('title'),
'description': data.get('description'),
'duration': int_or_none(data.get('duration')),
'categories': data.get('categories'),
'keywords': data.get('keywords'),
'thumbnail': data.get('cover'),
'uploader': try_get(data, lambda x: x['owner']['username']),
'uploader_id': try_get(data, lambda x: x['owner']['id']),
'formats': formats,
'subtitles': subtitles,
}
def find_value_with_regex(self, var, webpage):
return self._search_regex(rf'var\s{var}\s=\s\"(?P<vid>[^\"]+)\"', webpage, name=var)
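In the `progressive` branch of `XinpianchangIE._real_extract`, `or []` is parsed as part of the comprehension's `if` condition rather than guarding the iterable, and an entry whose URL fails `url_or_none` would still produce a format with `'url': None`. The following is a standalone sketch of the same step that guards both cases. `int_or_none` and `url_or_none` here are simplified local stand-ins, not the `yt_dlp.utils` functions, and `progressive_formats` is a hypothetical name.

```python
def int_or_none(value):
    """Simplified stand-in for yt_dlp.utils.int_or_none."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def url_or_none(url):
    """Simplified stand-in: accept only absolute http(s) URLs."""
    if isinstance(url, str) and url.strip().startswith(('http://', 'https://')):
        return url.strip()
    return None


def progressive_formats(resource):
    """Build format dicts from resource['progressive'], skipping unusable entries."""
    formats = []
    # `or []` guards the iterable itself (missing key or explicit null)
    for prog in (resource or {}).get('progressive') or []:
        url = url_or_none(prog.get('url'))
        if not url:
            continue  # never emit a format whose URL is None
        formats.append({
            'url': url,
            'width': int_or_none(prog.get('width')),
            'height': int_or_none(prog.get('height')),
            'ext': 'mp4',
        })
    return formats
```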


@ -3094,6 +3094,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
# Some formats may have much smaller duration than others (possibly damaged during encoding)
# Eg: 2-nOtRESiUc Ref: https://github.com/yt-dlp/yt-dlp/issues/2823
is_damaged = try_get(fmt, lambda x: float(x['approxDurationMs']) < approx_duration - 10000)
if is_damaged:
self.report_warning(f'{video_id}: Some formats are possibly damaged. They will be deprioritized', only_once=True)
dct = {
'asr': int_or_none(fmt.get('audioSampleRate')),
'filesize': int_or_none(fmt.get('contentLength')),


@ -149,7 +149,7 @@ class ZingMp3IE(ZingMp3BaseIE):
},
}, {
'url': 'https://zingmp3.vn/video-clip/Suong-Hoa-Dua-Loi-K-ICM-RYO/ZO8ZF7C7.html',
'md5': 'e9c972b693aa88301ef981c8151c4343',
'md5': 'c7f23d971ac1a4f675456ed13c9b9612',
'info_dict': {
'id': 'ZO8ZF7C7',
'title': 'Sương Hoa Đưa Lối',
@ -158,6 +158,8 @@ class ZingMp3IE(ZingMp3BaseIE):
'duration': 207,
'track': 'Sương Hoa Đưa Lối',
'artist': 'K-ICM, RYO',
'album': 'Sương Hoa Đưa Lối (Single)',
'album_artist': 'K-ICM, RYO',
},
}, {
'url': 'https://zingmp3.vn/embed/song/ZWZEI76B?start=false',


@ -47,6 +47,7 @@ from .compat import (
compat_HTMLParser,
compat_HTTPError,
compat_basestring,
compat_brotli,
compat_chr,
compat_cookiejar,
compat_ctypes_WINFUNCTYPE,
@ -143,10 +144,16 @@ def random_user_agent():
return _USER_AGENT_TPL % random.choice(_CHROME_VERSIONS)
SUPPORTED_ENCODINGS = [
'gzip', 'deflate'
]
if compat_brotli:
SUPPORTED_ENCODINGS.append('br')
std_headers = {
'User-Agent': random_user_agent(),
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Encoding': 'gzip, deflate',
'Accept-Encoding': ', '.join(SUPPORTED_ENCODINGS),
'Accept-Language': 'en-us,en;q=0.5',
'Sec-Fetch-Mode': 'navigate',
}
@ -1023,7 +1030,7 @@ def make_HTTPS_handler(params, **kwargs):
def bug_reports_message(before=';'):
msg = ('please report this issue on https://github.com/yt-dlp/yt-dlp , '
'filling out the "Broken site" issue template properly. '
'Confirm you are on the latest version using -U')
'Confirm you are on the latest version using yt-dlp -U')
before = before.rstrip()
if not before or before.endswith(('.', '!', '?')):
@ -1357,6 +1364,12 @@ class YoutubeDLHandler(compat_urllib_request.HTTPHandler):
except zlib.error:
return zlib.decompress(data)
@staticmethod
def brotli(data):
if not data:
return data
return compat_brotli.decompress(data)
def http_request(self, req):
# According to RFC 3986, URLs can not contain non-ASCII characters, however this is not
# always respected by websites, some tend to give out URLs with non percent-encoded
@ -1417,6 +1430,12 @@ class YoutubeDLHandler(compat_urllib_request.HTTPHandler):
resp = compat_urllib_request.addinfourl(gz, old_resp.headers, old_resp.url, old_resp.code)
resp.msg = old_resp.msg
del resp.headers['Content-encoding']
# brotli
if resp.headers.get('Content-encoding', '') == 'br':
resp = compat_urllib_request.addinfourl(
io.BytesIO(self.brotli(resp.read())), old_resp.headers, old_resp.url, old_resp.code)
resp.msg = old_resp.msg
del resp.headers['Content-encoding']
# Percent-encode redirect URL of Location HTTP header to satisfy RFC 3986 (see
# https://github.com/ytdl-org/youtube-dl/issues/6457).
if 300 <= resp.code < 400:
@ -5462,5 +5481,5 @@ has_websockets = bool(compat_websockets)
def merge_headers(*dicts):
"""Merge dicts of network headers case insensitively, prioritizing the latter ones"""
"""Merge dicts of http headers case insensitively, prioritizing the latter ones"""
return {k.capitalize(): v for k, v in itertools.chain.from_iterable(map(dict.items, dicts))}
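The brotli change advertises `br` in `Accept-Encoding` only when `compat_brotli` imported successfully, and `YoutubeDLHandler` then decompresses `br` bodies alongside the existing `gzip` and `deflate` handling. Below is a standalone sketch of that negotiate-then-decode pattern. It assumes the third-party `brotli` module (which provides `brotli.decompress`) as the optional backend, and `decode_body` is a hypothetical helper rather than a yt-dlp function.

```python
import gzip
import zlib

try:
    import brotli  # optional third-party backend (assumption: the "Brotli" package)
except ImportError:
    brotli = None

# Only advertise encodings that can actually be decoded
SUPPORTED_ENCODINGS = ['gzip', 'deflate']
if brotli:
    SUPPORTED_ENCODINGS.append('br')
ACCEPT_ENCODING = ', '.join(SUPPORTED_ENCODINGS)


def decode_body(data, content_encoding):
    """Hypothetical helper: undo a single Content-Encoding on a response body."""
    encoding = (content_encoding or '').strip().lower()
    if not data or encoding in ('', 'identity'):
        return data
    if encoding == 'gzip':
        return gzip.decompress(data)
    if encoding == 'deflate':
        # Like YoutubeDLHandler.deflate: try raw deflate first, then zlib-wrapped
        try:
            return zlib.decompress(data, -zlib.MAX_WBITS)
        except zlib.error:
            return zlib.decompress(data)
    if encoding == 'br':
        if brotli is None:
            raise ValueError('received br content-encoding but no brotli module is installed')
        return brotli.decompress(data)
    raise ValueError(f'unsupported content-encoding: {encoding}')
```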