Compare commits


9 Commits

Author SHA1 Message Date
pukkandan
08d30158ec
[cleanup, docs] Misc cleanup
Closes #2828, closes #2734, closes #2802, closes #2937
2022-03-08 22:38:06 +05:30
Ha Tien Loi
c89bec262c
[xinpianchang] Add extractor (#2963)
Authored by: hatienl0i261299
2022-03-08 08:55:40 -08:00
Ha Tien Loi
151f8f1c02
[fptplay] Add extractor (#2949)
Closes #2857
Authored by: hatienl0i261299
2022-03-08 08:52:51 -08:00
Max Mehl
a35155be17
[peertube] Add media.fsfe.org (#2986)
Authored by: mxmehl
2022-03-08 08:48:35 -08:00
nyuszika7h
e66662b1e0
[ccma] Fix timestamp parsing (#2989)
Authored by: nyuszika7h
2022-03-08 08:45:23 -08:00
coletdev
4390d5ec12
Add brotli content-encoding support (#2433)
Authored by: coletdjnz
2022-03-08 08:44:05 -08:00
CplPwnies
9e0e6adb2d
[adobepass] Add Suddenlink MSO (#2977)
Closes #2704
Authored by: CplPwnies
2022-03-08 08:18:52 -08:00
Lesmiscore
b637c4e22e
[mildom] Fix linter
2022-03-08 23:56:30 +09:00
Lesmiscore (Naoya Ozaki)
fb6e3f4389
[mildom] Rework extractors (#2940)
Authored by: Lesmiscore
2022-03-08 23:49:10 +09:00
30 changed files with 537 additions and 259 deletions

.gitignore vendored
View File

@@ -24,6 +24,7 @@ cookies
 *.3gp
 *.ape
+*.ass
 *.avi
 *.desktop
 *.flac
@@ -106,6 +107,7 @@ yt-dlp.zip
 *.iml
 .vscode
 *.sublime-*
+*.code-workspace
 # Lazy extractors
 */extractor/lazy_extractors.py

CONTRIBUTING.md
View File

@@ -11,6 +11,7 @@
 - [Is anyone going to need the feature?](#is-anyone-going-to-need-the-feature)
 - [Is your question about yt-dlp?](#is-your-question-about-yt-dlp)
 - [Are you willing to share account details if needed?](#are-you-willing-to-share-account-details-if-needed)
+- [Is the website primarily used for piracy](#is-the-website-primarily-used-for-piracy)
 - [DEVELOPER INSTRUCTIONS](#developer-instructions)
 - [Adding new feature or making overarching changes](#adding-new-feature-or-making-overarching-changes)
 - [Adding support for a new site](#adding-support-for-a-new-site)
@@ -24,6 +25,7 @@
 - [Collapse fallbacks](#collapse-fallbacks)
 - [Trailing parentheses](#trailing-parentheses)
 - [Use convenience conversion and parsing functions](#use-convenience-conversion-and-parsing-functions)
+- [My pull request is labeled pending-fixes](#my-pull-request-is-labeled-pending-fixes)
 - [EMBEDDING YT-DLP](README.md#embedding-yt-dlp)
@@ -123,6 +125,10 @@ While these steps won't necessarily ensure that no misuse of the account takes p
 - Change the password before sharing the account to something random (use [this](https://passwordsgenerator.net/) if you don't have a random password generator).
 - Change the password after receiving the account back.
 
+### Is the website primarily used for piracy?
+
+We follow [youtube-dl's policy](https://github.com/ytdl-org/youtube-dl#can-you-add-support-for-this-anime-video-site-or-site-which-shows-current-movies-for-free) of not supporting services that are primarily used for infringing copyright. Additionally, it has been decided not to support porn sites that specialize in deepfakes. We also cannot support any service that serves only [DRM protected content](https://en.wikipedia.org/wiki/Digital_rights_management).
@@ -210,7 +216,7 @@ After you have ensured this site is distributing its content legally, you can fo
     }
 ```
 1. Add an import in [`yt_dlp/extractor/extractors.py`](yt_dlp/extractor/extractors.py).
-1. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, the tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with `only_matching` key in test's dict are not counted in. You can also run all the tests in one go with `TestDownload.test_YourExtractor_all`
+1. Run `python test/test_download.py TestDownload.test_YourExtractor` (note that `YourExtractor` doesn't end with `IE`). This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, the tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with an `only_matching` key in the test's dict are not counted. You can also run all the tests in one go with `TestDownload.test_YourExtractor_all`
 1. Make sure you have at least one test for your extractor. Even if all videos covered by the extractor are expected to be inaccessible for automated testing, tests should still be added with a `skip` parameter indicating why the particular test is disabled from running.
 1. Have a look at [`yt_dlp/extractor/common.py`](yt_dlp/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](yt_dlp/extractor/common.py#L91-L426). Add tests and code for as many as you want.
 1. Make sure your code follows [yt-dlp coding conventions](#yt-dlp-coding-conventions) and check the code with [flake8](https://flake8.pycqa.org/en/latest/index.html#quickstart):
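Editor's aside: a minimal sketch of what such a test entry looks like, for a hypothetical `YourExtractorIE` (all names and URLs are placeholders):

```python
from yt_dlp.extractor.common import InfoExtractor


class YourExtractorIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?yourextractor\.com/watch/(?P<id>[0-9]+)'
    _TESTS = [{
        # counted as TestDownload.test_YourExtractor
        'url': 'https://yourextractor.com/watch/42',
        'info_dict': {
            'id': '42',
            'ext': 'mp4',
            'title': 'Video title goes here',
        },
    }, {
        # only_matching: checks _VALID_URL only, so it is not counted
        'url': 'https://yourextractor.com/watch/43',
        'only_matching': True,
    }]
```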
@@ -658,6 +664,10 @@ duration = float_or_none(video.get('durationMs'), scale=1000)
 view_count = int_or_none(video.get('views'))
 ```
+
+# My pull request is labeled pending-fixes
+
+The `pending-fixes` label is added when changes have been requested to a PR. When the necessary changes are made, the label should be removed. However, despite our best efforts, it may sometimes happen that the maintainer did not see the changes or forgot to remove the label. If your PR is still marked as `pending-fixes` a few days after all requested changes have been made, feel free to ping the maintainer who labeled your PR and ask them to re-review and remove it.

CONTRIBUTORS
View File

@@ -146,7 +146,7 @@ chio0hai
 cntrl-s
 Deer-Spangle
 DEvmIb
-Grabien
+Grabien/MaximVol
 j54vc1bk
 mpeter50
 mrpapersonic
@@ -160,7 +160,7 @@ PilzAdam
 zmousm
 iw0nderhow
 unit193
-TwoThousandHedgehogs
+TwoThousandHedgehogs/KathrynElrod
 Jertzukka
 cypheron
 Hyeeji

Makefile
View File

@@ -16,7 +16,7 @@ pypi-files: AUTHORS Changelog.md LICENSE README.md README.txt supportedsites com
 clean-test:
 	rm -rf test/testdata/sigs/player-*.js tmp/ *.annotations.xml *.aria2 *.description *.dump *.frag \
 	*.frag.aria2 *.frag.urls *.info.json *.live_chat.json *.meta *.part* *.tmp *.temp *.unknown_video *.ytdl \
-	*.3gp *.ape *.avi *.desktop *.flac *.flv *.jpeg *.jpg *.m4a *.m4v *.mhtml *.mkv *.mov *.mp3 \
+	*.3gp *.ape *.ass *.avi *.desktop *.flac *.flv *.jpeg *.jpg *.m4a *.m4v *.mhtml *.mkv *.mov *.mp3 \
 	*.mp4 *.ogg *.opus *.png *.sbv *.srt *.swf *.swp *.ttml *.url *.vtt *.wav *.webloc *.webm *.webp
 clean-dist:
 	rm -rf yt-dlp.1.temp.md yt-dlp.1 README.txt MANIFEST build/ dist/ .coverage cover/ yt-dlp.tar.gz completions/ \

README.md
View File

@@ -112,7 +112,7 @@ yt-dlp is a [youtube-dl](https://github.com/ytdl-org/youtube-dl) fork based on t
 * **Other new options**: Many new options have been added such as `--concat-playlist`, `--print`, `--wait-for-video`, `--sleep-requests`, `--convert-thumbnails`, `--write-link`, `--force-download-archive`, `--force-overwrites`, `--break-on-reject` etc
-* **Improvements**: Regex and other operators in `--match-filter`, multiple `--postprocessor-args` and `--downloader-args`, faster archive checking, more [format selection options](#format-selection), merge multi-video/audio, multiple `--config-locations`, `--exec` at different stages, etc
+* **Improvements**: Regex and other operators in `--format`/`--match-filter`, multiple `--postprocessor-args` and `--downloader-args`, faster archive checking, more [format selection options](#format-selection), merge multi-video/audio, multiple `--config-locations`, `--exec` at different stages, etc
 * **Plugins**: Extractors and PostProcessors can be loaded from an external file. See [plugins](#plugins) for details
@@ -130,7 +130,7 @@ Some of yt-dlp's default options are different from that of youtube-dl and youtu
 * The default [format sorting](#sorting-formats) is different from youtube-dl and prefers higher resolution and better codecs rather than higher bitrates. You can use the `--format-sort` option to change this to any order you prefer, or use `--compat-options format-sort` to use youtube-dl's sorting order
 * The default format selector is `bv*+ba/b`. This means that if a combined video + audio format that is better than the best video-only format is found, the former will be preferred. Use `-f bv+ba/b` or `--compat-options format-spec` to revert this
 * Unlike youtube-dlc, yt-dlp does not allow merging multiple audio/video streams into one file by default (since this conflicts with the use of `-f bv*+ba`). If needed, this feature must be enabled using `--audio-multistreams` and `--video-multistreams`. You can also use `--compat-options multistreams` to enable both
-* `--ignore-errors` is enabled by default. Use `--abort-on-error` or `--compat-options abort-on-error` to abort on errors instead
+* `--no-abort-on-error` is enabled by default. Use `--abort-on-error` or `--compat-options abort-on-error` to abort on errors instead
 * When writing metadata files such as thumbnails, description or infojson, the same information (if available) is also written for playlists. Use `--no-write-playlist-metafiles` or `--compat-options no-playlist-metafiles` to not write these files
 * `--add-metadata` attaches the `infojson` to `mkv` files in addition to writing the metadata when used with `--write-info-json`. Use `--no-embed-info-json` or `--compat-options no-attach-info-json` to revert this
 * Some metadata are embedded into different fields when using `--add-metadata` as compared to youtube-dl. Most notably, `comment` field contains the `webpage_url` and `synopsis` contains the `description`. You can [use `--parse-metadata`](#modifying-metadata) to modify this to your liking or use `--compat-options embed-metadata` to revert this
@@ -267,7 +267,8 @@ While all the other dependencies are optional, `ffmpeg` and `ffprobe` are highly
 * [**pycryptodomex**](https://github.com/Legrandin/pycryptodome) - For decrypting AES-128 HLS streams and various other data. Licensed under [BSD2](https://github.com/Legrandin/pycryptodome/blob/master/LICENSE.rst)
 * [**websockets**](https://github.com/aaugustin/websockets) - For downloading over websocket. Licensed under [BSD3](https://github.com/aaugustin/websockets/blob/main/LICENSE)
 * [**secretstorage**](https://github.com/mitya57/secretstorage) - For accessing the Gnome keyring while decrypting cookies of Chromium-based browsers on Linux. Licensed under [BSD](https://github.com/mitya57/secretstorage/blob/master/LICENSE)
-* [**AtomicParsley**](https://github.com/wez/atomicparsley) - For embedding thumbnail in mp4/m4a if mutagen is not present. Licensed under [GPLv2+](https://github.com/wez/atomicparsley/blob/master/COPYING)
+* [**AtomicParsley**](https://github.com/wez/atomicparsley) - For embedding thumbnail in mp4/m4a if mutagen/ffmpeg cannot. Licensed under [GPLv2+](https://github.com/wez/atomicparsley/blob/master/COPYING)
+* [**brotli**](https://github.com/google/brotli) or [**brotlicffi**](https://github.com/python-hyper/brotlicffi) - [Brotli](https://en.wikipedia.org/wiki/Brotli) content encoding support. Both licensed under MIT <sup>[1](https://github.com/google/brotli/blob/master/LICENSE) [2](https://github.com/python-hyper/brotlicffi/blob/master/LICENSE)</sup>
 * [**rtmpdump**](http://rtmpdump.mplayerhq.hu) - For downloading `rtmp` streams. ffmpeg will be used as a fallback. Licensed under [GPLv2+](http://rtmpdump.mplayerhq.hu)
 * [**mplayer**](http://mplayerhq.hu/design7/info.html) or [**mpv**](https://mpv.io) - For downloading `rtsp` streams. ffmpeg will be used as a fallback. Licensed under [GPLv2+](https://github.com/mpv-player/mpv/blob/master/Copyright)
 * [**phantomjs**](https://github.com/ariya/phantomjs) - Used in extractors where javascript needs to be run. Licensed under [BSD3](https://github.com/ariya/phantomjs/blob/master/LICENSE.BSD)
@@ -278,13 +279,14 @@ To use or redistribute the dependencies, you must agree to their respective lice
 The Windows and MacOS standalone release binaries are already built with the python interpreter, mutagen, pycryptodomex and websockets included.
 
+<!-- TODO: ffmpeg has merged this patch. Remove this note once there is a new release -->
 **Note**: There are some regressions in newer ffmpeg versions that cause various issues when used alongside yt-dlp. Since ffmpeg is such an important dependency, we provide [custom builds](https://github.com/yt-dlp/FFmpeg-Builds#ffmpeg-static-auto-builds) with patches for these issues at [yt-dlp/FFmpeg-Builds](https://github.com/yt-dlp/FFmpeg-Builds). See [the readme](https://github.com/yt-dlp/FFmpeg-Builds#patches-applied) for details on the specific issues solved by these builds
 ## COMPILE
 
 **For Windows**:
-To build the Windows executable, you must have pyinstaller (and optionally mutagen, pycryptodomex, websockets). Once you have all the necessary dependencies installed, (optionally) build lazy extractors using `devscripts/make_lazy_extractors.py`, and then just run `pyinst.py`. The executable will be built for the same architecture (32/64 bit) as the python used to build it.
+To build the Windows executable, you must have pyinstaller (and any of yt-dlp's optional dependencies if needed). Once you have all the necessary dependencies installed, (optionally) build lazy extractors using `devscripts/make_lazy_extractors.py`, and then just run `pyinst.py`. The executable will be built for the same architecture (32/64 bit) as the python used to build it.
 
     py -m pip install -U pyinstaller -r requirements.txt
     py devscripts/make_lazy_extractors.py
@@ -605,11 +607,11 @@ You can also fork the project on github and run your fork's [build workflow](.gi
                                     --write-description etc. (default)
     --no-write-playlist-metafiles   Do not write playlist metadata when using
                                     --write-info-json, --write-description etc.
-    --clean-infojson                Remove some private fields such as
+    --clean-info-json               Remove some private fields such as
                                     filenames from the infojson. Note that it
                                     could still contain some personal
                                     information (default)
-    --no-clean-infojson             Write all fields to the infojson
+    --no-clean-info-json            Write all fields to the infojson
     --write-comments                Retrieve video comments to be placed in the
                                     infojson. The comments are fetched even
                                     without this option if the extraction is
@@ -1598,25 +1600,28 @@ This option also has a few special uses:
 * You can download an additional URL based on the metadata of the currently downloaded video. To do this, set the field `additional_urls` to the URL that you want to download. Eg: `--parse-metadata "description:(?P<additional_urls>https?://www\.vimeo\.com/\d+)"` will download the first vimeo video found in the description
 * You can use this to change the metadata that is embedded in the media file. To do this, set the value of the corresponding field with a `meta_` prefix. For example, any value you set to the `meta_description` field will be added to the `description` field in the file; you can use this to set a different "description" and "synopsis". To modify the metadata of individual streams, use the `meta<n>_` prefix (Eg: `meta1_language`). Any value set to a `meta_` field will overwrite all default values.
+
+**Note**: Metadata modification happens before format selection, post-extraction and other post-processing operations. Some fields may be added or changed during these steps, overriding your changes.
 
 For reference, these are the fields yt-dlp adds by default to the file metadata:
 
-Metadata fields|From
-:---|:---
-`title`|`track` or `title`
-`date`|`upload_date`
-`description`, `synopsis`|`description`
-`purl`, `comment`|`webpage_url`
-`track`|`track_number`
-`artist`|`artist`, `creator`, `uploader` or `uploader_id`
-`genre`|`genre`
-`album`|`album`
-`album_artist`|`album_artist`
-`disc`|`disc_number`
-`show`|`series`
-`season_number`|`season_number`
-`episode_id`|`episode` or `episode_id`
-`episode_sort`|`episode_number`
-`language` of each stream|From the format's `language`
+Metadata fields            | From
+:--------------------------|:------------------------------------------------
+`title`                    | `track` or `title`
+`date`                     | `upload_date`
+`description`, `synopsis`  | `description`
+`purl`, `comment`          | `webpage_url`
+`track`                    | `track_number`
+`artist`                   | `artist`, `creator`, `uploader` or `uploader_id`
+`genre`                    | `genre`
+`album`                    | `album`
+`album_artist`             | `album_artist`
+`disc`                     | `disc_number`
+`show`                     | `series`
+`season_number`            | `season_number`
+`episode_id`               | `episode` or `episode_id`
+`episode_sort`             | `episode_number`
+`language` of each stream  | the format's `language`
 
 **Note**: The file format may not support some of these fields
@@ -1815,12 +1820,11 @@ ydl_opts = {
     }],
     'logger': MyLogger(),
     'progress_hooks': [my_hook],
+    # Add custom headers
+    'http_headers': {'Referer': 'https://www.google.com'}
 }
 
-# Add custom headers
-yt_dlp.utils.std_headers.update({'Referer': 'https://www.google.com'})
-
 # See the public functions in yt_dlp.YoutubeDL for other available functions.
 # Eg: "ydl.download", "ydl.download_with_info_file"
 with yt_dlp.YoutubeDL(ydl_opts) as ydl:
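Editor's aside: a minimal runnable version of the per-instance header style shown above (the video URL is just a placeholder):

```python
import yt_dlp

# http_headers set in ydl_opts apply only to this YoutubeDL instance,
# unlike the old global std_headers.update() approach
ydl_opts = {'http_headers': {'Referer': 'https://www.google.com'}}
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    ydl.download(['https://www.youtube.com/watch?v=BaW_jenozKc'])
```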

devscripts/make_readme.py
View File

@@ -75,7 +75,11 @@ def filter_options(readme):
     section = re.search(r'(?sm)^# USAGE AND OPTIONS\n.+?(?=^# )', readme).group(0)
     options = '# OPTIONS\n'
     for line in section.split('\n')[1:]:
-        mobj = re.fullmatch(r'\s{4}(?P<opt>-(?:,\s|[^\s])+)(?:\s(?P<meta>([^\s]|\s(?!\s))+))?(\s{2,}(?P<desc>.+))?', line)
+        mobj = re.fullmatch(r'''(?x)
+            \s{4}(?P<opt>-(?:,\s|[^\s])+)
+            (?:\s(?P<meta>(?:[^\s]|\s(?!\s))+))?
+            (\s{2,}(?P<desc>.+))?
+        ''', line)
         if not mobj:
             options += f'{line.lstrip()}\n'
             continue
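As an illustrative check (not part of the commit), the verbose-mode rewrite matches exactly like the original one-liner; the sample option line below is made up:

```python
import re

COMPACT = r'\s{4}(?P<opt>-(?:,\s|[^\s])+)(?:\s(?P<meta>([^\s]|\s(?!\s))+))?(\s{2,}(?P<desc>.+))?'
VERBOSE = r'''(?x)
    \s{4}(?P<opt>-(?:,\s|[^\s])+)
    (?:\s(?P<meta>(?:[^\s]|\s(?!\s))+))?
    (\s{2,}(?P<desc>.+))?
'''

line = '    --write-comments                 Retrieve video comments'
old, new = re.fullmatch(COMPACT, line), re.fullmatch(VERBOSE, line)
# both patterns produce identical named groups; (?x) only ignores pattern whitespace
assert old and new and old.groupdict() == new.groupdict()
print(new.group('opt'), '->', new.group('desc'))
```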

pyinst.py
View File

@@ -74,7 +74,7 @@ def version_to_list(version):
 def dependency_options():
-    dependencies = [pycryptodome_module(), 'mutagen'] + collect_submodules('websockets')
+    dependencies = [pycryptodome_module(), 'mutagen', 'brotli'] + collect_submodules('websockets')
     excluded_modules = ['test', 'ytdlp_plugins', 'youtube-dl', 'youtube-dlc']
 
     yield from (f'--hidden-import={module}' for module in dependencies)

requirements.txt
View File

@@ -1,3 +1,5 @@
 mutagen
 pycryptodomex
 websockets
+brotli; platform_python_implementation=='CPython'
+brotlicffi; platform_python_implementation!='CPython'

setup.py
View File

@@ -21,9 +21,9 @@ DESCRIPTION = 'A youtube-dl fork with additional features and patches'
 LONG_DESCRIPTION = '\n\n'.join((
     'Official repository: <https://github.com/yt-dlp/yt-dlp>',
     '**PS**: Some links in this document will not work since this is a copy of the README.md from Github',
-    open('README.md', 'r', encoding='utf-8').read()))
+    open('README.md').read()))
 
-REQUIREMENTS = ['mutagen', 'pycryptodomex', 'websockets']
+REQUIREMENTS = open('requirements.txt').read().splitlines()
 
 if sys.argv[1:2] == ['py2exe']:

yt_dlp/YoutubeDL.py
View File

@@ -32,6 +32,7 @@ from string import ascii_letters
 from .compat import (
     compat_basestring,
+    compat_brotli,
     compat_get_terminal_size,
     compat_kwargs,
     compat_numeric_types,
@@ -234,6 +235,8 @@ class YoutubeDL(object):
                        See "Sorting Formats" for more details.
     format_sort_force: Force the given format_sort. see "Sorting Formats"
                        for more details.
+    prefer_free_formats: Whether to prefer video formats with free containers
+                       over non-free ones of same quality.
     allow_multiple_video_streams:   Allow multiple video streams to be merged
                        into a single file
     allow_multiple_audio_streams:   Allow multiple audio streams to be merged
@@ -3675,6 +3678,7 @@ class YoutubeDL(object):
         from .cookies import SQLITE_AVAILABLE, SECRETSTORAGE_AVAILABLE
 
         lib_str = join_nonempty(
+            compat_brotli and compat_brotli.__name__,
             compat_pycrypto_AES and compat_pycrypto_AES.__name__.split('.')[0],
             SECRETSTORAGE_AVAILABLE and 'secretstorage',
             has_mutagen and 'mutagen',
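Aside: `join_nonempty` (from `yt_dlp.utils`), used in the `lib_str` hunk above, drops falsy entries, which is why each library can be listed as `condition and name`. A tiny illustration:

```python
from yt_dlp.utils import join_nonempty

has_brotli, has_mutagen = False, True
# False entries are filtered out, so only installed libraries are listed
print(join_nonempty(
    has_brotli and 'brotli',
    has_mutagen and 'mutagen',
    delim=', '))  # -> 'mutagen'
```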

yt_dlp/compat.py
View File

@@ -170,6 +170,13 @@ except ImportError:
 except ImportError:
     compat_pycrypto_AES = None
 
+try:
+    import brotlicffi as compat_brotli
+except ImportError:
+    try:
+        import brotli as compat_brotli
+    except ImportError:
+        compat_brotli = None
 
 WINDOWS_VT_MODE = False if compat_os_name == 'nt' else None
@@ -258,6 +265,7 @@ __all__ = [
     'compat_asyncio_run',
     'compat_b64decode',
     'compat_basestring',
+    'compat_brotli',
     'compat_chr',
     'compat_collections_abc',
     'compat_cookiejar',
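Aside: both backends expose the same `decompress()` call, so downstream code can use the alias without caring which one was installed. A hedged sketch (the helper function is made up):

```python
from yt_dlp.compat import compat_brotli


def maybe_decode_brotli(data: bytes, content_encoding: str) -> bytes:
    if content_encoding != 'br':
        return data
    if compat_brotli is None:  # neither brotli nor brotlicffi is available
        raise RuntimeError('brotli support requires the brotli or brotlicffi package')
    return compat_brotli.decompress(data)
```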

yt_dlp/downloader/youtube_live_chat.py
View File

@@ -22,6 +22,9 @@ class YoutubeLiveChatFD(FragmentFD):
     def real_download(self, filename, info_dict):
         video_id = info_dict['video_id']
         self.to_screen('[%s] Downloading live chat' % self.FD_NAME)
+        if not self.params.get('skip_download'):
+            self.report_warning('Live chat download runs until the livestream ends. '
+                                'If you wish to download the video simultaneously, run a separate yt-dlp instance')
 
         fragment_retries = self.params.get('fragment_retries', 0)
         test = self.params.get('test', False)

yt_dlp/extractor/abematv.py
View File

@@ -8,10 +8,6 @@ import struct
 from base64 import urlsafe_b64encode
 from binascii import unhexlify
 
-import typing
-if typing.TYPE_CHECKING:
-    from ..YoutubeDL import YoutubeDL
-
 from .common import InfoExtractor
 from ..aes import aes_ecb_decrypt
 from ..compat import (
@@ -36,15 +32,15 @@ from ..utils import (
 # NOTE: network handler related code is temporary thing until network stack overhaul PRs are merged (#2861/#2862)
-def add_opener(self: 'YoutubeDL', handler):
+def add_opener(ydl, handler):
     ''' Add a handler for opening URLs, like _download_webpage '''
     # https://github.com/python/cpython/blob/main/Lib/urllib/request.py#L426
     # https://github.com/python/cpython/blob/main/Lib/urllib/request.py#L605
-    assert isinstance(self._opener, compat_urllib_request.OpenerDirector)
-    self._opener.add_handler(handler)
+    assert isinstance(ydl._opener, compat_urllib_request.OpenerDirector)
+    ydl._opener.add_handler(handler)
 
-def remove_opener(self: 'YoutubeDL', handler):
+def remove_opener(ydl, handler):
     '''
     Remove handler(s) for opening URLs
     @param handler Either handler object itself or handler type.
@@ -52,8 +48,8 @@ def remove_opener(self: 'YoutubeDL', handler):
     '''
     # https://github.com/python/cpython/blob/main/Lib/urllib/request.py#L426
     # https://github.com/python/cpython/blob/main/Lib/urllib/request.py#L605
-    opener = self._opener
-    assert isinstance(self._opener, compat_urllib_request.OpenerDirector)
+    opener = ydl._opener
+    assert isinstance(ydl._opener, compat_urllib_request.OpenerDirector)
     if isinstance(handler, (type, tuple)):
         find_cp = lambda x: isinstance(x, handler)
     else:
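Aside: with the `self` parameter renamed to a plain `ydl`, the helpers read as the module-level functions they are. An illustrative use, assuming these helpers live in `yt_dlp/extractor/abematv.py` (the handler class is made up for the example):

```python
import urllib.request

import yt_dlp
from yt_dlp.extractor.abematv import add_opener, remove_opener


class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # refuse to follow redirects


ydl = yt_dlp.YoutubeDL({})
add_opener(ydl, NoRedirectHandler())   # installs onto ydl._opener
remove_opener(ydl, NoRedirectHandler)  # removes again, matching by handler type
```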

yt_dlp/extractor/adobepass.py
View File

@@ -1345,6 +1345,11 @@ MSO_INFO = {
         'username_field': 'username',
         'password_field': 'password',
     },
+    'Suddenlink': {
+        'name': 'Suddenlink',
+        'username_field': 'username',
+        'password_field': 'password',
+    },
 }
@@ -1635,6 +1640,52 @@ class AdobePassIE(InfoExtractor):
                     urlh.geturl(), video_id, 'Sending final bookend',
                     query=hidden_data)
 
                 post_form(mvpd_confirm_page_res, 'Confirming Login')
+            elif mso_id == 'Suddenlink':
+                # Suddenlink is similar to SlingTV in using a tab history count and a meta refresh,
+                # but they also do a dynamic redirect using javascript that has to be followed as well
+                first_bookend_page, urlh = post_form(
+                    provider_redirect_page_res, 'Pressing Continue...')
+
+                hidden_data = self._hidden_inputs(first_bookend_page)
+                hidden_data['history_val'] = 1
+
+                provider_login_redirect_page = self._download_webpage(
+                    urlh.geturl(), video_id, 'Sending First Bookend',
+                    query=hidden_data)
+
+                provider_tryauth_url = self._html_search_regex(
+                    r'url:\s*[\'"]([^\'"]+)', provider_login_redirect_page, 'ajaxurl')
+
+                provider_tryauth_page = self._download_webpage(
+                    provider_tryauth_url, video_id, 'Submitting TryAuth',
+                    query=hidden_data)
+
+                provider_login_page_res = self._download_webpage_handle(
+                    f'https://authorize.suddenlink.net/saml/module.php/authSynacor/login.php?AuthState={provider_tryauth_page}',
+                    video_id, 'Getting Login Page',
+                    query=hidden_data)
+
+                provider_association_redirect, urlh = post_form(
+                    provider_login_page_res, 'Logging in', {
+                        mso_info['username_field']: username,
+                        mso_info['password_field']: password
+                    })
+
+                provider_refresh_redirect_url = extract_redirect_url(
+                    provider_association_redirect, url=urlh.geturl())
+
+                last_bookend_page, urlh = self._download_webpage_handle(
+                    provider_refresh_redirect_url, video_id,
+                    'Downloading Auth Association Redirect Page')
+
+                hidden_data = self._hidden_inputs(last_bookend_page)
+                hidden_data['history_val'] = 3
+
+                mvpd_confirm_page_res = self._download_webpage_handle(
+                    urlh.geturl(), video_id, 'Sending Final Bookend',
+                    query=hidden_data)
+
+                post_form(mvpd_confirm_page_res, 'Confirming Login')
             else:
                 # Some providers (e.g. DIRECTV NOW) have another meta refresh

yt_dlp/extractor/ant1newsgr.py
View File

@@ -97,8 +97,8 @@ class Ant1NewsGrArticleIE(Ant1NewsGrBaseIE):
         embed_urls = list(Ant1NewsGrEmbedIE._extract_urls(webpage))
         if not embed_urls:
             raise ExtractorError('no videos found for %s' % video_id, expected=True)
-        return self.url_result_or_playlist_from_matches(
-            embed_urls, video_id, info['title'], ie=Ant1NewsGrEmbedIE.ie_key(),
+        return self.playlist_from_matches(
+            embed_urls, video_id, info.get('title'), ie=Ant1NewsGrEmbedIE.ie_key(),
             video_kwargs={'url_transparent': True, 'timestamp': info.get('timestamp')})

yt_dlp/extractor/ccma.py
View File

@@ -1,17 +1,14 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
-import calendar
-import datetime
-
 from .common import InfoExtractor
 from ..utils import (
     clean_html,
-    extract_timezone,
     int_or_none,
     parse_duration,
     parse_resolution,
     try_get,
+    unified_timestamp,
     url_or_none,
 )
@@ -95,14 +92,8 @@ class CCMAIE(InfoExtractor):
         duration = int_or_none(durada.get('milisegons'), 1000) or parse_duration(durada.get('text'))
         tematica = try_get(informacio, lambda x: x['tematica']['text'])
 
-        timestamp = None
         data_utc = try_get(informacio, lambda x: x['data_emissio']['utc'])
-        try:
-            timezone, data_utc = extract_timezone(data_utc)
-            timestamp = calendar.timegm((datetime.datetime.strptime(
-                data_utc, '%Y-%d-%mT%H:%M:%S') - timezone).timetuple())
-        except TypeError:
-            pass
+        timestamp = unified_timestamp(data_utc)
 
         subtitles = {}
         subtitols = media.get('subtitols') or []
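Aside on why this fixes the timestamp bug: the removed code parsed with `'%Y-%d-%mT%H:%M:%S'`, swapping day and month, while `unified_timestamp` from `yt_dlp.utils` already understands ISO 8601 dates with timezone offsets. A quick check (the sample date is illustrative):

```python
from yt_dlp.utils import unified_timestamp

# ISO 8601 with an offset; returns epoch seconds in UTC
print(unified_timestamp('2020-02-18T20:05:00+01:00'))
```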

yt_dlp/extractor/common.py
View File

@@ -226,6 +226,7 @@ class InfoExtractor(object):
 
     The following fields are optional:
 
+    direct:         True if a direct video file was given (must only be set by GenericIE)
     alt_title:      A secondary title of the video.
     display_id      An alternative identifier for the video, not necessarily
                     unique, but available before title. Typically, id is
@@ -274,7 +275,7 @@ class InfoExtractor(object):
                         * "url": A URL pointing to the subtitles file
                     It can optionally also have:
                         * "name": Name or description of the subtitles
-                        * http_headers: A dictionary of additional HTTP headers
+                        * "http_headers": A dictionary of additional HTTP headers
                           to add to the request.
                     "ext" will be calculated from URL if missing
     automatic_captions: Like 'subtitles'; contains automatically generated
@@ -425,8 +426,8 @@ class InfoExtractor(object):
     title, description etc.
 
-    Subclasses of this one should re-define the _real_initialize() and
-    _real_extract() methods and define a _VALID_URL regexp.
+    Subclasses of this should define a _VALID_URL regexp and re-define the
+    _real_extract() and (optionally) _real_initialize() methods.
     Probably, they should also be added to the list of extractors.
 
     Subclasses may also override suitable() if necessary, but ensure the function
@@ -661,7 +662,7 @@ class InfoExtractor(object):
         return False
 
     def set_downloader(self, downloader):
-        """Sets the downloader for this IE."""
+        """Sets a YoutubeDL instance as the downloader for this IE."""
         self._downloader = downloader
 
     def _real_initialize(self):
@@ -670,7 +671,7 @@ class InfoExtractor(object):
 
     def _real_extract(self, url):
         """Real extraction process. Redefine in subclasses."""
-        pass
+        raise NotImplementedError('This method must be implemented by subclasses')
 
     @classmethod
     def ie_key(cls):
@@ -1661,31 +1662,31 @@
         'format_id': {'type': 'alias', 'field': 'id'},
         'preference': {'type': 'alias', 'field': 'ie_pref'},
         'language_preference': {'type': 'alias', 'field': 'lang'},
+        'source_preference': {'type': 'alias', 'field': 'source'},
+        'protocol': {'type': 'alias', 'field': 'proto'},
+        'filesize_approx': {'type': 'alias', 'field': 'fs_approx'},
 
         # Deprecated
-        'dimension': {'type': 'alias', 'field': 'res'},
-        'resolution': {'type': 'alias', 'field': 'res'},
-        'extension': {'type': 'alias', 'field': 'ext'},
-        'bitrate': {'type': 'alias', 'field': 'br'},
-        'total_bitrate': {'type': 'alias', 'field': 'tbr'},
-        'video_bitrate': {'type': 'alias', 'field': 'vbr'},
-        'audio_bitrate': {'type': 'alias', 'field': 'abr'},
-        'framerate': {'type': 'alias', 'field': 'fps'},
-        'protocol': {'type': 'alias', 'field': 'proto'},
-        'source_preference': {'type': 'alias', 'field': 'source'},
-        'filesize_approx': {'type': 'alias', 'field': 'fs_approx'},
-        'filesize_estimate': {'type': 'alias', 'field': 'size'},
-        'samplerate': {'type': 'alias', 'field': 'asr'},
-        'video_ext': {'type': 'alias', 'field': 'vext'},
-        'audio_ext': {'type': 'alias', 'field': 'aext'},
-        'video_codec': {'type': 'alias', 'field': 'vcodec'},
-        'audio_codec': {'type': 'alias', 'field': 'acodec'},
-        'video': {'type': 'alias', 'field': 'hasvid'},
-        'has_video': {'type': 'alias', 'field': 'hasvid'},
-        'audio': {'type': 'alias', 'field': 'hasaud'},
-        'has_audio': {'type': 'alias', 'field': 'hasaud'},
-        'extractor': {'type': 'alias', 'field': 'ie_pref'},
-        'extractor_preference': {'type': 'alias', 'field': 'ie_pref'},
+        'dimension': {'type': 'alias', 'field': 'res', 'deprecated': True},
+        'resolution': {'type': 'alias', 'field': 'res', 'deprecated': True},
+        'extension': {'type': 'alias', 'field': 'ext', 'deprecated': True},
+        'bitrate': {'type': 'alias', 'field': 'br', 'deprecated': True},
+        'total_bitrate': {'type': 'alias', 'field': 'tbr', 'deprecated': True},
+        'video_bitrate': {'type': 'alias', 'field': 'vbr', 'deprecated': True},
+        'audio_bitrate': {'type': 'alias', 'field': 'abr', 'deprecated': True},
+        'framerate': {'type': 'alias', 'field': 'fps', 'deprecated': True},
+        'filesize_estimate': {'type': 'alias', 'field': 'size', 'deprecated': True},
+        'samplerate': {'type': 'alias', 'field': 'asr', 'deprecated': True},
+        'video_ext': {'type': 'alias', 'field': 'vext', 'deprecated': True},
+        'audio_ext': {'type': 'alias', 'field': 'aext', 'deprecated': True},
+        'video_codec': {'type': 'alias', 'field': 'vcodec', 'deprecated': True},
+        'audio_codec': {'type': 'alias', 'field': 'acodec', 'deprecated': True},
+        'video': {'type': 'alias', 'field': 'hasvid', 'deprecated': True},
+        'has_video': {'type': 'alias', 'field': 'hasvid', 'deprecated': True},
+        'audio': {'type': 'alias', 'field': 'hasaud', 'deprecated': True},
+        'has_audio': {'type': 'alias', 'field': 'hasaud', 'deprecated': True},
+        'extractor': {'type': 'alias', 'field': 'ie_pref', 'deprecated': True},
+        'extractor_preference': {'type': 'alias', 'field': 'ie_pref', 'deprecated': True},
     }
 
     def __init__(self, ie, field_preference):
@@ -1785,7 +1786,7 @@
                 continue
             if self._get_field_setting(field, 'type') == 'alias':
                 alias, field = field, self._get_field_setting(field, 'field')
-                if alias not in ('format_id', 'preference', 'language_preference'):
+                if self._get_field_setting(alias, 'deprecated'):
                     self.ydl.deprecation_warning(
                         f'Format sorting alias {alias} is deprecated '
                         f'and may be removed in a future version. Please use {field} instead')
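Aside: the check is now data-driven, with the alias table itself carrying `'deprecated': True` instead of hard-coding the non-deprecated names. A standalone sketch of the same logic (names made up):

```python
settings = {
    'preference': {'type': 'alias', 'field': 'ie_pref'},
    'bitrate': {'type': 'alias', 'field': 'br', 'deprecated': True},
}


def resolve(field):
    if settings[field].get('type') == 'alias':
        alias, field = field, settings[field]['field']
        if settings[alias].get('deprecated'):  # table entry drives the warning
            print(f'Format sorting alias {alias} is deprecated. Please use {field} instead')
    return field


assert resolve('preference') == 'ie_pref'  # no warning
assert resolve('bitrate') == 'br'          # warns
```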

yt_dlp/extractor/extractors.py
View File

@@ -520,6 +520,7 @@ from .foxnews import (
     FoxNewsArticleIE,
 )
 from .foxsports import FoxSportsIE
+from .fptplay import FptplayIE
 from .franceculture import FranceCultureIE
 from .franceinter import FranceInterIE
 from .francetv import (
@@ -848,6 +849,7 @@ from .microsoftvirtualacademy import (
 from .mildom import (
     MildomIE,
     MildomVodIE,
+    MildomClipIE,
     MildomUserVodIE,
 )
 from .minds import (
@@ -2010,6 +2012,7 @@ from .ximalaya import (
     XimalayaIE,
     XimalayaAlbumIE
 )
+from .xinpianchang import XinpianchangIE
 from .xminus import XMinusIE
 from .xnxx import XNXXIE
 from .xstream import XstreamIE

yt_dlp/extractor/fptplay.py Normal file
View File

@@ -0,0 +1,102 @@
# coding: utf-8
from __future__ import unicode_literals

import hashlib
import time
import urllib.parse

from .common import InfoExtractor
from ..utils import (
    join_nonempty,
)


class FptplayIE(InfoExtractor):
    _VALID_URL = r'https?://fptplay\.vn/(?P<type>xem-video)/[^/]+\-(?P<id>\w+)(?:/tap-(?P<episode>[^/]+)?/?(?:[?#]|$)|)'
    _GEO_COUNTRIES = ['VN']
    IE_NAME = 'fptplay'
    IE_DESC = 'fptplay.vn'
    _TESTS = [{
        'url': 'https://fptplay.vn/xem-video/nhan-duyen-dai-nhan-xin-dung-buoc-621a123016f369ebbde55945',
        'md5': 'ca0ee9bc63446c0c3e9a90186f7d6b33',
        'info_dict': {
            'id': '621a123016f369ebbde55945',
            'ext': 'mp4',
            'title': 'Nhân Duyên Đại Nhân Xin Dừng Bước - Ms. Cupid In Love',
            'description': 'md5:23cf7d1ce0ade8e21e76ae482e6a8c6c',
        },
    }, {
        'url': 'https://fptplay.vn/xem-video/ma-toi-la-dai-gia-61f3aa8a6b3b1d2e73c60eb5/tap-3',
        'md5': 'b35be968c909b3e4e1e20ca45dd261b1',
        'info_dict': {
            'id': '61f3aa8a6b3b1d2e73c60eb5',
            'ext': 'mp4',
            'title': 'Má Tôi Là Đại Gia - 3',
            'description': 'md5:ff8ba62fb6e98ef8875c42edff641d1c',
        },
    }, {
        'url': 'https://fptplay.vn/xem-video/nha-co-chuyen-hi-alls-well-ends-well-1997-6218995f6af792ee370459f0',
        'only_matching': True,
    }]

    def _real_extract(self, url):
        type_url, video_id, episode = self._match_valid_url(url).group('type', 'id', 'episode')
        webpage = self._download_webpage(url, video_id=video_id, fatal=False)
        info = self._download_json(self.get_api_with_st_token(video_id, episode or 0), video_id)
        formats, subtitles = self._extract_m3u8_formats_and_subtitles(info['data']['url'], video_id, 'mp4')
        self._sort_formats(formats)
        return {
            'id': video_id,
            'title': join_nonempty(
                self._html_search_meta(('og:title', 'twitter:title'), webpage), episode, delim=' - '),
            'description': self._html_search_meta(['og:description', 'twitter:description'], webpage),
            'formats': formats,
            'subtitles': subtitles,
        }

    def get_api_with_st_token(self, video_id, episode):
        path = f'/api/v6.2_w/stream/vod/{video_id}/{episode}/auto_vip'
        timestamp = int(time.time()) + 10800

        t = hashlib.md5(f'WEBv6Dkdsad90dasdjlALDDDS{timestamp}{path}'.encode()).hexdigest().upper()
        r = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
        n = [int(f'0x{t[2 * o: 2 * o + 2]}', 16) for o in range(len(t) // 2)]

        def convert(e):
            # base64-style packing of the digest bytes into the alphabet r
            t = ''
            n = 0
            i = [0, 0, 0]
            a = [0, 0, 0, 0]
            s = len(e)
            c = 0
            for z in range(s, 0, -1):
                if n <= 3:
                    i[n] = e[c]
                    n += 1
                    c += 1
                if 3 == n:
                    # pack 3 bytes into 4 six-bit symbols
                    a[0] = (252 & i[0]) >> 2
                    a[1] = ((3 & i[0]) << 4) + ((240 & i[1]) >> 4)
                    a[2] = ((15 & i[1]) << 2) + ((192 & i[2]) >> 6)
                    a[3] = (63 & i[2])
                    for v in range(4):
                        t += r[a[v]]
                    n = 0
            if n:
                # encode the trailing partial group
                for o in range(n, 3):
                    i[o] = 0
                for o in range(n + 1):
                    a[0] = (252 & i[0]) >> 2
                    a[1] = ((3 & i[0]) << 4) + ((240 & i[1]) >> 4)
                    a[2] = ((15 & i[1]) << 2) + ((192 & i[2]) >> 6)
                    a[3] = (63 & i[2])
                    t += r[a[o]]
                n += 1
                while n < 3:
                    t += ''  # base64 would pad with '='; the caller strips '=' anyway
                    n += 1
            return t

        st_token = convert(n).replace('+', '-').replace('/', '_').replace('=', '')
        return f'https://api.fptplay.net{path}?{urllib.parse.urlencode({"st": st_token, "e": timestamp})}'
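Aside on the signing scheme above (not from the diff; values illustrative): the `st` query parameter is derived from the MD5 of a hard-coded secret, an expiry timestamp and the API path, which `convert()` then re-encodes with a base64-style alphabet:

```python
import hashlib
import time

path = '/api/v6.2_w/stream/vod/0123456789abcdef01234567/0/auto_vip'  # hypothetical video id
expiry = int(time.time()) + 10800  # the token stays valid for 3 hours
digest = hashlib.md5(f'WEBv6Dkdsad90dasdjlALDDDS{expiry}{path}'.encode()).hexdigest().upper()
# convert() packs these hex digest bytes into the URL-safe "st" token
print(digest, expiry)
```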

yt_dlp/extractor/frontendmasters.py
View File

@@ -252,9 +252,9 @@ class FrontendMastersCourseIE(FrontendMastersPageBaseIE):
         entries = []
         for lesson in lessons:
             lesson_name = lesson.get('slug')
-            if not lesson_name:
-                continue
             lesson_id = lesson.get('hash') or lesson.get('statsId')
+            if not lesson_id or not lesson_name:
+                continue
             entries.append(self._extract_lesson(chapters, lesson_id, lesson))
 
         title = course.get('title')

yt_dlp/extractor/iq.py
View File

@@ -621,7 +621,7 @@ class IqIE(InfoExtractor):
         preview_time = traverse_obj(
             initial_format_data, ('boss_ts', (None, 'data'), ('previewTime', 'rtime')), expected_type=float_or_none, get_all=False)
         if traverse_obj(initial_format_data, ('boss_ts', 'data', 'prv'), expected_type=int_or_none):
-            self.report_warning('This preview video is limited%s' % format_field(preview_time, template='to %s seconds'))
+            self.report_warning('This preview video is limited%s' % format_field(preview_time, template=' to %s seconds'))
 
         # TODO: Extract audio-only formats
         for bid in set(traverse_obj(initial_format_data, ('program', 'video', ..., 'bid'), expected_type=str_or_none, default=[])):

yt_dlp/extractor/mildom.py
View File

@ -1,102 +1,42 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import base64 import functools
from datetime import datetime
import itertools
import json import json
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
update_url_query, determine_ext,
random_uuidv4, dict_get,
try_get, ExtractorError,
float_or_none, float_or_none,
dict_get OnDemandPagedList,
) random_uuidv4,
from ..compat import ( traverse_obj,
compat_str,
) )
class MildomBaseIE(InfoExtractor): class MildomBaseIE(InfoExtractor):
_GUEST_ID = None _GUEST_ID = None
_DISPATCHER_CONFIG = None
def _call_api(self, url, video_id, query=None, note='Downloading JSON metadata', init=False): def _call_api(self, url, video_id, query=None, note='Downloading JSON metadata', body=None):
query = query or {} if not self._GUEST_ID:
if query: self._GUEST_ID = f'pc-gp-{random_uuidv4()}'
query['__platform'] = 'web'
url = update_url_query(url, self._common_queries(query, init=init)) content = self._download_json(
content = self._download_json(url, video_id, note=note) url, video_id, note=note, data=json.dumps(body).encode() if body else None,
if content['code'] == 0: headers={'Content-Type': 'application/json'} if body else {},
return content['body'] query={
else: '__guest_id': self._GUEST_ID,
self.raise_no_formats( '__platform': 'web',
f'Video not found or premium content. {content["code"]} - {content["message"]}', **(query or {}),
})
if content['code'] != 0:
raise ExtractorError(
f'Mildom says: {content["message"]} (code {content["code"]})',
expected=True) expected=True)
return content['body']
def _common_queries(self, query={}, init=False):
dc = self._fetch_dispatcher_config()
r = {
'timestamp': self.iso_timestamp(),
'__guest_id': '' if init else self.guest_id(),
'__location': dc['location'],
'__country': dc['country'],
'__cluster': dc['cluster'],
'__platform': 'web',
'__la': self.lang_code(),
'__pcv': 'v2.9.44',
'sfr': 'pc',
'accessToken': '',
}
r.update(query)
return r
def _fetch_dispatcher_config(self):
if not self._DISPATCHER_CONFIG:
tmp = self._download_json(
'https://disp.mildom.com/serverListV2', 'initialization',
note='Downloading dispatcher_config', data=json.dumps({
'protover': 0,
'data': base64.b64encode(json.dumps({
'fr': 'web',
'sfr': 'pc',
'devi': 'Windows',
'la': 'ja',
'gid': None,
'loc': '',
'clu': '',
'wh': '1919*810',
'rtm': self.iso_timestamp(),
'ua': self.get_param('http_headers')['User-Agent'],
}).encode('utf8')).decode('utf8').replace('\n', ''),
}).encode('utf8'))
self._DISPATCHER_CONFIG = self._parse_json(base64.b64decode(tmp['data']), 'initialization')
return self._DISPATCHER_CONFIG
@staticmethod
def iso_timestamp():
'new Date().toISOString()'
return datetime.utcnow().isoformat()[0:-3] + 'Z'
def guest_id(self):
'getGuestId'
if self._GUEST_ID:
return self._GUEST_ID
self._GUEST_ID = try_get(
self, (
lambda x: x._call_api(
'https://cloudac.mildom.com/nonolive/gappserv/guest/h5init', 'initialization',
note='Downloading guest token', init=True)['guest_id'] or None,
lambda x: x._get_cookies('https://www.mildom.com').get('gid').value,
lambda x: x._get_cookies('https://m.mildom.com').get('gid').value,
), compat_str) or ''
return self._GUEST_ID
def lang_code(self):
'getCurrentLangCode'
return 'ja'
 class MildomIE(MildomBaseIE):
@@ -106,31 +46,13 @@ class MildomIE(MildomBaseIE):
     def _real_extract(self, url):
         video_id = self._match_id(url)
-        url = 'https://www.mildom.com/%s' % video_id
-
-        webpage = self._download_webpage(url, video_id)
+        webpage = self._download_webpage(f'https://www.mildom.com/{video_id}', video_id)
 
         enterstudio = self._call_api(
             'https://cloudac.mildom.com/nonolive/gappserv/live/enterstudio', video_id,
             note='Downloading live metadata', query={'user_id': video_id})
         result_video_id = enterstudio.get('log_id', video_id)
 
-        title = try_get(
-            enterstudio, (
-                lambda x: self._html_search_meta('twitter:description', webpage),
-                lambda x: x['anchor_intro'],
-            ), compat_str)
-        description = try_get(
-            enterstudio, (
-                lambda x: x['intro'],
-                lambda x: x['live_intro'],
-            ), compat_str)
-        uploader = try_get(
-            enterstudio, (
-                lambda x: self._html_search_meta('twitter:title', webpage),
-                lambda x: x['loginname'],
-            ), compat_str)
-
         servers = self._call_api(
             'https://cloudac.mildom.com/nonolive/gappserv/live/liveserver', result_video_id,
             note='Downloading live server list', query={
@@ -138,17 +60,20 @@ class MildomIE(MildomBaseIE):
                 'live_server_type': 'hls',
             })
 
-        stream_query = self._common_queries({
-            'streamReqId': random_uuidv4(),
-            'is_lhls': '0',
-        })
-        m3u8_url = update_url_query(servers['stream_server'] + '/%s_master.m3u8' % video_id, stream_query)
-        formats = self._extract_m3u8_formats(m3u8_url, result_video_id, 'mp4', headers={
-            'Referer': 'https://www.mildom.com/',
-            'Origin': 'https://www.mildom.com',
-        }, note='Downloading m3u8 information')
+        playback_token = self._call_api(
+            'https://cloudac.mildom.com/nonolive/gappserv/live/token', result_video_id,
+            note='Obtaining live playback token', body={'host_id': video_id, 'type': 'hls'})
+        playback_token = traverse_obj(playback_token, ('data', ..., 'token'), get_all=False)
+        if not playback_token:
+            raise ExtractorError('Failed to obtain live playback token')
+
+        formats = self._extract_m3u8_formats(
+            f'{servers["stream_server"]}/{video_id}_master.m3u8?{playback_token}',
+            result_video_id, 'mp4', headers={
+                'Referer': 'https://www.mildom.com/',
+                'Origin': 'https://www.mildom.com',
+            })
 
-        del stream_query['streamReqId'], stream_query['timestamp']
         for fmt in formats:
             fmt.setdefault('http_headers', {})['Referer'] = 'https://www.mildom.com/'
@@ -156,10 +81,10 @@ class MildomIE(MildomBaseIE):
 
         return {
             'id': result_video_id,
-            'title': title,
-            'description': description,
+            'title': self._html_search_meta('twitter:description', webpage, default=None) or traverse_obj(enterstudio, 'anchor_intro'),
+            'description': traverse_obj(enterstudio, 'intro', 'live_intro', expected_type=str),
             'timestamp': float_or_none(enterstudio.get('live_start_ms'), scale=1000),
-            'uploader': uploader,
+            'uploader': self._html_search_meta('twitter:title', webpage, default=None) or traverse_obj(enterstudio, 'loginname'),
             'uploader_id': video_id,
             'formats': formats,
             'is_live': True,
@@ -168,7 +93,7 @@ class MildomIE(MildomBaseIE):
 
 class MildomVodIE(MildomBaseIE):
     IE_NAME = 'mildom:vod'
-    IE_DESC = 'Download a VOD in Mildom'
+    IE_DESC = 'VOD in Mildom'
     _VALID_URL = r'https?://(?:(?:www|m)\.)mildom\.com/playback/(?P<user_id>\d+)/(?P<id>(?P=user_id)-[a-zA-Z0-9]+-?[0-9]*)'
     _TESTS = [{
         'url': 'https://www.mildom.com/playback/10882672/10882672-1597662269',
@ -215,11 +140,8 @@ class MildomVodIE(MildomBaseIE):
}] }]
def _real_extract(self, url): def _real_extract(self, url):
m = self._match_valid_url(url) user_id, video_id = self._match_valid_url(url).group('user_id', 'id')
user_id, video_id = m.group('user_id'), m.group('id') webpage = self._download_webpage(f'https://www.mildom.com/playback/{user_id}/{video_id}', video_id)
url = 'https://www.mildom.com/playback/%s/%s' % (user_id, video_id)
webpage = self._download_webpage(url, video_id)
autoplay = self._call_api( autoplay = self._call_api(
'https://cloudac.mildom.com/nonolive/videocontent/playback/getPlaybackDetail', video_id, 'https://cloudac.mildom.com/nonolive/videocontent/playback/getPlaybackDetail', video_id,
@ -227,20 +149,6 @@ class MildomVodIE(MildomBaseIE):
'v_id': video_id, 'v_id': video_id,
})['playback'] })['playback']
title = try_get(
autoplay, (
lambda x: self._html_search_meta('og:description', webpage),
lambda x: x['title'],
), compat_str)
description = try_get(
autoplay, (
lambda x: x['video_intro'],
), compat_str)
uploader = try_get(
autoplay, (
lambda x: x['author_info']['login_name'],
), compat_str)
formats = [{ formats = [{
'url': autoplay['audio_url'], 'url': autoplay['audio_url'],
'format_id': 'audio', 'format_id': 'audio',
@ -265,17 +173,81 @@ class MildomVodIE(MildomBaseIE):
return { return {
'id': video_id, 'id': video_id,
'title': title, 'title': self._html_search_meta(('og:description', 'description'), webpage, default=None) or autoplay.get('title'),
'description': description, 'description': traverse_obj(autoplay, 'video_intro'),
'timestamp': float_or_none(autoplay['publish_time'], scale=1000), 'timestamp': float_or_none(autoplay.get('publish_time'), scale=1000),
'duration': float_or_none(autoplay['video_length'], scale=1000), 'duration': float_or_none(autoplay.get('video_length'), scale=1000),
'thumbnail': dict_get(autoplay, ('upload_pic', 'video_pic')), 'thumbnail': dict_get(autoplay, ('upload_pic', 'video_pic')),
'uploader': uploader, 'uploader': traverse_obj(autoplay, ('author_info', 'login_name')),
'uploader_id': user_id, 'uploader_id': user_id,
'formats': formats, 'formats': formats,
} }
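The rewritten metadata fields rely on yt-dlp's tolerant helpers instead of raw dict access, so absent keys degrade to None rather than raising. A short illustration with invented values:

    from yt_dlp.utils import dict_get, float_or_none

    autoplay = {'publish_time': 1597662269000, 'video_pic': 'https://example.invalid/pic.jpg'}

    float_or_none(autoplay.get('publish_time'), scale=1000)  # 1597662269.0 (milliseconds -> seconds)
    float_or_none(autoplay.get('video_length'), scale=1000)  # None, since the key is absent
    dict_get(autoplay, ('upload_pic', 'video_pic'))          # first truthy value among the given keys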
+class MildomClipIE(MildomBaseIE):
+    IE_NAME = 'mildom:clip'
+    IE_DESC = 'Clip in Mildom'
+    _VALID_URL = r'https?://(?:(?:www|m)\.)mildom\.com/clip/(?P<id>(?P<user_id>\d+)-[a-zA-Z0-9]+)'
+    _TESTS = [{
+        'url': 'https://www.mildom.com/clip/10042245-63921673e7b147ebb0806d42b5ba5ce9',
+        'info_dict': {
+            'id': '10042245-63921673e7b147ebb0806d42b5ba5ce9',
+            'title': '全然違ったよ',
+            'timestamp': 1619181890,
+            'duration': 59,
+            'thumbnail': r're:https?://.+',
+            'uploader': 'ざきんぽ',
+            'uploader_id': '10042245',
+        },
+    }, {
+        'url': 'https://www.mildom.com/clip/10111524-ebf4036e5aa8411c99fb3a1ae0902864',
+        'info_dict': {
+            'id': '10111524-ebf4036e5aa8411c99fb3a1ae0902864',
+            'title': 'かっこいい',
+            'timestamp': 1621094003,
+            'duration': 59,
+            'thumbnail': r're:https?://.+',
+            'uploader': '(ルーキー',
+            'uploader_id': '10111524',
+        },
+    }, {
+        'url': 'https://www.mildom.com/clip/10660174-2c539e6e277c4aaeb4b1fbe8d22cb902',
+        'info_dict': {
+            'id': '10660174-2c539e6e277c4aaeb4b1fbe8d22cb902',
+            'title': '',
+            'timestamp': 1614769431,
+            'duration': 31,
+            'thumbnail': r're:https?://.+',
+            'uploader': 'ドルゴルスレンギーン=ダグワドルジ',
+            'uploader_id': '10660174',
+        },
+    }]
+
+    def _real_extract(self, url):
+        user_id, video_id = self._match_valid_url(url).group('user_id', 'id')
+        webpage = self._download_webpage(f'https://www.mildom.com/clip/{video_id}', video_id)
+
+        clip_detail = self._call_api(
+            'https://cloudac-cf-jp.mildom.com/nonolive/videocontent/clip/detail', video_id,
+            note='Downloading playback metadata', query={
+                'clip_id': video_id,
+            })
+
+        return {
+            'id': video_id,
+            'title': self._html_search_meta(
+                ('og:description', 'description'), webpage, default=None) or clip_detail.get('title'),
+            'timestamp': float_or_none(clip_detail.get('create_time')),
+            'duration': float_or_none(clip_detail.get('length')),
+            'thumbnail': clip_detail.get('cover'),
+            'uploader': traverse_obj(clip_detail, ('user_info', 'loginname')),
+            'uploader_id': user_id,
+            'url': clip_detail['url'],
+            'ext': determine_ext(clip_detail.get('url'), 'mp4'),
+        }
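The clip endpoint hands back a direct media URL, so the extractor guesses the container with `determine_ext` and falls back to mp4. A quick sketch with placeholder URLs:

    from yt_dlp.utils import determine_ext

    determine_ext('https://example.invalid/clip/abc.mp4?sign=xyz', 'mp4')  # 'mp4', the query string is ignored
    determine_ext('https://example.invalid/clip/abc', 'mp4')               # 'mp4', no usable extension so the default wins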
 class MildomUserVodIE(MildomBaseIE):
     IE_NAME = 'mildom:user:vod'
     IE_DESC = 'Download all VODs from specific user in Mildom'
@@ -286,29 +258,32 @@ class MildomUserVodIE(MildomBaseIE):
             'id': '10093333',
             'title': 'Uploads from ねこばたけ',
         },
-        'playlist_mincount': 351,
+        'playlist_mincount': 732,
     }, {
         'url': 'https://www.mildom.com/profile/10882672',
         'info_dict': {
             'id': '10882672',
             'title': 'Uploads from kson組長(けいそん)',
         },
-        'playlist_mincount': 191,
+        'playlist_mincount': 201,
     }]
 
-    def _entries(self, user_id):
-        for page in itertools.count(1):
-            reply = self._call_api(
-                'https://cloudac.mildom.com/nonolive/videocontent/profile/playbackList',
-                user_id, note='Downloading page %d' % page, query={
-                    'user_id': user_id,
-                    'page': page,
-                    'limit': '30',
-                })
-            if not reply:
-                break
-            for x in reply:
-                yield self.url_result('https://www.mildom.com/playback/%s/%s' % (user_id, x['v_id']))
+    def _fetch_page(self, user_id, page):
+        page += 1
+        reply = self._call_api(
+            'https://cloudac.mildom.com/nonolive/videocontent/profile/playbackList',
+            user_id, note=f'Downloading page {page}', query={
+                'user_id': user_id,
+                'page': page,
+                'limit': '30',
+            })
+        if not reply:
+            return
+        for x in reply:
+            v_id = x.get('v_id')
+            if not v_id:
+                continue
+            yield self.url_result(f'https://www.mildom.com/playback/{user_id}/{v_id}')
 
     def _real_extract(self, url):
         user_id = self._match_id(url)
@@ -319,4 +294,5 @@ class MildomUserVodIE(MildomBaseIE):
             query={'user_id': user_id}, note='Downloading user profile')['user_info']
 
         return self.playlist_result(
-            self._entries(user_id), user_id, 'Uploads from %s' % profile['loginname'])
+            OnDemandPagedList(functools.partial(self._fetch_page, user_id), 30),
+            user_id, f'Uploads from {profile["loginname"]}')
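Replacing the open-ended generator with `OnDemandPagedList` means slicing the playlist (e.g. via --playlist-items) only fetches the pages actually needed. A rough sketch of the mechanics, with a stub standing in for the playbackList API:

    import functools
    from yt_dlp.utils import OnDemandPagedList

    def fetch_page(user_id, page):
        # OnDemandPagedList passes 0-based page numbers, hence the `page += 1`
        # in the extractor above; this stub fakes one 30-entry API page
        return (f'https://www.mildom.com/playback/{user_id}/{user_id}-{n}'
                for n in range(page * 30, (page + 1) * 30))

    playlist = OnDemandPagedList(functools.partial(fetch_page, '10093333'), 30)
    first_three = playlist.getslice(0, 3)  # only page 0 is materialized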

yt_dlp/extractor/peertube.py View File

@@ -87,6 +87,7 @@ class PeerTubeIE(InfoExtractor):
                             maindreieck-tv\.de|
                             mani\.tube|
                             manicphase\.me|
+                            media\.fsfe\.org|
                             media\.gzevd\.de|
                             media\.inno3\.cricket|
                             media\.kaitaia\.life|

yt_dlp/extractor/periscope.py View File

@@ -33,7 +33,7 @@ class PeriscopeBaseIE(InfoExtractor):
         return {
             'id': broadcast.get('id') or video_id,
-            'title': self._live_title(title) if is_live else title,
+            'title': title,
             'timestamp': parse_iso8601(broadcast.get('created_at')),
             'uploader': uploader,
             'uploader_id': broadcast.get('user_id') or broadcast.get('username'),

yt_dlp/extractor/soundcloud.py View File

@@ -59,8 +59,16 @@ class SoundcloudEmbedIE(InfoExtractor):
 class SoundcloudBaseIE(InfoExtractor):
+    _NETRC_MACHINE = 'soundcloud'
     _API_V2_BASE = 'https://api-v2.soundcloud.com/'
     _BASE_URL = 'https://soundcloud.com/'
+    _USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36'
+    _API_AUTH_QUERY_TEMPLATE = '?client_id=%s'
+    _API_AUTH_URL_PW = 'https://api-auth.soundcloud.com/web-auth/sign-in/password%s'
+    _API_VERIFY_AUTH_TOKEN = 'https://api-auth.soundcloud.com/connect/session%s'
+    _access_token = None
+    _HEADERS = {}
 
     def _store_client_id(self, client_id):
         self._downloader.cache.store('soundcloud', 'client_id', client_id)
@@ -103,14 +111,6 @@ class SoundcloudBaseIE(InfoExtractor):
         self._CLIENT_ID = self._downloader.cache.load('soundcloud', 'client_id') or 'a3e059563d7fd3372b49b37f00a00bcf'
         self._login()
 
-    _USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36'
-    _API_AUTH_QUERY_TEMPLATE = '?client_id=%s'
-    _API_AUTH_URL_PW = 'https://api-auth.soundcloud.com/web-auth/sign-in/password%s'
-    _API_VERIFY_AUTH_TOKEN = 'https://api-auth.soundcloud.com/connect/session%s'
-    _access_token = None
-    _HEADERS = {}
-    _NETRC_MACHINE = 'soundcloud'
-
     def _login(self):
         username, password = self._get_login_info()
         if username is None:

yt_dlp/extractor/sovietscloset.py View File

@@ -67,6 +67,7 @@ class SovietsClosetIE(SovietsClosetBaseIE):
                 'series': 'The Witcher',
                 'season': 'Misc',
                 'episode_number': 13,
+                'episode': 'Episode 13',
             },
         },
         {
@@ -92,6 +93,7 @@ class SovietsClosetIE(SovietsClosetBaseIE):
                 'series': 'Arma 3',
                 'season': 'Zeus Games',
                 'episode_number': 3,
+                'episode': 'Episode 3',
             },
         },
     ]

yt_dlp/extractor/xinpianchang.py View File

@@ -0,0 +1,95 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    try_get,
+    update_url_query,
+    url_or_none,
+)
+
+
+class XinpianchangIE(InfoExtractor):
+    _VALID_URL = r'https?://www\.xinpianchang\.com/(?P<id>[^/]+?)(?:\D|$)'
+    IE_NAME = 'xinpianchang'
+    IE_DESC = 'xinpianchang.com'
+    _TESTS = [{
+        'url': 'https://www.xinpianchang.com/a11766551',
+        'info_dict': {
+            'id': 'a11766551',
+            'ext': 'mp4',
+            'title': '北京2022冬奥会闭幕式再见短片-冰墩墩下班了',
+            'description': 'md5:4a730c10639a82190fabe921c0fa4b87',
+            'duration': 151,
+            'thumbnail': r're:^https?://oss-xpc0\.xpccdn\.com.+/assets/',
+            'uploader': '正时文创',
+            'uploader_id': 10357277,
+            'categories': ['宣传片', '国家城市', '广告', '其他'],
+            'keywords': ['北京冬奥会', '冰墩墩', '再见', '告别', '冰墩墩哭了', '感动', '闭幕式', '熄火']
+        },
+    }, {
+        'url': 'https://www.xinpianchang.com/a11762904',
+        'info_dict': {
+            'id': 'a11762904',
+            'ext': 'mp4',
+            'title': '冬奥会决胜时刻《法国派出三只鸡?》',
+            'description': 'md5:55cb139ef8f48f0c877932d1f196df8b',
+            'duration': 136,
+            'thumbnail': r're:^https?://oss-xpc0\.xpccdn\.com.+/assets/',
+            'uploader': '精品动画',
+            'uploader_id': 10858927,
+            'categories': ['动画', '三维CG'],
+            'keywords': ['France Télévisions', '法国3台', '蠢萌', '冬奥会']
+        },
+    }, {
+        'url': 'https://www.xinpianchang.com/a11779743?from=IndexPick&part=%E7%BC%96%E8%BE%91%E7%B2%BE%E9%80%89&index=2',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id=video_id)
+        domain = self.find_value_with_regex(var='requireNewDomain', webpage=webpage)
+        vid = self.find_value_with_regex(var='vid', webpage=webpage)
+        app_key = self.find_value_with_regex(var='modeServerAppKey', webpage=webpage)
+        api = update_url_query(f'{domain}/mod/api/v2/media/{vid}', {'appKey': app_key})
+        data = self._download_json(api, video_id=video_id)['data']
+        formats, subtitles = [], {}
+        for k, v in data.get('resource').items():
+            if k in ('dash', 'hls'):
+                v_url = v.get('url')
+                if not v_url:
+                    continue
+                if k == 'dash':
+                    fmts, subs = self._extract_mpd_formats_and_subtitles(v_url, video_id=video_id)
+                elif k == 'hls':
+                    fmts, subs = self._extract_m3u8_formats_and_subtitles(v_url, video_id=video_id)
+                formats.extend(fmts)
+                subtitles = self._merge_subtitles(subtitles, subs)
+            elif k == 'progressive':
+                formats.extend([{
+                    'url': url_or_none(prog.get('url')),
+                    'width': int_or_none(prog.get('width')),
+                    'height': int_or_none(prog.get('height')),
+                    'ext': 'mp4',
+                } for prog in v if prog.get('url') or []])
+
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': data.get('title'),
+            'description': data.get('description'),
+            'duration': int_or_none(data.get('duration')),
+            'categories': data.get('categories'),
+            'keywords': data.get('keywords'),
+            'thumbnail': data.get('cover'),
+            'uploader': try_get(data, lambda x: x['owner']['username']),
+            'uploader_id': try_get(data, lambda x: x['owner']['id']),
+            'formats': formats,
+            'subtitles': subtitles,
+        }
+
+    def find_value_with_regex(self, var, webpage):
+        return self._search_regex(rf'var\s{var}\s=\s\"(?P<vid>[^\"]+)\"', webpage, name=var)
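The `find_value_with_regex` helper simply scrapes inline JavaScript assignments out of the page. A standalone illustration of the same pattern against a fabricated page snippet:

    import re

    # Fabricated excerpt of the kind of inline script the extractor targets
    webpage = '<script>var vid = "a11766551"; var modeServerAppKey = "xpc123";</script>'

    match = re.search(r'var\svid\s=\s"(?P<vid>[^"]+)"', webpage)
    assert match and match.group('vid') == 'a11766551'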

yt_dlp/extractor/youtube.py View File

@@ -3094,6 +3094,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             # Some formats may have much smaller duration than others (possibly damaged during encoding)
             # Eg: 2-nOtRESiUc Ref: https://github.com/yt-dlp/yt-dlp/issues/2823
             is_damaged = try_get(fmt, lambda x: float(x['approxDurationMs']) < approx_duration - 10000)
+            if is_damaged:
+                self.report_warning(f'{video_id}: Some formats are possibly damaged. They will be deprioritized', only_once=True)
             dct = {
                 'asr': int_or_none(fmt.get('audioSampleRate')),
                 'filesize': int_or_none(fmt.get('contentLength')),
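`try_get` keeps the damage check from crashing on formats that lack `approxDurationMs`: the lambda is evaluated and any KeyError or TypeError collapses to None. A sketch with made-up numbers:

    from yt_dlp.utils import try_get

    approx_duration = 700000  # longest approxDurationMs across all formats, in ms

    try_get({'approxDurationMs': '602000'},
            lambda x: float(x['approxDurationMs']) < approx_duration - 10000)  # True, flagged as damaged
    try_get({}, lambda x: float(x['approxDurationMs']) < approx_duration - 10000)  # None, KeyError swallowed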

yt_dlp/extractor/zingmp3.py View File

@@ -149,7 +149,7 @@ class ZingMp3IE(ZingMp3BaseIE):
         },
     }, {
         'url': 'https://zingmp3.vn/video-clip/Suong-Hoa-Dua-Loi-K-ICM-RYO/ZO8ZF7C7.html',
-        'md5': 'e9c972b693aa88301ef981c8151c4343',
+        'md5': 'c7f23d971ac1a4f675456ed13c9b9612',
         'info_dict': {
             'id': 'ZO8ZF7C7',
             'title': 'Sương Hoa Đưa Lối',
@@ -158,6 +158,8 @@ class ZingMp3IE(ZingMp3BaseIE):
             'duration': 207,
             'track': 'Sương Hoa Đưa Lối',
             'artist': 'K-ICM, RYO',
+            'album': 'Sương Hoa Đưa Lối (Single)',
+            'album_artist': 'K-ICM, RYO',
         },
     }, {
         'url': 'https://zingmp3.vn/embed/song/ZWZEI76B?start=false',

yt_dlp/utils.py View File

@@ -47,6 +47,7 @@ from .compat import (
     compat_HTMLParser,
     compat_HTTPError,
     compat_basestring,
+    compat_brotli,
     compat_chr,
     compat_cookiejar,
     compat_ctypes_WINFUNCTYPE,
@@ -143,10 +144,16 @@ def random_user_agent():
     return _USER_AGENT_TPL % random.choice(_CHROME_VERSIONS)
 
 
+SUPPORTED_ENCODINGS = [
+    'gzip', 'deflate'
+]
+if compat_brotli:
+    SUPPORTED_ENCODINGS.append('br')
+
 std_headers = {
     'User-Agent': random_user_agent(),
     'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
-    'Accept-Encoding': 'gzip, deflate',
+    'Accept-Encoding': ', '.join(SUPPORTED_ENCODINGS),
     'Accept-Language': 'en-us,en;q=0.5',
     'Sec-Fetch-Mode': 'navigate',
 }
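The advertised Accept-Encoding header is now derived from what the runtime can actually decode: 'br' is offered only when `compat_brotli` resolved to an importable brotli module. A minimal standalone sketch of the same negotiation, assuming the third-party brotli (or brotlicffi) package:

    try:
        import brotli  # compat_brotli tries brotlicffi first, then brotli
    except ImportError:
        brotli = None

    supported_encodings = ['gzip', 'deflate']
    if brotli:
        supported_encodings.append('br')

    accept_encoding = ', '.join(supported_encodings)
    # 'gzip, deflate, br' when brotli is importable, otherwise 'gzip, deflate'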
@@ -1023,7 +1030,7 @@ def make_HTTPS_handler(params, **kwargs):
 def bug_reports_message(before=';'):
     msg = ('please report this issue on https://github.com/yt-dlp/yt-dlp , '
            'filling out the "Broken site" issue template properly. '
-           'Confirm you are on the latest version using -U')
+           'Confirm you are on the latest version using  yt-dlp -U')
 
     before = before.rstrip()
     if not before or before.endswith(('.', '!', '?')):
@@ -1357,6 +1364,12 @@ class YoutubeDLHandler(compat_urllib_request.HTTPHandler):
         except zlib.error:
             return zlib.decompress(data)
 
+    @staticmethod
+    def brotli(data):
+        if not data:
+            return data
+        return compat_brotli.decompress(data)
+
     def http_request(self, req):
         # According to RFC 3986, URLs can not contain non-ASCII characters, however this is not
         # always respected by websites, some tend to give out URLs with non percent-encoded
@@ -1417,6 +1430,12 @@ class YoutubeDLHandler(compat_urllib_request.HTTPHandler):
             resp = compat_urllib_request.addinfourl(gz, old_resp.headers, old_resp.url, old_resp.code)
             resp.msg = old_resp.msg
             del resp.headers['Content-encoding']
+        # brotli
+        if resp.headers.get('Content-encoding', '') == 'br':
+            resp = compat_urllib_request.addinfourl(
+                io.BytesIO(self.brotli(resp.read())), old_resp.headers, old_resp.url, old_resp.code)
+            resp.msg = old_resp.msg
+            del resp.headers['Content-encoding']
         # Percent-encode redirect URL of Location HTTP header to satisfy RFC 3986 (see
         # https://github.com/ytdl-org/youtube-dl/issues/6457).
         if 300 <= resp.code < 400:
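The brotli branch mirrors the existing gzip/deflate handling: read the raw body, decompress it, rewrap it in an `addinfourl`, and drop the Content-encoding header so downstream code sees plain bytes. The same idea outside yt-dlp, using the stdlib plus the brotli package (the URL is a placeholder):

    import io
    import urllib.request

    import brotli  # pip install brotli (or brotlicffi)

    req = urllib.request.Request('https://example.com/', headers={'Accept-Encoding': 'br'})
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
        if resp.headers.get('Content-Encoding') == 'br':
            body = brotli.decompress(body)  # now plain bytes, as if never encoded
    page = io.BytesIO(body)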
@@ -5462,5 +5481,5 @@ has_websockets = bool(compat_websockets)
 
 def merge_headers(*dicts):
-    """Merge dicts of network headers case insensitively, prioritizing the latter ones"""
+    """Merge dicts of http headers case insensitively, prioritizing the latter ones"""
     return {k.capitalize(): v for k, v in itertools.chain.from_iterable(map(dict.items, dicts))}
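Since `str.capitalize` lowercases everything after the first character, keys are normalized to one spelling before later dicts win. For example (values invented):

    from yt_dlp.utils import merge_headers

    merge_headers({'User-Agent': 'first', 'Accept': '*/*'}, {'user-agent': 'second'})
    # {'User-agent': 'second', 'Accept': '*/*'}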