Compare commits

136 Commits

Author SHA1 Message Date
github-actions
f92347c312 [version] update
Created by: pukkandan

:ci skip all :ci run dl
2022-06-22 01:14:25 +00:00
pukkandan
a86e01e743
Release 2022.06.22.1 2022-06-22 06:43:07 +05:30
pukkandan
1ed70fd0b7
[build] Fix updating homebrew formula
bug in b5899f4f19
2022-06-22 06:43:06 +05:30
github-actions
def4973ae7 [version] update
Created by: pukkandan

:ci skip all :ci run dl
2022-06-22 00:58:00 +00:00
pukkandan
0af80bcf70
Release 2022.06.22 2022-06-22 06:20:42 +05:30
pukkandan
eff4275925
Add deprecation warning for Py3.6
See: https://github.com/yt-dlp/yt-dlp/issues/3764
2022-06-22 06:20:40 +05:30
pukkandan
998a3cae0c
[cleanup] Misc fixes 2022-06-22 03:47:41 +05:30
pukkandan
471d0367c7
[youtube:clips] Support downloading clips
Closes #2543
2022-06-22 02:50:55 +05:30
pukkandan
3975b4d2e8
Allow extractors to specify section_start/end for clips 2022-06-22 02:44:28 +05:30
pukkandan
230d5c8239
[jsinterp] Some optimizations and refactoring
Motivated by: https://github.com/ytdl-org/youtube-dl/issues/30641#issuecomment-1041904912

Authored by: dirkf, pukkandan
2022-06-21 23:23:48 +05:30
pukkandan
e4afcfde08
[build] Add Linux standalone builds 2022-06-21 17:02:57 +05:30
pukkandan
8372be7469
[update] Self-restart after update 2022-06-21 17:02:57 +05:30
pukkandan
57e0f077a6
[update] Expose more functionality to API 2022-06-21 17:02:56 +05:30
pukkandan
f0500bd1e4
[test] Fix FakeYDL signatures
Authored by: coletdjnz
2022-06-21 13:03:29 +05:30
pukkandan
95032f302c
[f4m] Bugfix 2022-06-21 13:03:29 +05:30
pukkandan
8102a5991b
[extractor/mediaset] Improve _VALID_URL 2022-06-21 13:03:28 +05:30
HobbyistDev
c27eaf8920
[extractor/kicker.de] Add extractor (#4073)
Closes #3670
Authored by: HobbyistDev
2022-06-21 00:30:43 -07:00
pukkandan
dfb855b42d
[extractor/BiliIntl] Fix subtitle extraction
Closes #3123

Authored by: HobbyistDev
2022-06-20 14:08:32 +05:30
pukkandan
5df1444255
[utils] ExtractorError: Fix exc_info 2022-06-20 12:35:02 +05:30
pukkandan
612f2be5d3
Bugfix for 7b2c3f47c6 2022-06-20 12:03:35 +05:30
pukkandan
6d1b34896e
Update to ytdl-commit-8a158a9
[NHK] Use new API URL
6508688e88

Closes #2337, Closes #4063
2022-06-20 11:44:57 +05:30
pukkandan
7b2c3f47c6
[cleanup] Misc 2022-06-20 11:44:55 +05:30
pukkandan
8aa0e7cd96
[docs] Improvements 2022-06-20 10:48:29 +05:30
HobbyistDev
695b28afaa
[DailyWire] Add extractors (#4084)
Closes #3139
Authored by: HobbyistDev, pukkandan
2022-06-19 20:50:45 -07:00
ischmidt20
0a4fb0d3fe
[WatchESPN] Support free videos and BAM_DTC (#4118)
Authored by: ischmidt20
2022-06-19 20:06:37 -07:00
pukkandan
8072ef2bbd
[extractor/BiliIntl] Fix metadata extraction
Closes #4116
2022-06-20 03:05:46 +05:30
Elyse
40268a7974
[extractor/foxnews] Update embed extraction (#4043)
Authored by: elyse0
2022-06-19 18:59:48 +05:30
HobbyistDev
697ebe4d31
[extractor/ixigua] Add Extractor (#3953)
Closes #2840
Authored by: HobbyistDev
2022-06-18 20:48:50 -07:00
bubbleguuum
38d86f4d45
[extractor/radiofrance] Add more radios (#4065)
Closes #4087 
Authored by: bubbleguuum
2022-06-18 18:36:14 -07:00
pukkandan
f254d6ccd9
[extractor/dropbox] Extract the correct mountComponent 2022-06-19 06:46:46 +05:30
coletdev
f0bc6e2019
[extractor] Add default parameter to _search_json (#4057)
Authored by: pukkandan, coletdjnz
2022-06-18 17:55:18 -07:00
MMM
9fde8a6b12
[extractor/lbry] Update livestream API (#4042)
Authored by: flashdagger
2022-06-18 17:10:22 -07:00
Elyse
612e31f5ea
[extractor/substack] Add extractor (#4011)
Closes #3722
Authored by: elyse0
2022-06-18 17:08:53 -07:00
Abubukker Chaudhary
7a2e40dd48
[extractor/MirrorCoUK] Add extractor (#3999)
Authored by: LunarFang416, pukkandan
2022-06-18 16:59:57 -07:00
HobbyistDev
60ba603ab5
[extractor/netverse] Add extractors (#3854)
Authored by: HobbyistDev, pukkandan
2022-06-19 05:08:45 +05:30
Zhymabek Roman
a79cba0c95
[extractor/digitalconcerthall] Fix extractor (#4105)
Authored by: ZhymabekRoman
2022-06-18 23:28:25 +05:30
Lesmiscore
4f2a58c9c5
[extractor/pornhub] Extract uploader_id field (#4104)
Authored by: Lesmiscore
2022-06-19 00:06:12 +09:00
pukkandan
44a6fcff39
Improve error handling of bad config files
Related: #824
2022-06-18 09:19:39 +05:30
pukkandan
bf1824b391
[cleanup] Deprecate YoutubeDL.parse_outtmpl 2022-06-18 08:36:39 +05:30
pukkandan
a70635b8a1
[cleanup, utils] Don't use kwargs for format_field 2022-06-18 08:13:22 +05:30
christoph-heinrich
e121e3cee7
[cleanup] Minor fixes (#4096)
Authored by: christoph-heinrich
2022-06-17 18:57:22 -07:00
pukkandan
7e9a612585
Add option --lazy-playlist to process entries as they are received 2022-06-17 14:20:40 +05:30
pukkandan
0df111a371
[youtube] Extract comment_count from webpage
Closes #4091
2022-06-17 12:00:55 +05:30
pukkandan
a39a7ba8d6
[extractor/tiktok] Extract SIGI_STATE
Based on #3624, https://github.com/ytdl-org/youtube-dl/pull/30479

Closes #3551

Authored by: dirkf, sulyi, pukkandan
2022-06-17 11:24:09 +05:30
pukkandan
7e88d7d78f
Add slicing notation to --playlist-items
* Add support for negative indices and step
* Add `-I` as alias for `--playlist-items`
* Deprecate `--playlist-start`, `--playlist-end`, `--playlist-reverse`, `--no-playlist-reverse`

Closes #2951, Closes #2853
2022-06-17 10:36:52 +05:30
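
The slicing notation with negative indices and a step can be illustrated with a small standalone parser. This is only a sketch and not the code from the commit: the function name `parse_playlist_items` is hypothetical, and it assumes 1-based indices, inclusive range ends, and negative values counting from the end of the playlist.

```python
def parse_playlist_items(spec, count):
    """Expand a spec such as '1:3,7,-5::2' into 1-based playlist indices.

    Hypothetical helper for illustration; assumes inclusive ranges.
    """
    def normalize(value):
        # Negative values count backwards from the end of the playlist
        return value + count + 1 if value < 0 else value

    def bound(text, default):
        # An omitted bound falls back to the given default
        return default if text == '' else normalize(int(text))

    indices = []
    for part in spec.split(','):
        fields = part.strip().split(':')
        if len(fields) == 1:
            # A single item; int() rejects an empty field
            indices.append(normalize(int(fields[0])))
            continue
        if len(fields) > 3:
            raise ValueError(f'too many ":" in {part!r}')
        fields += [''] * (3 - len(fields))
        step = int(fields[2]) if fields[2] else 1
        if step == 0:
            raise ValueError('step cannot be zero')
        if step > 0:
            start, end = bound(fields[0], 1), bound(fields[1], count)
            indices.extend(range(start, end + 1, step))
        else:
            start, end = bound(fields[0], count), bound(fields[1], 1)
            indices.extend(range(start, end - 1, step))
    return indices
```

Under these assumptions, `parse_playlist_items('1:3,7,-5::2', 15)` expands to the items 1, 2, 3, 7, 11, 13 and 15.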
pukkandan
f0c9fb9682
[utils] Popen: Refactor to use contextmanager
Fixes https://github.com/yt-dlp/yt-dlp/issues/3531#issuecomment-1156223597
2022-06-16 06:23:50 +05:30
pukkandan
560738f34d
[extractor] Import _ALL_CLASSES lazily
This significantly speeds up `import yt_dlp` in the absence of `lazy_extractors`
2022-06-16 06:23:50 +05:30
pukkandan
99d10bf607
[cleanup, extractor] Rename extractors.py to _extractors.py
This should be considered part of the next commit,
but is separated so that `git` can detect the renaming better
2022-06-16 06:23:49 +05:30
Evan Spensley
145c5a83a8
[extractor/GoogleDrive] Add folder extractor (#4009)
Closes #3388
Authored by: evansp, pukkandan
2022-06-14 06:33:29 -07:00
pukkandan
2cb1982043
[utils] locked_file: Fix for PyPy on Windows 2022-06-13 19:21:31 +05:30
pukkandan
fccf90e7f3
Fix bug in 56ba69e4c9 2022-06-13 19:16:06 +05:30
pukkandan
d32f30ac48
Add --no-update
Closes #4060
2022-06-13 19:15:54 +05:30
pukkandan
e3aae45a6f
[extractor/zdf] Fix bug in 62b2b736e7
Closes #4061
2022-06-13 19:13:59 +05:30
pukkandan
f3c0c77304
[extractor] Handle json_ld with multiple @types
Closes #4022
2022-06-13 19:12:34 +05:30
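
In JSON-LD, `@type` may be a single string or a list of strings, so a type check has to handle both shapes. A minimal sketch of such a check follows; the helper names `json_ld_types` and `is_json_ld_type` are hypothetical and are not taken from the commit.

```python
def json_ld_types(entry):
    """Return the @type of a JSON-LD object as a list of strings."""
    types = entry.get('@type')
    if types is None:
        return []
    if isinstance(types, str):
        return [types]
    return list(types)


def is_json_ld_type(entry, *expected):
    """True if any of the entry's types is one of the expected types."""
    return any(t in expected for t in json_ld_types(entry))
```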
pukkandan
79e591b59b
[extractor/rumble] Detect JS embed
Closes #4064
2022-06-13 19:08:01 +05:30
pukkandan
21a73e9f39
[extractor/generic] Revert e6ae51c123
85553414ae made it unnecessary
2022-06-13 18:40:33 +05:30
coletdjnz
4ce05f5759
[extractor/youtube] Fix live chat for videos with content warning
Fixes #4051
Authored by: coletdjnz
2022-06-12 17:56:50 +12:00
Lesmiscore
2523702718
[extractor/tver] Fix bug in 6837633a4a
This corrects a mistake in 64fa820ccf
Authored by: Lesmiscore
Closes #4054
2022-06-12 12:06:00 +09:00
pukkandan
55baa67c7c
[extractor/jwplatform] Look for data-video-jw-id
Closes #3821
2022-06-12 03:26:00 +05:30
pukkandan
64fa820ccf
[cleanup] Misc fixes (see desc)
* [tver] Fix bug in 6837633a4a - Closes #4054
* [rumble] Fix tests - Closes #3976
* [make] Remove `cat` abuse - Closes #3989
* [make] Revert #3684 - Closes #3814
* [utils] Improve `get_elements_by_class` - Closes #3993
* [utils] Inherit `Namespace` from `types.SimpleNamespace`
* [utils] Use `re.fullmatch` for matching filters
* [jsinterp] Handle quotes in `_separate`
* [make_readme] Allow overshooting last line

Authored by: pukkandan, kwconder, MrRawes, Lesmiscore
2022-06-12 00:08:16 +05:30
pukkandan
56ba69e4c9
[cleanup] Misc fixes
Closes #4027
2022-06-11 05:00:12 +05:30
Aurélien Grosdidier
d05460e5fe
[extractor/FranceCulture] Fix extractor (#3874)
Closes #3742
Authored by: aurelg, pukkandan
2022-06-10 16:22:34 -07:00
ping
14c3a98049
[extractor/naver] Add navernow extractor (#3866)
Authored by: ping
2022-06-10 15:38:32 -07:00
Elyse
e0a4a3d5bf
[extractor/freetv] Add extractor (#3587)
Closes #3486
Authored by: elyse0
2022-06-10 15:34:09 -07:00
Elyse
62b2b736e7
[extractor/zdf] Improve format sorting (#4040)
Closes #4020

Authored by: elyse0
2022-06-10 15:22:14 -07:00
Lesmiscore
6837633a4a
[extractor/tver] Fix extractor (#4033)
Authored by: Lesmiscore
2022-06-09 23:55:58 +09:00
coletdev
2ae778b8fc
[extractor/youtube] Add innertube_host and innertube_key extractor args (#3916)
Allows user to override Innertube API host or key for all requests
Authored by: coletdjnz
2022-06-08 22:18:01 +00:00
Ashish Gupta
c82a4a8fce
[extractor/atscaleconfevent] Add extractor (#3971)
Closes #3961
Authored by: Ashish0804
2022-06-07 15:36:46 -07:00
vkorablin
6e7c9201cd
[extractor/ccc] Extract view_count (#3939)
Authored by: vkorablin
2022-06-07 15:20:42 -07:00
Angel Toloza
bde0132e15
[extractor/southpark] Add southpark.lat extractor (#4008)
Authored by: darkxex
2022-06-07 15:12:56 -07:00
pukkandan
233ad894d3
[update] Use .git folder to distinguish source/unknown
This is not perfect, but is good enough for how we use this information

Closes #3994
2022-06-08 00:17:42 +05:30
Daniel Lindholm
0d6bafbfa7
[expressen] Fix extractor (#4006)
Authored by: aejdl
2022-06-07 06:00:27 -07:00
MMM
36195c4461
[dash] Show fragment count with --live-from-start (#3493)
Authored by: flashdagger
2022-06-07 05:44:08 -07:00
coletdjnz
65141660ab
[extractor/youtube] Fix bug in b7c47b7438
Closes #3997

Authored by: coletdjnz
2022-06-07 12:26:36 +12:00
Christoph Moench-Tegeder
dec30912a7
[cookies] Detect profiles for cygwin/BSD (#3975)
Closes #3370
Authored by: moench-tegeder
2022-06-06 14:17:49 -07:00
pukkandan
5ec1b6b716
Add option --download-sections to download video partially
Closes #52, Closes #3932
2022-06-07 02:41:55 +05:30
pukkandan
e0ab98541c
[ExtractAudio] Allow conditional conversion
Closes #1715
2022-06-06 21:51:28 +05:30
pukkandan
35faefee5d
[ExtractAudio, cleanup] Refactor 2022-06-06 21:49:57 +05:30
pukkandan
b7c47b7438
[extractor] Add _search_json
All fetching of JSON objects should eventually be done with this function
but only `youtube` is being refactored for now
2022-06-06 19:46:45 +05:30
pukkandan
00bbc5f177
[ThumbnailsConvertor] Allow conditional conversion
Closes #3970
2022-06-05 20:51:19 +05:30
Lesmiscore
0bea4fd807
[extractor/0000studio] Add extractors (#3959)
Authored by: Lesmiscore
2022-06-05 14:37:05 +09:00
ischmidt20
b5770743fe
[extractor/espn] Add WatchESPN extractor (#2283)
Authored by: ischmidt20, pukkandan
2022-06-03 20:02:15 -07:00
pukkandan
1890fc6389
[cleanup] Misc fixes
Cherry-picks from: #3498, #3947
Related: #3949, https://github.com/yt-dlp/yt-dlp/issues/1839#issuecomment-1140313836
Authored by: pukkandan, flashdagger, gamer191
2022-06-03 21:45:35 +05:30
pukkandan
c4910024f3
[extractor] Fix bug in 617f658b7e
While the function signature doesn't enforce it, some IEs that override
`_download_webpage_handle` assume all optional arguments to be keyword-only

Closes #3954
2022-06-03 17:25:20 +05:30
coletdev
c7a7baaa13
[extractor/youtube] Fix :ytnotifications extractor (#3775)
Still some issues, see https://github.com/yt-dlp/yt-dlp/pull/3775

Authored by: coletdjnz
2022-06-03 07:04:39 +00:00
siddharth ravikumar
e50c3500b4
[extractor/npr] Use stream url from json-ld (#3455)
Closes #1934
Authored by: r5d
2022-06-02 17:51:11 -07:00
pukkandan
09d02ea429
[extractor] Fix bug in f95b9dee45
Closes #3951
2022-06-03 06:16:01 +05:30
sqrtNOT
ac05fb9338
[extractor/niconico:series] Fix extractor (#3935)
Authored by: sqrtNOT
2022-06-02 09:02:42 -07:00
pukkandan
28786529dc
[extractor/dropout] Login is not mandatory
Workaround for #3931
2022-06-01 02:03:25 +05:30
pukkandan
6b0b0a289a
[extractor/youtube:tab] Detect videoRenderer in _post_thread_continuation_entries 2022-06-01 02:03:24 +05:30
pukkandan
f95b9dee45
[extractor] Add dev option --load-pages 2022-06-01 02:03:22 +05:30
pukkandan
617f658b7e
[extractor, cleanup] Refactor _download_... methods 2022-06-01 01:57:16 +05:30
pukkandan
8a7f6d7a15
Do not print progress to stderr with -q
It is arguable how this "should" behave, but since progress is always
written to stdout in older yt-dl/p, we should keep it as-is

Bug in cf4f42cb97
Closes #3844
2022-06-01 01:57:14 +05:30
Lesmiscore
9c0412cf6b
[extractor/vevo] Fix extractor (#3921)
Authored by: Lesmiscore
2022-06-01 01:10:53 +09:00
gamer191
84131d0351
[extractor/animelab] Remove extractor (#3922)
https://www.animelab.com/sunset

Authored by: gamer191
2022-05-31 08:51:22 -07:00
Lesmiscore
1cd6cba306
[extractor/PokemonSoundLibrary] Remove extractor (#3918)
Authored by: Lesmiscore
2022-05-31 18:02:29 +09:00
Lesmiscore
661e7253a2
[extractor/iwara:user] Make paging better (#3901)
Authored by: Lesmiscore
2022-05-31 10:52:42 +09:00
Lesmiscore
222a230871
[extractor/common] Recognize src attribute from HTML5 media elements (#3899)
Authored by: Lesmiscore
2022-05-29 22:48:04 +09:00
coletdjnz
ee27297f82
[extractor/youtube] Fix initial player response extraction
Authored by: pukkandan, coletdjnz
2022-05-29 19:54:22 +12:00
Stefan Borer
ee164987c7
[extractor/playsuisse] Add extractor (#845)
Authored by: sbor23, pukkandan
2022-05-28 16:44:17 -07:00
pukkandan
0fe51254cb
[extractor/youtube] Bring back _extract_chapters_from_description
Closes #3886
2022-05-29 01:00:41 +05:30
pukkandan
52023f1291
[extractor/youtube] Make signature extraction non-fatal
and reduce verbosity of its warning

Closes #3882
2022-05-29 00:00:24 +05:30
mozbugbox
5bbe631e04
[extractor/duboku] Fix for hostname change (#3891)
Authored by: mozbugbox
2022-05-28 06:35:10 -07:00
coletdev
2c6dcb65fb
[utils] Send HTTP/1.1 ALPN extension (#3889)
Some servers may reject requests if not sent (e.g. fingerprinting)

Fixes #3878

Authored by: coletdjnz
2022-05-28 03:46:36 +00:00
miseran
520876fa09
[extractor/zattoo] Fix live streams (#3812)
Authored by: miseran
2022-05-27 09:29:19 -07:00
pukkandan
0bf9dc1e35
Fix bug in 8a82af3511 2022-05-27 21:29:30 +05:30
pukkandan
829bbd1d05
[youtube] Add warning for PostLiveDvr
Closes #3746, Related #1564
2022-05-27 05:07:00 +05:30
pukkandan
8a82af3511
[cleanup] Misc fixes and cleanup
Closes #3780, Closes #3853, Closes #3850
2022-05-27 04:43:43 +05:30
pukkandan
8246f8402b
[spotify:show] Fix extractor
Closes #3768
2022-05-27 04:33:03 +05:30
pukkandan
6b9e832db7
--config-location - to provide options interactively 2022-05-27 04:32:54 +05:30
monnef
d2ff2c91bb
[curiositystream] Get auth_token from cookie (#3836)
Closes #3753
Authored by: mnn
2022-05-26 16:02:20 -07:00
m4tu4g
7879e79d11
[bloomberg] Change playback endpoint (#3857)
Closes #3787
Authored by: m4tu4g
2022-05-24 02:05:23 -07:00
Lesmiscore
8a3e7b1c95
[yahoo:gyao] Fix extractor
This fixes 400 error for /title/ URLs.
2022-05-24 03:01:52 +09:00
pukkandan
d9473db78a
[ModifyChapters] Fix repeated removal of small segments
Closes #3846
2022-05-23 16:12:33 +05:30
pukkandan
11233f2afd
[downloader, cleanup] Refactor report_progress
Closes #3790
2022-05-22 21:54:06 +05:30
pukkandan
3a85e9cee9
[ffmpeg] Check version lazily
Closes #3830
2022-05-22 19:56:22 +05:30
pukkandan
c4a62b99f6
Fix bug in 23326151c4 2022-05-22 17:27:04 +05:30
pukkandan
b5899f4f19
[build, cleanup] Refactor
Closes #3835, #3837
2022-05-22 17:07:18 +05:30
Felix S
92922fe7f9
[rumble] Extract subtitles (#3823)
Closes #3132
Authored by: fstirlitz
2022-05-21 05:00:32 -07:00
pukkandan
c487cf0010
[cleanup] Misc 2022-05-21 16:01:53 +05:30
pukkandan
415f8d51a8
Ensure pre-processor errors do not block video download
Closes #2875
2022-05-21 02:30:16 +05:30
pukkandan
ca6d59d2c1
Fix --simulate --max-downloads
Bug in c3e6ffba53
Closes #3815
2022-05-20 23:13:31 +05:30
pukkandan
1a8cc83735
Bugfix for 3a408f9d19 2022-05-20 21:25:07 +05:30
pukkandan
2762dbb17e
[compat] Add functools.cached_property 2022-05-20 21:06:37 +05:30
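
`functools.cached_property` only exists on Python 3.8 and later, so a compat shim is needed while older versions are supported. The following is a rough sketch of what such a fallback can look like, not the code from the commit; the `Demo` class is a hypothetical usage example.

```python
try:
    from functools import cached_property  # available on Python 3.8+
except ImportError:
    class cached_property:
        """Minimal fallback: compute once, then store on the instance."""

        def __init__(self, func):
            self.func = func
            self.__doc__ = func.__doc__

        def __get__(self, instance, owner=None):
            if instance is None:
                return self
            value = self.func(instance)
            # As a non-data descriptor, this is shadowed by the instance
            # attribute from now on, so func is not called again
            instance.__dict__[self.func.__name__] = value
            return value


class Demo:
    calls = 0  # how many times the property body has run

    @cached_property
    def value(self):
        Demo.calls += 1
        return 42
```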
pukkandan
666c36d58d
Bugfix for 23326151c4 2022-05-20 21:03:19 +05:30
adamanldo
854b0d325e
[StreamCZ] Fix extractor (#3789)
Closes #3579
Authored by: dirkf, adamanldo
2022-05-20 06:19:13 -07:00
Elyse
79c318937b
[ina] Fix extractor (#3807)
Closes #2463
Authored by: elyse0
2022-05-20 03:17:32 -07:00
Jeff Huffman
88d62206b4
[crunchyroll:beta] Fix extractor after API change (#3801)
Closes #2052
Authored by: Burve, tejing1
2022-05-19 17:37:04 -07:00
pukkandan
e79969b242
Return an error code if update fails
Closes #3802
2022-05-20 06:01:37 +05:30
pukkandan
53973b4d2c
[utils] Fix bug in 0b9c08b47b
* Cache of `supports_terminal_sequences` must be reset after enabling VT mode
* and move `windows_enable_vt_mode` to utils to avoid cyclic imports
2022-05-20 06:01:09 +05:30
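
The first point describes a general hazard of memoized functions: once the underlying state changes, the cached answer is stale until the cache is cleared. A minimal sketch of the pattern is shown here. The function names echo the commit message, but the bodies are hypothetical stand-ins, and `functools.lru_cache(maxsize=None)` is used in place of `functools.cache` so that the example also runs on Python versions before 3.9.

```python
import functools

_vt_mode_enabled = False  # stand-in for the real console state


@functools.lru_cache(maxsize=None)
def supports_terminal_sequences():
    """Stand-in check; the result is memoized after the first call."""
    return _vt_mode_enabled


def windows_enable_vt_mode():
    """Stand-in for enabling VT mode; invalidates the memoized check."""
    global _vt_mode_enabled
    _vt_mode_enabled = True
    # Without this, callers would keep seeing the stale cached False
    supports_terminal_sequences.cache_clear()
```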
pukkandan
b801cd7179
[tiktok] Detect embeds
Closes #3799
2022-05-20 06:01:08 +05:30
pukkandan
0b9c08b47b
[utils] Improve performance using functools.cache
Closes #3786
2022-05-19 20:23:53 +05:30
pukkandan
2f97cc615b
[utils] ISO3166Utils: Add EU and AP
Fixes https://github.com/yt-dlp/yt-dlp/pull/3302#discussion_r875528517
2022-05-19 20:05:26 +05:30
pukkandan
2dd5a2e3a1
[doc, cleanup] Re-indent "Usage and Options" section 2022-05-19 20:05:17 +05:30
pukkandan
23326151c4
Add option --retry-sleep (#3059)
Closes #2852
2022-05-19 20:00:31 +05:30
pukkandan
9e49146352
Add option --alias 2022-05-19 19:45:21 +05:30
148 changed files with 8368 additions and 5890 deletions

View File

@ -11,7 +11,7 @@ body:
options:
- label: I'm reporting a broken site
required: true
- label: I've verified that I'm running yt-dlp version **2022.05.18** ([update instructions](https://github.com/yt-dlp/yt-dlp#update)) or later (specify commit)
- label: I've verified that I'm running yt-dlp version **2022.06.22.1** ([update instructions](https://github.com/yt-dlp/yt-dlp#update)) or later (specify commit)
required: true
- label: I've checked that all provided URLs are playable in a browser with the same IP and same login details
required: true
@ -51,12 +51,12 @@ body:
[debug] Portable config file: yt-dlp.conf
[debug] Portable config: ['-i']
[debug] Encodings: locale cp1252, fs utf-8, stdout utf-8, stderr utf-8, pref cp1252
[debug] yt-dlp version 2022.05.18 (exe)
[debug] yt-dlp version 2022.06.22.1 (exe)
[debug] Python version 3.8.8 (CPython 64bit) - Windows-10-10.0.19041-SP0
[debug] exe versions: ffmpeg 3.0.1, ffprobe 3.0.1
[debug] Optional libraries: Cryptodome, keyring, mutagen, sqlite, websockets
[debug] Proxy map: {}
yt-dlp is up to date (2022.05.18)
yt-dlp is up to date (2022.06.22.1)
<more lines>
render: shell
validations:

View File

@ -11,7 +11,7 @@ body:
options:
- label: I'm reporting a new site support request
required: true
- label: I've verified that I'm running yt-dlp version **2022.05.18** ([update instructions](https://github.com/yt-dlp/yt-dlp#update)) or later (specify commit)
- label: I've verified that I'm running yt-dlp version **2022.06.22.1** ([update instructions](https://github.com/yt-dlp/yt-dlp#update)) or later (specify commit)
required: true
- label: I've checked that all provided URLs are playable in a browser with the same IP and same login details
required: true
@ -62,12 +62,12 @@ body:
[debug] Portable config file: yt-dlp.conf
[debug] Portable config: ['-i']
[debug] Encodings: locale cp1252, fs utf-8, stdout utf-8, stderr utf-8, pref cp1252
[debug] yt-dlp version 2022.05.18 (exe)
[debug] yt-dlp version 2022.06.22.1 (exe)
[debug] Python version 3.8.8 (CPython 64bit) - Windows-10-10.0.19041-SP0
[debug] exe versions: ffmpeg 3.0.1, ffprobe 3.0.1
[debug] Optional libraries: Cryptodome, keyring, mutagen, sqlite, websockets
[debug] Proxy map: {}
yt-dlp is up to date (2022.05.18)
yt-dlp is up to date (2022.06.22.1)
<more lines>
render: shell
validations:

View File

@ -9,9 +9,9 @@ body:
description: |
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
options:
- label: I'm reporting a site feature request
- label: I'm requesting a site-specific feature
required: true
- label: I've verified that I'm running yt-dlp version **2022.05.18** ([update instructions](https://github.com/yt-dlp/yt-dlp#update)) or later (specify commit)
- label: I've verified that I'm running yt-dlp version **2022.06.22.1** ([update instructions](https://github.com/yt-dlp/yt-dlp#update)) or later (specify commit)
required: true
- label: I've checked that all provided URLs are playable in a browser with the same IP and same login details
required: true
@ -60,12 +60,12 @@ body:
[debug] Portable config file: yt-dlp.conf
[debug] Portable config: ['-i']
[debug] Encodings: locale cp1252, fs utf-8, stdout utf-8, stderr utf-8, pref cp1252
[debug] yt-dlp version 2022.05.18 (exe)
[debug] yt-dlp version 2022.06.22.1 (exe)
[debug] Python version 3.8.8 (CPython 64bit) - Windows-10-10.0.19041-SP0
[debug] exe versions: ffmpeg 3.0.1, ffprobe 3.0.1
[debug] Optional libraries: Cryptodome, keyring, mutagen, sqlite, websockets
[debug] Proxy map: {}
yt-dlp is up to date (2022.05.18)
yt-dlp is up to date (2022.06.22.1)
<more lines>
render: shell
validations:

View File

@ -11,7 +11,7 @@ body:
options:
- label: I'm reporting a bug unrelated to a specific site
required: true
- label: I've verified that I'm running yt-dlp version **2022.05.18** ([update instructions](https://github.com/yt-dlp/yt-dlp#update)) or later (specify commit)
- label: I've verified that I'm running yt-dlp version **2022.06.22.1** ([update instructions](https://github.com/yt-dlp/yt-dlp#update)) or later (specify commit)
required: true
- label: I've checked that all provided URLs are playable in a browser with the same IP and same login details
required: true
@ -45,12 +45,12 @@ body:
[debug] Portable config file: yt-dlp.conf
[debug] Portable config: ['-i']
[debug] Encodings: locale cp1252, fs utf-8, stdout utf-8, stderr utf-8, pref cp1252
[debug] yt-dlp version 2022.05.18 (exe)
[debug] yt-dlp version 2022.06.22.1 (exe)
[debug] Python version 3.8.8 (CPython 64bit) - Windows-10-10.0.19041-SP0
[debug] exe versions: ffmpeg 3.0.1, ffprobe 3.0.1
[debug] Optional libraries: Cryptodome, keyring, mutagen, sqlite, websockets
[debug] Proxy map: {}
yt-dlp is up to date (2022.05.18)
yt-dlp is up to date (2022.06.22.1)
<more lines>
render: shell
validations:

View File

@ -9,11 +9,11 @@ body:
description: |
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
options:
- label: I'm reporting a feature request
- label: I'm requesting a feature unrelated to a specific site
required: true
- label: I've looked through the [README](https://github.com/yt-dlp/yt-dlp#readme)
required: true
- label: I've verified that I'm running yt-dlp version **2022.05.18** ([update instructions](https://github.com/yt-dlp/yt-dlp#update)) or later (specify commit)
- label: I've verified that I'm running yt-dlp version **2022.06.22.1** ([update instructions](https://github.com/yt-dlp/yt-dlp#update)) or later (specify commit)
required: true
- label: I've searched the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=) for similar issues including closed ones. DO NOT post duplicates
required: true

View File

@ -9,13 +9,15 @@ body:
description: |
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
options:
- label: I'm asking a question and **not** reporting a bug/feature request
- label: I'm asking a question and **not** reporting a bug or requesting a feature
required: true
- label: I've looked through the [README](https://github.com/yt-dlp/yt-dlp#readme)
required: true
- label: I've read the [guidelines for opening an issue](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#opening-an-issue)
- label: I've verified that I'm running yt-dlp version **2022.06.22.1** ([update instructions](https://github.com/yt-dlp/yt-dlp#update)) or later (specify commit)
required: true
- label: I've searched the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=) for similar questions including closed ones
- label: I've searched the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=) for similar questions including closed ones. DO NOT post duplicates
required: true
- label: I've read the [guidelines for opening an issue](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#opening-an-issue)
required: true
- type: textarea
id: question

View File

@ -9,7 +9,7 @@ body:
description: |
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
options:
- label: I'm reporting a site feature request
- label: I'm requesting a site-specific feature
required: true
- label: I've verified that I'm running yt-dlp version **%(version)s** ([update instructions](https://github.com/yt-dlp/yt-dlp#update)) or later (specify commit)
required: true

View File

@ -9,7 +9,7 @@ body:
description: |
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
options:
- label: I'm reporting a feature request
- label: I'm requesting a feature unrelated to a specific site
required: true
- label: I've looked through the [README](https://github.com/yt-dlp/yt-dlp#readme)
required: true

View File

@ -9,13 +9,15 @@ body:
description: |
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
options:
- label: I'm asking a question and **not** reporting a bug/feature request
- label: I'm asking a question and **not** reporting a bug or requesting a feature
required: true
- label: I've looked through the [README](https://github.com/yt-dlp/yt-dlp#readme)
required: true
- label: I've read the [guidelines for opening an issue](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#opening-an-issue)
- label: I've verified that I'm running yt-dlp version **%(version)s** ([update instructions](https://github.com/yt-dlp/yt-dlp#update)) or later (specify commit)
required: true
- label: I've searched the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=) for similar questions including closed ones
- label: I've searched the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=) for similar questions including closed ones. DO NOT post duplicates
required: true
- label: I've read the [guidelines for opening an issue](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#opening-an-issue)
required: true
- type: textarea
id: question

View File

@ -2,27 +2,20 @@ name: Build
on: workflow_dispatch
jobs:
build_unix:
create_release:
runs-on: ubuntu-latest
outputs:
version_suffix: ${{ steps.version_suffix.outputs.version_suffix }}
ytdlp_version: ${{ steps.bump_version.outputs.ytdlp_version }}
upload_url: ${{ steps.create_release.outputs.upload_url }}
sha256_bin: ${{ steps.sha256_bin.outputs.sha256_bin }}
sha512_bin: ${{ steps.sha512_bin.outputs.sha512_bin }}
sha256_tar: ${{ steps.sha256_tar.outputs.sha256_tar }}
sha512_tar: ${{ steps.sha512_tar.outputs.sha512_tar }}
steps:
- uses: actions/checkout@v2
with:
fetch-depth: 0
- name: Set up Python
uses: actions/setup-python@v2
- uses: actions/setup-python@v2
with:
python-version: '3.8'
- name: Install packages
run: sudo apt-get -y install zip pandoc man
python-version: '3.10'
- name: Set version suffix
id: version_suffix
env:
@ -34,83 +27,27 @@ jobs:
run: |
python devscripts/update-version.py ${{ steps.version_suffix.outputs.version_suffix }}
make issuetemplates
- name: Push to release
id: push_release
run: |
git config --global user.name github-actions
git config --global user.email github-actions@example.com
git add -u
git commit -m "[version] update" -m "Created by: ${{ github.event.sender.login }}" -m ":ci skip all"
git commit -m "[version] update" -m "Created by: ${{ github.event.sender.login }}" -m ":ci skip all :ci run dl"
git push origin --force ${{ github.event.ref }}:release
echo ::set-output name=head_sha::$(git rev-parse HEAD)
- name: Update master
id: push_master
env:
PUSH_VERSION_COMMIT: ${{ secrets.PUSH_VERSION_COMMIT }}
if: "env.PUSH_VERSION_COMMIT != ''"
run: git push origin ${{ github.event.ref }}
- name: Get Changelog
id: get_changelog
run: |
changelog=$(cat Changelog.md | grep -oPz '(?s)(?<=### ${{ steps.bump_version.outputs.ytdlp_version }}\n{2}).+?(?=\n{2,3}###)') || true
changelog=$(grep -oPz '(?s)(?<=### ${{ steps.bump_version.outputs.ytdlp_version }}\n{2}).+?(?=\n{2,3}###)' Changelog.md) || true
echo "changelog<<EOF" >> $GITHUB_ENV
echo "$changelog" >> $GITHUB_ENV
echo "EOF" >> $GITHUB_ENV
- name: Build lazy extractors
id: lazy_extractors
run: python devscripts/make_lazy_extractors.py
- name: Run Make
run: make all tar
- name: Get SHA2-256SUMS for yt-dlp
id: sha256_bin
run: echo "::set-output name=sha256_bin::$(sha256sum yt-dlp | awk '{print $1}')"
- name: Get SHA2-256SUMS for yt-dlp.tar.gz
id: sha256_tar
run: echo "::set-output name=sha256_tar::$(sha256sum yt-dlp.tar.gz | awk '{print $1}')"
- name: Get SHA2-512SUMS for yt-dlp
id: sha512_bin
run: echo "::set-output name=sha512_bin::$(sha512sum yt-dlp | awk '{print $1}')"
- name: Get SHA2-512SUMS for yt-dlp.tar.gz
id: sha512_tar
run: echo "::set-output name=sha512_tar::$(sha512sum yt-dlp.tar.gz | awk '{print $1}')"
- name: Install dependencies for pypi
env:
PYPI_TOKEN: ${{ secrets.PYPI_TOKEN }}
if: "env.PYPI_TOKEN != ''"
run: |
python -m pip install --upgrade pip
pip install setuptools wheel twine
- name: Build and publish on pypi
env:
TWINE_USERNAME: __token__
TWINE_PASSWORD: ${{ secrets.PYPI_TOKEN }}
if: "env.TWINE_PASSWORD != ''"
run: |
rm -rf dist/*
python setup.py sdist bdist_wheel
twine upload dist/*
- name: Install SSH private key
env:
BREW_TOKEN: ${{ secrets.BREW_TOKEN }}
if: "env.BREW_TOKEN != ''"
uses: yt-dlp/ssh-agent@v0.5.3
with:
ssh-private-key: ${{ env.BREW_TOKEN }}
- name: Update Homebrew Formulae
env:
BREW_TOKEN: ${{ secrets.BREW_TOKEN }}
if: "env.BREW_TOKEN != ''"
run: |
git clone git@github.com:yt-dlp/homebrew-taps taps/
python3 devscripts/update-formulae.py taps/Formula/yt-dlp.rb "${{ steps.bump_version.outputs.ytdlp_version }}"
git -C taps/ config user.name github-actions
git -C taps/ config user.email github-actions@example.com
git -C taps/ commit -am 'yt-dlp: ${{ steps.bump_version.outputs.ytdlp_version }}'
git -C taps/ push
- name: Create Release
id: create_release
uses: actions/create-release@v1
@ -129,13 +66,60 @@ jobs:
${{ env.changelog }}
draft: false
prerelease: false
- name: Upload yt-dlp Unix binary
id: upload-release-asset
build_unix:
needs: create_release
runs-on: ubuntu-18.04 # Standalone executable should be built on minimum supported OS
outputs:
sha256_bin: ${{ steps.get_sha.outputs.sha256_bin }}
sha512_bin: ${{ steps.get_sha.outputs.sha512_bin }}
sha256_tar: ${{ steps.get_sha.outputs.sha256_tar }}
sha512_tar: ${{ steps.get_sha.outputs.sha512_tar }}
sha256_linux: ${{ steps.get_sha.outputs.sha256_linux }}
sha512_linux: ${{ steps.get_sha.outputs.sha512_linux }}
sha256_linux_zip: ${{ steps.get_sha.outputs.sha256_linux_zip }}
sha512_linux_zip: ${{ steps.get_sha.outputs.sha512_linux_zip }}
steps:
- uses: actions/checkout@v2
- uses: actions/setup-python@v2
with:
python-version: '3.10'
- name: Install Requirements
run: |
sudo apt-get -y install zip pandoc man
python -m pip install --upgrade pip setuptools wheel twine
python -m pip install Pyinstaller -r requirements.txt
- name: Prepare
run: |
python devscripts/update-version.py ${{ needs.create_release.outputs.version_suffix }}
python devscripts/make_lazy_extractors.py
- name: Build Unix executables
run: |
make all tar
python pyinst.py --onedir
(cd ./dist/yt-dlp_linux && zip -r ../yt-dlp_linux.zip .)
python pyinst.py
- name: Get SHA2-SUMS
id: get_sha
run: |
echo "::set-output name=sha256_bin::$(sha256sum yt-dlp | awk '{print $1}')"
echo "::set-output name=sha512_bin::$(sha512sum yt-dlp | awk '{print $1}')"
echo "::set-output name=sha256_tar::$(sha256sum yt-dlp.tar.gz | awk '{print $1}')"
echo "::set-output name=sha512_tar::$(sha512sum yt-dlp.tar.gz | awk '{print $1}')"
echo "::set-output name=sha256_linux::$(sha256sum dist/yt-dlp_linux | awk '{print $1}')"
echo "::set-output name=sha512_linux::$(sha512sum dist/yt-dlp_linux | awk '{print $1}')"
echo "::set-output name=sha256_linux_zip::$(sha256sum dist/yt-dlp_linux.zip | awk '{print $1}')"
echo "::set-output name=sha512_linux_zip::$(sha512sum dist/yt-dlp_linux.zip | awk '{print $1}')"
- name: Upload zip binary
uses: actions/upload-release-asset@v1
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
upload_url: ${{ steps.create_release.outputs.upload_url }}
upload_url: ${{ needs.create_release.outputs.upload_url }}
asset_path: ./yt-dlp
asset_name: yt-dlp
asset_content_type: application/octet-stream
@ -144,270 +128,269 @@ jobs:
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
upload_url: ${{ steps.create_release.outputs.upload_url }}
upload_url: ${{ needs.create_release.outputs.upload_url }}
asset_path: ./yt-dlp.tar.gz
asset_name: yt-dlp.tar.gz
asset_content_type: application/gzip
- name: Upload standalone binary
uses: actions/upload-release-asset@v1
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
upload_url: ${{ needs.create_release.outputs.upload_url }}
asset_path: ./dist/yt-dlp_linux
asset_name: yt-dlp_linux
asset_content_type: application/octet-stream
- name: Upload onedir binary
uses: actions/upload-release-asset@v1
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
upload_url: ${{ needs.create_release.outputs.upload_url }}
asset_path: ./dist/yt-dlp_linux.zip
asset_name: yt-dlp_linux.zip
asset_content_type: application/zip
- name: Build and publish on PyPi
env:
TWINE_USERNAME: __token__
TWINE_PASSWORD: ${{ secrets.PYPI_TOKEN }}
if: "env.TWINE_PASSWORD != ''"
run: |
rm -rf dist/*
python setup.py sdist bdist_wheel
twine upload dist/*
- name: Install SSH private key for Homebrew
env:
BREW_TOKEN: ${{ secrets.BREW_TOKEN }}
if: "env.BREW_TOKEN != ''"
uses: yt-dlp/ssh-agent@v0.5.3
with:
ssh-private-key: ${{ env.BREW_TOKEN }}
- name: Update Homebrew Formulae
env:
BREW_TOKEN: ${{ secrets.BREW_TOKEN }}
if: "env.BREW_TOKEN != ''"
run: |
git clone git@github.com:yt-dlp/homebrew-taps taps/
python devscripts/update-formulae.py taps/Formula/yt-dlp.rb "${{ needs.create_release.outputs.ytdlp_version }}"
git -C taps/ config user.name github-actions
git -C taps/ config user.email github-actions@example.com
git -C taps/ commit -am 'yt-dlp: ${{ needs.create_release.outputs.ytdlp_version }}'
git -C taps/ push
build_macos:
runs-on: macos-11
needs: build_unix
needs: create_release
outputs:
sha256_macos: ${{ steps.sha256_macos.outputs.sha256_macos }}
sha512_macos: ${{ steps.sha512_macos.outputs.sha512_macos }}
sha256_macos_zip: ${{ steps.sha256_macos_zip.outputs.sha256_macos_zip }}
sha512_macos_zip: ${{ steps.sha512_macos_zip.outputs.sha512_macos_zip }}
sha256_macos: ${{ steps.get_sha.outputs.sha256_macos }}
sha512_macos: ${{ steps.get_sha.outputs.sha512_macos }}
sha256_macos_zip: ${{ steps.get_sha.outputs.sha256_macos_zip }}
sha512_macos_zip: ${{ steps.get_sha.outputs.sha512_macos_zip }}
steps:
- uses: actions/checkout@v2
# In order to create a universal2 application, the version of python3 in /usr/bin has to be used
# NB: In order to create a universal2 application, the version of python3 in /usr/bin has to be used
- name: Install Requirements
run: |
brew install coreutils
/usr/bin/python3 -m pip install -U --user pip Pyinstaller==4.10 -r requirements.txt
- name: Bump version
id: bump_version
run: /usr/bin/python3 devscripts/update-version.py
- name: Build lazy extractors
id: lazy_extractors
run: /usr/bin/python3 devscripts/make_lazy_extractors.py
- name: Run PyInstaller Script
run: /usr/bin/python3 pyinst.py --target-architecture universal2 --onefile
- name: Upload yt-dlp MacOS binary
id: upload-release-macos
/usr/bin/python3 -m pip install -U --user pip Pyinstaller -r requirements.txt
- name: Prepare
run: |
/usr/bin/python3 devscripts/update-version.py ${{ needs.create_release.outputs.version_suffix }}
/usr/bin/python3 devscripts/make_lazy_extractors.py
- name: Build
run: |
/usr/bin/python3 pyinst.py --target-architecture universal2 --onedir
(cd ./dist/yt-dlp_macos && zip -r ../yt-dlp_macos.zip .)
/usr/bin/python3 pyinst.py --target-architecture universal2
- name: Get SHA2-SUMS
id: get_sha
run: |
echo "::set-output name=sha256_macos::$(sha256sum dist/yt-dlp_macos | awk '{print $1}')"
echo "::set-output name=sha512_macos::$(sha512sum dist/yt-dlp_macos | awk '{print $1}')"
echo "::set-output name=sha256_macos_zip::$(sha256sum dist/yt-dlp_macos.zip | awk '{print $1}')"
echo "::set-output name=sha512_macos_zip::$(sha512sum dist/yt-dlp_macos.zip | awk '{print $1}')"
- name: Upload standalone binary
uses: actions/upload-release-asset@v1
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
upload_url: ${{ needs.build_unix.outputs.upload_url }}
upload_url: ${{ needs.create_release.outputs.upload_url }}
asset_path: ./dist/yt-dlp_macos
asset_name: yt-dlp_macos
asset_content_type: application/octet-stream
- name: Get SHA2-256SUMS for yt-dlp_macos
id: sha256_macos
run: echo "::set-output name=sha256_macos::$(sha256sum dist/yt-dlp_macos | awk '{print $1}')"
- name: Get SHA2-512SUMS for yt-dlp_macos
id: sha512_macos
run: echo "::set-output name=sha512_macos::$(sha512sum dist/yt-dlp_macos | awk '{print $1}')"
- name: Run PyInstaller Script with --onedir
run: |
/usr/bin/python3 pyinst.py --target-architecture universal2 --onedir
zip ./dist/yt-dlp_macos.zip ./dist/yt-dlp_macos
- name: Upload yt-dlp MacOS onedir
id: upload-release-macos-zip
- name: Upload onedir binary
uses: actions/upload-release-asset@v1
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
upload_url: ${{ needs.build_unix.outputs.upload_url }}
upload_url: ${{ needs.create_release.outputs.upload_url }}
asset_path: ./dist/yt-dlp_macos.zip
asset_name: yt-dlp_macos.zip
asset_content_type: application/zip
- name: Get SHA2-256SUMS for yt-dlp_macos.zip
id: sha256_macos_zip
run: echo "::set-output name=sha256_macos_zip::$(sha256sum dist/yt-dlp_macos.zip | awk '{print $1}')"
- name: Get SHA2-512SUMS for yt-dlp_macos.zip
id: sha512_macos_zip
run: echo "::set-output name=sha512_macos_zip::$(sha512sum dist/yt-dlp_macos.zip | awk '{print $1}')"
build_windows:
runs-on: windows-latest
needs: build_unix
needs: create_release
outputs:
sha256_win: ${{ steps.sha256_win.outputs.sha256_win }}
sha512_win: ${{ steps.sha512_win.outputs.sha512_win }}
sha256_py2exe: ${{ steps.sha256_py2exe.outputs.sha256_py2exe }}
sha512_py2exe: ${{ steps.sha512_py2exe.outputs.sha512_py2exe }}
sha256_win_zip: ${{ steps.sha256_win_zip.outputs.sha256_win_zip }}
sha512_win_zip: ${{ steps.sha512_win_zip.outputs.sha512_win_zip }}
sha256_win: ${{ steps.get_sha.outputs.sha256_win }}
sha512_win: ${{ steps.get_sha.outputs.sha512_win }}
sha256_py2exe: ${{ steps.get_sha.outputs.sha256_py2exe }}
sha512_py2exe: ${{ steps.get_sha.outputs.sha512_py2exe }}
sha256_win_zip: ${{ steps.get_sha.outputs.sha256_win_zip }}
sha512_win_zip: ${{ steps.get_sha.outputs.sha512_win_zip }}
steps:
- uses: actions/checkout@v2
# 3.8 is used for Win7 support
- name: Set up Python 3.8
uses: actions/setup-python@v2
with:
- uses: actions/setup-python@v2
with: # 3.8 is used for Win7 support
python-version: '3.8'
- name: Install Requirements
# Custom pyinstaller built with https://github.com/yt-dlp/pyinstaller-builds
run: |
run: | # Custom pyinstaller built with https://github.com/yt-dlp/pyinstaller-builds
python -m pip install --upgrade pip setuptools wheel py2exe
pip install "https://yt-dlp.github.io/Pyinstaller-Builds/x86_64/pyinstaller-4.10-py3-none-any.whl" -r requirements.txt
- name: Bump version
id: bump_version
env:
version_suffix: ${{ needs.build_unix.outputs.version_suffix }}
run: python devscripts/update-version.py ${{ env.version_suffix }}
- name: Build lazy extractors
id: lazy_extractors
run: python devscripts/make_lazy_extractors.py
- name: Run PyInstaller Script
run: python pyinst.py
- name: Upload yt-dlp.exe Windows binary
id: upload-release-windows
- name: Prepare
run: |
python devscripts/update-version.py ${{ needs.create_release.outputs.version_suffix }}
python devscripts/make_lazy_extractors.py
- name: Build
run: |
python setup.py py2exe
Move-Item ./dist/yt-dlp.exe ./dist/yt-dlp_min.exe
python pyinst.py
python pyinst.py --onedir
Compress-Archive -Path ./dist/yt-dlp/* -DestinationPath ./dist/yt-dlp_win.zip
- name: Get SHA2-SUMS
id: get_sha
run: |
echo "::set-output name=sha256_py2exe::$((Get-FileHash dist\yt-dlp_min.exe -Algorithm SHA256).Hash.ToLower())"
echo "::set-output name=sha512_py2exe::$((Get-FileHash dist\yt-dlp_min.exe -Algorithm SHA512).Hash.ToLower())"
echo "::set-output name=sha256_win::$((Get-FileHash dist\yt-dlp.exe -Algorithm SHA256).Hash.ToLower())"
echo "::set-output name=sha512_win::$((Get-FileHash dist\yt-dlp.exe -Algorithm SHA512).Hash.ToLower())"
echo "::set-output name=sha256_win_zip::$((Get-FileHash dist\yt-dlp_win.zip -Algorithm SHA256).Hash.ToLower())"
echo "::set-output name=sha512_win_zip::$((Get-FileHash dist\yt-dlp_win.zip -Algorithm SHA512).Hash.ToLower())"
- name: Upload py2exe binary
uses: actions/upload-release-asset@v1
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
upload_url: ${{ needs.build_unix.outputs.upload_url }}
upload_url: ${{ needs.create_release.outputs.upload_url }}
asset_path: ./dist/yt-dlp_min.exe
asset_name: yt-dlp_min.exe
asset_content_type: application/vnd.microsoft.portable-executable
- name: Upload standalone binary
uses: actions/upload-release-asset@v1
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
upload_url: ${{ needs.create_release.outputs.upload_url }}
asset_path: ./dist/yt-dlp.exe
asset_name: yt-dlp.exe
asset_content_type: application/vnd.microsoft.portable-executable
- name: Get SHA2-256SUMS for yt-dlp.exe
id: sha256_win
run: echo "::set-output name=sha256_win::$((Get-FileHash dist\yt-dlp.exe -Algorithm SHA256).Hash.ToLower())"
- name: Get SHA2-512SUMS for yt-dlp.exe
id: sha512_win
run: echo "::set-output name=sha512_win::$((Get-FileHash dist\yt-dlp.exe -Algorithm SHA512).Hash.ToLower())"
- name: Run PyInstaller Script with --onedir
run: |
python pyinst.py --onedir
Compress-Archive -LiteralPath ./dist/yt-dlp -DestinationPath ./dist/yt-dlp_win.zip
- name: Upload yt-dlp Windows onedir
id: upload-release-windows-zip
- name: Upload onedir binary
uses: actions/upload-release-asset@v1
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
upload_url: ${{ needs.build_unix.outputs.upload_url }}
upload_url: ${{ needs.create_release.outputs.upload_url }}
asset_path: ./dist/yt-dlp_win.zip
asset_name: yt-dlp_win.zip
asset_content_type: application/zip
- name: Get SHA2-256SUMS for yt-dlp_win.zip
id: sha256_win_zip
run: echo "::set-output name=sha256_win_zip::$((Get-FileHash dist\yt-dlp_win.zip -Algorithm SHA256).Hash.ToLower())"
- name: Get SHA2-512SUMS for yt-dlp_win.zip
id: sha512_win_zip
run: echo "::set-output name=sha512_win_zip::$((Get-FileHash dist\yt-dlp_win.zip -Algorithm SHA512).Hash.ToLower())"
- name: Run py2exe Script
run: python setup.py py2exe
- name: Upload yt-dlp_min.exe Windows binary
id: upload-release-windows-py2exe
uses: actions/upload-release-asset@v1
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
upload_url: ${{ needs.build_unix.outputs.upload_url }}
asset_path: ./dist/yt-dlp.exe
asset_name: yt-dlp_min.exe
asset_content_type: application/vnd.microsoft.portable-executable
- name: Get SHA2-256SUMS for yt-dlp_min.exe
id: sha256_py2exe
run: echo "::set-output name=sha256_py2exe::$((Get-FileHash dist\yt-dlp.exe -Algorithm SHA256).Hash.ToLower())"
- name: Get SHA2-512SUMS for yt-dlp_min.exe
id: sha512_py2exe
run: echo "::set-output name=sha512_py2exe::$((Get-FileHash dist\yt-dlp.exe -Algorithm SHA512).Hash.ToLower())"
build_windows32:
runs-on: windows-latest
needs: build_unix
needs: create_release
outputs:
sha256_win32: ${{ steps.sha256_win32.outputs.sha256_win32 }}
sha512_win32: ${{ steps.sha512_win32.outputs.sha512_win32 }}
sha256_win32: ${{ steps.get_sha.outputs.sha256_win32 }}
sha512_win32: ${{ steps.get_sha.outputs.sha512_win32 }}
steps:
- uses: actions/checkout@v2
# 3.7 is used for Vista support. See https://github.com/yt-dlp/yt-dlp/issues/390
- name: Set up Python 3.7 32-Bit
uses: actions/setup-python@v2
with:
- uses: actions/setup-python@v2
with: # 3.7 is used for Vista support. See https://github.com/yt-dlp/yt-dlp/issues/390
python-version: '3.7'
architecture: 'x86'
- name: Install Requirements
run: |
python -m pip install --upgrade pip setuptools wheel
pip install "https://yt-dlp.github.io/Pyinstaller-Builds/i686/pyinstaller-4.10-py3-none-any.whl" -r requirements.txt
- name: Bump version
id: bump_version
env:
version_suffix: ${{ needs.build_unix.outputs.version_suffix }}
run: python devscripts/update-version.py ${{ env.version_suffix }}
- name: Build lazy extractors
id: lazy_extractors
run: python devscripts/make_lazy_extractors.py
- name: Run PyInstaller Script for 32 Bit
run: python pyinst.py
- name: Upload Executable yt-dlp_x86.exe
id: upload-release-windows32
- name: Prepare
run: |
python devscripts/update-version.py ${{ needs.create_release.outputs.version_suffix }}
python devscripts/make_lazy_extractors.py
- name: Build
run: |
python pyinst.py
- name: Get SHA2-SUMS
id: get_sha
run: |
echo "::set-output name=sha256_win32::$((Get-FileHash dist\yt-dlp_x86.exe -Algorithm SHA256).Hash.ToLower())"
echo "::set-output name=sha512_win32::$((Get-FileHash dist\yt-dlp_x86.exe -Algorithm SHA512).Hash.ToLower())"
- name: Upload standalone binary
uses: actions/upload-release-asset@v1
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
upload_url: ${{ needs.build_unix.outputs.upload_url }}
upload_url: ${{ needs.create_release.outputs.upload_url }}
asset_path: ./dist/yt-dlp_x86.exe
asset_name: yt-dlp_x86.exe
asset_content_type: application/vnd.microsoft.portable-executable
- name: Get SHA2-256SUMS for yt-dlp_x86.exe
id: sha256_win32
run: echo "::set-output name=sha256_win32::$((Get-FileHash dist\yt-dlp_x86.exe -Algorithm SHA256).Hash.ToLower())"
- name: Get SHA2-512SUMS for yt-dlp_x86.exe
id: sha512_win32
run: echo "::set-output name=sha512_win32::$((Get-FileHash dist\yt-dlp_x86.exe -Algorithm SHA512).Hash.ToLower())"
finish:
runs-on: ubuntu-latest
needs: [build_unix, build_windows, build_windows32, build_macos]
needs: [create_release, build_unix, build_windows, build_windows32, build_macos]
steps:
- name: Make SHA2-256SUMS file
env:
SHA256_BIN: ${{ needs.build_unix.outputs.sha256_bin }}
SHA256_TAR: ${{ needs.build_unix.outputs.sha256_tar }}
SHA256_WIN: ${{ needs.build_windows.outputs.sha256_win }}
SHA256_PY2EXE: ${{ needs.build_windows.outputs.sha256_py2exe }}
SHA256_WIN_ZIP: ${{ needs.build_windows.outputs.sha256_win_zip }}
SHA256_WIN32: ${{ needs.build_windows32.outputs.sha256_win32 }}
SHA256_MACOS: ${{ needs.build_macos.outputs.sha256_macos }}
SHA256_MACOS_ZIP: ${{ needs.build_macos.outputs.sha256_macos_zip }}
- name: Make SHA2-SUMS files
run: |
echo "${{ env.SHA256_BIN }} yt-dlp" >> SHA2-256SUMS
echo "${{ env.SHA256_TAR }} yt-dlp.tar.gz" >> SHA2-256SUMS
echo "${{ env.SHA256_WIN }} yt-dlp.exe" >> SHA2-256SUMS
echo "${{ env.SHA256_PY2EXE }} yt-dlp_min.exe" >> SHA2-256SUMS
echo "${{ env.SHA256_WIN32 }} yt-dlp_x86.exe" >> SHA2-256SUMS
echo "${{ env.SHA256_WIN_ZIP }} yt-dlp_win.zip" >> SHA2-256SUMS
echo "${{ env.SHA256_MACOS }} yt-dlp_macos" >> SHA2-256SUMS
echo "${{ env.SHA256_MACOS_ZIP }} yt-dlp_macos.zip" >> SHA2-256SUMS
- name: Upload 256SUMS file
id: upload-sums
echo "${{ needs.build_unix.outputs.sha256_bin }} yt-dlp" >> SHA2-256SUMS
echo "${{ needs.build_unix.outputs.sha256_tar }} yt-dlp.tar.gz" >> SHA2-256SUMS
echo "${{ needs.build_unix.outputs.sha256_linux }} yt-dlp_linux" >> SHA2-256SUMS
echo "${{ needs.build_unix.outputs.sha256_linux_zip }} yt-dlp_linux.zip" >> SHA2-256SUMS
echo "${{ needs.build_windows.outputs.sha256_win }} yt-dlp.exe" >> SHA2-256SUMS
echo "${{ needs.build_windows.outputs.sha256_py2exe }} yt-dlp_min.exe" >> SHA2-256SUMS
echo "${{ needs.build_windows32.outputs.sha256_win32 }} yt-dlp_x86.exe" >> SHA2-256SUMS
echo "${{ needs.build_windows.outputs.sha256_win_zip }} yt-dlp_win.zip" >> SHA2-256SUMS
echo "${{ needs.build_macos.outputs.sha256_macos }} yt-dlp_macos" >> SHA2-256SUMS
echo "${{ needs.build_macos.outputs.sha256_macos_zip }} yt-dlp_macos.zip" >> SHA2-256SUMS
echo "${{ needs.build_unix.outputs.sha512_bin }} yt-dlp" >> SHA2-512SUMS
echo "${{ needs.build_unix.outputs.sha512_tar }} yt-dlp.tar.gz" >> SHA2-512SUMS
echo "${{ needs.build_unix.outputs.sha512_linux }} yt-dlp_linux" >> SHA2-512SUMS
echo "${{ needs.build_unix.outputs.sha512_linux_zip }} yt-dlp_linux.zip" >> SHA2-512SUMS
echo "${{ needs.build_windows.outputs.sha512_win }} yt-dlp.exe" >> SHA2-512SUMS
echo "${{ needs.build_windows.outputs.sha512_py2exe }} yt-dlp_min.exe" >> SHA2-512SUMS
echo "${{ needs.build_windows32.outputs.sha512_win32 }} yt-dlp_x86.exe" >> SHA2-512SUMS
echo "${{ needs.build_windows.outputs.sha512_win_zip }} yt-dlp_win.zip" >> SHA2-512SUMS
echo "${{ needs.build_macos.outputs.sha512_macos }} yt-dlp_macos" >> SHA2-512SUMS
echo "${{ needs.build_macos.outputs.sha512_macos_zip }} yt-dlp_macos.zip" >> SHA2-512SUMS
- name: Upload SHA2-256SUMS file
uses: actions/upload-release-asset@v1
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
upload_url: ${{ needs.build_unix.outputs.upload_url }}
upload_url: ${{ needs.create_release.outputs.upload_url }}
asset_path: ./SHA2-256SUMS
asset_name: SHA2-256SUMS
asset_content_type: text/plain
- name: Make SHA2-512SUMS file
env:
SHA512_BIN: ${{ needs.build_unix.outputs.sha512_bin }}
SHA512_TAR: ${{ needs.build_unix.outputs.sha512_tar }}
SHA512_WIN: ${{ needs.build_windows.outputs.sha512_win }}
SHA512_PY2EXE: ${{ needs.build_windows.outputs.sha512_py2exe }}
SHA512_WIN_ZIP: ${{ needs.build_windows.outputs.sha512_win_zip }}
SHA512_WIN32: ${{ needs.build_windows32.outputs.sha512_win32 }}
SHA512_MACOS: ${{ needs.build_macos.outputs.sha512_macos }}
SHA512_MACOS_ZIP: ${{ needs.build_macos.outputs.sha512_macos_zip }}
run: |
echo "${{ env.SHA512_BIN }} yt-dlp" >> SHA2-512SUMS
echo "${{ env.SHA512_TAR }} yt-dlp.tar.gz" >> SHA2-512SUMS
echo "${{ env.SHA512_WIN }} yt-dlp.exe" >> SHA2-512SUMS
echo "${{ env.SHA512_WIN_ZIP }} yt-dlp_win.zip" >> SHA2-512SUMS
echo "${{ env.SHA512_PY2EXE }} yt-dlp_min.exe" >> SHA2-512SUMS
echo "${{ env.SHA512_WIN32 }} yt-dlp_x86.exe" >> SHA2-512SUMS
echo "${{ env.SHA512_MACOS }} yt-dlp_macos" >> SHA2-512SUMS
echo "${{ env.SHA512_MACOS_ZIP }} yt-dlp_macos.zip" >> SHA2-512SUMS
- name: Upload 512SUMS file
id: upload-512sums
- name: Upload SHA2-512SUMS file
uses: actions/upload-release-asset@v1
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
upload_url: ${{ needs.build_unix.outputs.upload_url }}
upload_url: ${{ needs.create_release.outputs.upload_url }}
asset_path: ./SHA2-512SUMS
asset_name: SHA2-512SUMS
asset_content_type: text/plain

@ -10,12 +10,15 @@ jobs:
matrix:
os: [ubuntu-latest]
# CPython 3.9 is in quick-test
python-version: ['3.6', '3.7', '3.10', 3.11-dev, pypy-3.6, pypy-3.7, pypy-3.8, pypy-3.9]
python-version: ['3.6', '3.7', '3.10', 3.11-dev, pypy-3.6, pypy-3.7, pypy-3.8]
run-tests-ext: [sh]
include:
# at least one of the tests must be in windows
# at least one of each CPython/PyPy tests must be in windows
- os: windows-latest
python-version: 3.8
python-version: '3.8'
run-tests-ext: bat
- os: windows-latest
python-version: pypy-3.9
run-tests-ext: bat
steps:
- uses: actions/checkout@v2

@ -9,11 +9,15 @@ jobs:
fail-fast: true
matrix:
os: [ubuntu-latest]
python-version: ['3.6', '3.7', '3.9', '3.10', 3.11-dev, pypy-3.6, pypy-3.7, pypy-3.8, pypy-3.9]
python-version: ['3.6', '3.7', '3.9', '3.10', 3.11-dev, pypy-3.6, pypy-3.7, pypy-3.8]
run-tests-ext: [sh]
include:
# at least one of each CPython/PyPy tests must be in windows
- os: windows-latest
python-version: 3.8
python-version: '3.8'
run-tests-ext: bat
- os: windows-latest
python-version: pypy-3.9
run-tests-ext: bat
steps:
- uses: actions/checkout@v2

@ -214,7 +214,7 @@ After you have ensured this site is distributing its content legally, you can fo
# TODO more properties (see yt_dlp/extractor/common.py)
}
```
1. Add an import in [`yt_dlp/extractor/extractors.py`](yt_dlp/extractor/extractors.py).
1. Add an import in [`yt_dlp/extractor/_extractors.py`](yt_dlp/extractor/_extractors.py). Note that the class name must end with `IE`.
1. Run `python test/test_download.py TestDownload.test_YourExtractor` (note that `YourExtractor` doesn't end with `IE`). This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, the tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with an `only_matching` key in the test's dict are not counted. You can also run all the tests in one go with `TestDownload.test_YourExtractor_all`.
1. Make sure you have at least one test for your extractor. Even if all videos covered by the extractor are expected to be inaccessible for automated testing, tests should still be added with a `skip` parameter indicating why the particular test is disabled from running.
1. Have a look at [`yt_dlp/extractor/common.py`](yt_dlp/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](yt_dlp/extractor/common.py#L91-L426). Add tests and code for as many as you want.
@ -225,7 +225,7 @@ After you have ensured this site is distributing its content legally, you can fo
1. Make sure your code works under all [Python](https://www.python.org/) versions supported by yt-dlp, namely CPython and PyPy for Python 3.6 and above. Backward compatibility is not required for even older versions of Python.
1. When the tests pass, [add](https://git-scm.com/docs/git-add) the new files, [commit](https://git-scm.com/docs/git-commit) them and [push](https://git-scm.com/docs/git-push) the result, like this:
$ git add yt_dlp/extractor/extractors.py
$ git add yt_dlp/extractor/_extractors.py
$ git add yt_dlp/extractor/yourextractor.py
$ git commit -m '[yourextractor] Add extractor'
$ git push origin yourextractor
@ -300,14 +300,10 @@ description = meta['summary'] # incorrect
The latter will break the extraction process with a `KeyError` if `summary` disappears from `meta` at some later time, but with the former approach extraction will just go ahead with `description` set to `None`, which is perfectly fine (remember `None` is equivalent to the absence of data).
If the data is nested, do not use `.get` chains, but instead make use of the utility functions `try_get` or `traverse_obj`
If the data is nested, do not use `.get` chains, but instead make use of `traverse_obj`.
Considering the above `meta` again, assume you want to extract `["user"]["name"]` and put it in the resulting info dict as `uploader`
```python
uploader = try_get(meta, lambda x: x['user']['name']) # correct
```
or
```python
uploader = traverse_obj(meta, ('user', 'name')) # correct
```
@ -321,6 +317,10 @@ or
```python
uploader = meta.get('user', {}).get('name') # incorrect
```
or
```python
uploader = try_get(meta, lambda x: x['user']['name']) # old utility
```
Similarly, you should pass `fatal=False` when extracting optional data from a webpage with `_search_regex`, `_html_search_regex` or similar methods, for instance:
@ -346,25 +346,25 @@ On failure this code will silently continue the extraction with `description` se
Another thing to remember is not to try to iterate over `None`.
Say you extracted a list of thumbnails into `thumbnail_data` using `try_get` and now want to iterate over them:
Say you extracted a list of thumbnails into `thumbnail_data` and want to iterate over them:
```python
thumbnail_data = try_get(...)
thumbnail_data = data.get('thumbnails') or []
thumbnails = [{
'url': item['url']
} for item in thumbnail_data or []] # correct
} for item in thumbnail_data] # correct
```
and not like:
```python
thumbnail_data = try_get(...)
thumbnail_data = data.get('thumbnails')
thumbnails = [{
'url': item['url']
} for item in thumbnail_data] # incorrect
```
In the later case, `thumbnail_data` will be `None` if the field was not found and this will cause the loop `for item in thumbnail_data` to raise a fatal error. Using `for item in thumbnail_data or []` avoids this error and results in setting an empty list in `thumbnails` instead.
In this case, `thumbnail_data` will be `None` if the field was not found and this will cause the loop `for item in thumbnail_data` to raise a fatal error. Using `or []` avoids this error and results in setting an empty list in `thumbnails` instead.
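As a minimal, self-contained sketch of the difference, assume a hypothetical parsed response `data` that has no `thumbnails` key (the dict below is made up for illustration and is not taken from any real extractor):

```python
# Hypothetical parsed response that lacks the 'thumbnails' field.
data = {'title': 'Example'}

# Safe: a missing field falls back to an empty list, so the loop body never runs.
thumbnail_data = data.get('thumbnails') or []
thumbnails = [{'url': item['url']} for item in thumbnail_data]

# Unsafe: iterating over None raises TypeError.
try:
    [{'url': item['url']} for item in data.get('thumbnails')]
except TypeError as exc:
    error = exc
else:
    error = None
```

Here `thumbnails` ends up as an empty list, while the unguarded version stops with a `TypeError`.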
### Provide fallbacks
@ -431,7 +431,7 @@ title = self._search_regex( # correct
r'<span[^>]+class="title"[^>]*>([^<]+)', webpage, 'title')
```
Or even better:
which tolerates potential changes in the `style` attribute's value. Or even better:
```python
title = self._search_regex( # correct
@ -439,7 +439,7 @@ title = self._search_regex( # correct
webpage, 'title', group='title')
```
Note how you tolerate potential changes in the `style` attribute's value or switch from using double quotes to single for `class` attribute:
which also handles single quotes in addition to double quotes.
The code definitely should not look like:
@ -460,6 +460,41 @@ title = self._search_regex( # incorrect
Here the presence or absence of other attributes including `style` is irrelevant for the data we need, and so the regex must not depend on it.
#### Keep the regular expressions as simple as possible, but no simpler
Since many extractors deal with unstructured data provided by websites, we will often need to use very complex regular expressions. You should try to use the *simplest* regex that can accomplish what you want. In other words, each part of the regex must have a reason for existing. If you can take out a symbol and the functionality does not change, the symbol should not be there.
##### Example
Correct:
```python
_VALID_URL = r'https?://(?:www\.)?website\.com/(?:[^/]+/){3,4}(?P<display_id>[^/]+)_(?P<id>\d+)'
```
Incorrect:
```python
_VALID_URL = r'https?:\/\/(?:www\.)?website\.com\/[^\/]+/[^\/]+/[^\/]+(?:\/[^\/]+)?\/(?P<display_id>[^\/]+)_(?P<id>\d+)'
```
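One way to convince yourself that the simpler pattern loses nothing is to run both against a few URLs. The sketch below uses only the standard `re` module; the sample URLs and the `groups` helper are made up for illustration:

```python
import re

SIMPLE = r'https?://(?:www\.)?website\.com/(?:[^/]+/){3,4}(?P<display_id>[^/]+)_(?P<id>\d+)'
VERBOSE = r'https?:\/\/(?:www\.)?website\.com\/[^\/]+/[^\/]+/[^\/]+(?:\/[^\/]+)?\/(?P<display_id>[^\/]+)_(?P<id>\d+)'

# Hypothetical sample URLs: three path segments, four segments, and too few segments.
urls = [
    'https://www.website.com/a/b/c/some-title_123',
    'http://website.com/a/b/c/d/other_45',
    'https://website.com/a/b/too-short_6',
]


def groups(pattern, url):
    """Return the named groups if the pattern matches at the start of url, else None."""
    m = re.match(pattern, url)
    return m.groupdict() if m else None


# Pair up the result of both patterns for every URL.
results = [(groups(SIMPLE, u), groups(VERBOSE, u)) for u in urls]
```

On these inputs both patterns should agree for every URL, so the extra escapes and repeated segments in the longer pattern add nothing.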
#### Do not misuse `.` and use the correct quantifiers (`+*?`)
Avoid creating regexes that over-match because of wrong use of quantifiers. Also try to avoid non-greedy matching (`?`) where possible, since it could easily result in [catastrophic backtracking](https://www.regular-expressions.info/catastrophic.html).
Correct:
```python
title = self._search_regex(r'<span\b[^>]+class="title"[^>]*>([^<]+)', webpage, 'title')
```
Incorrect:
```python
title = self._search_regex(r'<span\b.*class="title".*>(.+?)<', webpage, 'title')
```
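To see the over-matching concretely, here is a small sketch that uses the standard `re` module directly instead of `_search_regex`; the `webpage` string is a made-up example:

```python
import re

# Hypothetical page fragment with two <span> elements on the same line.
webpage = '<span class="title">Video title</span><span class="views">1234</span>'

# Bounded character classes keep the match inside the intended tag.
strict = re.search(r'<span\b[^>]+class="title"[^>]*>([^<]+)', webpage)

# Greedy `.*` runs past the end of the tag and captures text from a later element.
loose = re.search(r'<span\b.*class="title".*>(.+?)<', webpage)
```

With this input, `strict.group(1)` is the title text, whereas `loose.group(1)` is `'1234'` because the greedy `.*` skips ahead to the last `>` that still allows a match.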
### Long lines policy
There is a soft limit to keep lines of code under 100 characters long. This means it should be respected if possible and if it does not make readability and code maintenance worse. Sometimes, it may be reasonable to go up to 120 characters and sometimes even 80 can be unreadable. Keep in mind that this is not a hard limit and is just one of many tools to make the code more readable.
@ -521,19 +556,22 @@ formats = self._extract_m3u8_formats(m3u8_url,
### Quotes
Always use single quotes for strings (even if the string has `'`) and double quotes for docstrings. Use `'''` only for multi-line strings. An exception can be made if a string has multiple single quotes in it and escaping makes it significantly harder to read. For f-strings, you can use double quotes on the inside. But avoid f-strings that have too many quotes inside.
Always use single quotes for strings (even if the string has `'`) and double quotes for docstrings. Use `'''` only for multi-line strings. An exception can be made if a string has multiple single quotes in it and escaping makes it *significantly* harder to read. For f-strings, you can use double quotes on the inside. But avoid f-strings that have too many quotes inside.
### Inline values
Extracting variables is acceptable for reducing code duplication and improving readability of complex expressions. However, you should avoid extracting variables used only once and moving them to opposite parts of the extractor file, which makes reading the linear flow difficult.
#### Example
#### Examples
Correct:
```python
title = self._html_search_regex(r'<h1>([^<]+)</h1>', webpage, 'title')
return {
'title': self._html_search_regex(r'<h1>([^<]+)</h1>', webpage, 'title'),
# ...some lines of code...
}
```
Incorrect:
@ -542,6 +580,11 @@ Incorrect:
TITLE_RE = r'<h1>([^<]+)</h1>'
# ...some lines of code...
title = self._html_search_regex(TITLE_RE, webpage, 'title')
# ...some lines of code...
return {
'title': title,
# ...some lines of code...
}
```
@ -573,33 +616,32 @@ Methods supporting list of patterns are: `_search_regex`, `_html_search_regex`,
### Trailing parentheses
Always move trailing parentheses used for grouping/functions after the last argument. On the other hand, literal list/tuple/dict/set should be closed on a new line. Generators and list/dict comprehensions may use either style.
Always move trailing parentheses used for grouping/functions after the last argument. On the other hand, multi-line literal list/tuple/dict/set should be closed on a new line. Generators and list/dict comprehensions may use either style.
#### Examples
Correct:
```python
url = try_get(
info,
lambda x: x['ResultSet']['Result'][0]['VideoUrlSet']['VideoUrl'],
list)
url = traverse_obj(info, (
'context', 'dispatcher', 'stores', 'VideoTitlePageStore', 'data', 'video', 0, 'VideoUrlSet', 'VideoUrl'), list)
```
Correct:
```python
url = try_get(info,
lambda x: x['ResultSet']['Result'][0]['VideoUrlSet']['VideoUrl'],
url = traverse_obj(
info,
('context', 'dispatcher', 'stores', 'VideoTitlePageStore', 'data', 'video', 0, 'VideoUrlSet', 'VideoUrl'),
list)
```
Incorrect:
```python
url = try_get(
url = traverse_obj(
info,
lambda x: x['ResultSet']['Result'][0]['VideoUrlSet']['VideoUrl'],
list,
('context', 'dispatcher', 'stores', 'VideoTitlePageStore', 'data', 'video', 0, 'VideoUrlSet', 'VideoUrl'),
list
)
```
@ -648,21 +690,17 @@ Use `unified_strdate` for uniform `upload_date` or any `YYYYMMDD` meta field ext
Explore [`yt_dlp/utils.py`](yt_dlp/utils.py) for more useful convenience functions.
#### More examples
#### Examples
##### Safely extract optional description from parsed JSON
```python
description = traverse_obj(response, ('result', 'video', 'summary'), expected_type=str)
```
##### Safely extract more optional metadata
```python
thumbnails = traverse_obj(response, ('result', 'thumbnails', ..., 'url'), expected_type=url_or_none)
video = traverse_obj(response, ('result', 'video', 0), default={}, expected_type=dict)
description = video.get('summary')
duration = float_or_none(video.get('durationMs'), scale=1000)
view_count = int_or_none(video.get('views'))
```
# My pull request is labeled pending-fixes
The `pending-fixes` label is added when there are changes requested to a PR. When the necessary changes are made, the label should be removed. However, despite our best efforts, it may sometimes happen that the maintainer did not see the changes or forgot to remove the label. If your PR is still marked as `pending-fixes` a few days after all requested changes have been made, feel free to ping the maintainer who labeled your issue and ask them to re-review and remove the label.

@ -248,3 +248,22 @@ rand-net
vertan
Wikidepia
Yipten
moench-tegeder
christoph-heinrich
HobbyistDev
LunarFang416
sbor23
aurelg
adamanldo
gamer191
vkorablin
Burve
mnn
ZhymabekRoman
mozbugbox
aejdl
ping
sqrtNOT
bubbleguuum
darkxex
miseran

@ -11,6 +11,131 @@
-->
### 2022.06.22.1
* [build] Fix updating homebrew formula
### 2022.06.22
* [**Deprecate support for Python 3.6**](https://github.com/yt-dlp/yt-dlp/issues/3764#issuecomment-1154051119)
* **Add option `--download-sections` to download video partially**
* Chapter regex and time ranges are accepted (Eg: `--download-sections *1:10-2:20`)
* Add option `--alias`
* Add option `--lazy-playlist` to process entries as they are received
* Add option `--retry-sleep`
* Add slicing notation to `--playlist-items`
* Adds support for negative indices and step
* Add `-I` as alias for `--playlist-items`
* Makes `--playlist-start`, `--playlist-end`, `--playlist-reverse`, `--no-playlist-reverse` redundant
* `--config-location -` to provide options interactively
* [build] Add Linux standalone builds
* [update] Self-restart after update
* Merge youtube-dl: Up to [commit/8a158a9](https://github.com/ytdl-org/youtube-dl/commit/8a158a9)
* Add `--no-update`
* Allow extractors to specify section_start/end for clips
* Do not print progress to `stderr` with `-q`
* Ensure pre-processor errors do not block video download
* Fix `--simulate --max-downloads`
* Improve error handling of bad config files
* Return an error code if update fails
* Fix bug in [3a408f9](https://github.com/yt-dlp/yt-dlp/commit/3a408f9d199127ca2626359e21a866a09ab236b3)
* [ExtractAudio] Allow conditional conversion
* [ModifyChapters] Fix repeated removal of small segments
* [ThumbnailsConvertor] Allow conditional conversion
* [cookies] Detect profiles for cygwin/BSD by [moench-tegeder](https://github.com/moench-tegeder)
* [dash] Show fragment count with `--live-from-start` by [flashdagger](https://github.com/flashdagger)
* [extractor] Add `_search_json` by [coletdjnz](https://github.com/coletdjnz), [pukkandan](https://github.com/pukkandan)
* [extractor] Add `default` parameter to `_search_json` by [coletdjnz](https://github.com/coletdjnz), [pukkandan](https://github.com/pukkandan)
* [extractor] Add dev option `--load-pages`
* [extractor] Handle `json_ld` with multiple `@type`s
* [extractor] Import `_ALL_CLASSES` lazily
* [extractor] Recognize `src` attribute from HTML5 media elements by [Lesmiscore](https://github.com/Lesmiscore)
* [extractor/generic] Revert e6ae51c123897927eb3c9899923d8ffd31c7f85d
* [f4m] Bugfix
* [ffmpeg] Check version lazily
* [jsinterp] Some optimizations and refactoring by [dirkf](https://github.com/dirkf), [pukkandan](https://github.com/pukkandan)
* [utils] Improve performance using `functools.cache`
* [utils] Send HTTP/1.1 ALPN extension by [coletdjnz](https://github.com/coletdjnz)
* [utils] `ExtractorError`: Fix `exc_info`
* [utils] `ISO3166Utils`: Add `EU` and `AP`
* [utils] `Popen`: Refactor to use contextmanager
* [utils] `locked_file`: Fix for PyPy on Windows
* [update] Expose more functionality to API
* [update] Use `.git` folder to distinguish `source`/`unknown`
* [compat] Add `functools.cached_property`
* [test] Fix `FakeYDL` signatures by [coletdjnz](https://github.com/coletdjnz)
* [docs] Improvements
* [cleanup, ExtractAudio] Refactor
* [cleanup, downloader] Refactor `report_progress`
* [cleanup, extractor] Refactor `_download_...` methods
* [cleanup, extractor] Rename `extractors.py` to `_extractors.py`
* [cleanup, utils] Don't use kwargs for `format_field`
* [cleanup, build] Refactor
* [cleanup, docs] Re-indent "Usage and Options" section
* [cleanup] Deprecate `YoutubeDL.parse_outtmpl`
* [cleanup] Misc fixes and cleanup by [Lesmiscore](https://github.com/Lesmiscore), [MrRawes](https://github.com/MrRawes), [christoph-heinrich](https://github.com/christoph-heinrich), [flashdagger](https://github.com/flashdagger), [gamer191](https://github.com/gamer191), [kwconder](https://github.com/kwconder), [pukkandan](https://github.com/pukkandan)
* [extractor/DailyWire] Add extractors by [HobbyistDev](https://github.com/HobbyistDev), [pukkandan](https://github.com/pukkandan)
* [extractor/fourzerostudio] Add extractors by [Lesmiscore](https://github.com/Lesmiscore)
* [extractor/GoogleDrive] Add folder extractor by [evansp](https://github.com/evansp), [pukkandan](https://github.com/pukkandan)
* [extractor/MirrorCoUK] Add extractor by [LunarFang416](https://github.com/LunarFang416), [pukkandan](https://github.com/pukkandan)
* [extractor/atscaleconfevent] Add extractor by [Ashish0804](https://github.com/Ashish0804)
* [extractor/freetv] Add extractor by [elyse0](https://github.com/elyse0)
* [extractor/ixigua] Add extractor by [HobbyistDev](https://github.com/HobbyistDev)
* [extractor/kicker.de] Add extractor by [HobbyistDev](https://github.com/HobbyistDev)
* [extractor/netverse] Add extractors by [HobbyistDev](https://github.com/HobbyistDev), [pukkandan](https://github.com/pukkandan)
* [extractor/playsuisse] Add extractor by [pukkandan](https://github.com/pukkandan), [sbor23](https://github.com/sbor23)
* [extractor/substack] Add extractor by [elyse0](https://github.com/elyse0)
* [extractor/youtube] **Support downloading clips**
* [extractor/youtube] Add `innertube_host` and `innertube_key` extractor args by [coletdjnz](https://github.com/coletdjnz)
* [extractor/youtube] Add warning for PostLiveDvr
* [extractor/youtube] Bring back `_extract_chapters_from_description`
* [extractor/youtube] Extract `comment_count` from webpage
* [extractor/youtube] Fix `:ytnotifications` extractor by [coletdjnz](https://github.com/coletdjnz)
* [extractor/youtube] Fix initial player response extraction by [coletdjnz](https://github.com/coletdjnz), [pukkandan](https://github.com/pukkandan)
* [extractor/youtube] Fix live chat for videos with content warning by [coletdjnz](https://github.com/coletdjnz)
* [extractor/youtube] Make signature extraction non-fatal
* [extractor/youtube:tab] Detect `videoRenderer` in `_post_thread_continuation_entries`
* [extractor/BiliIntl] Fix metadata extraction
* [extractor/BiliIntl] Fix subtitle extraction by [HobbyistDev](https://github.com/HobbyistDev)
* [extractor/FranceCulture] Fix extractor by [aurelg](https://github.com/aurelg), [pukkandan](https://github.com/pukkandan)
* [extractor/PokemonSoundLibrary] Remove extractor by [Lesmiscore](https://github.com/Lesmiscore)
* [extractor/StreamCZ] Fix extractor by [adamanldo](https://github.com/adamanldo), [dirkf](https://github.com/dirkf)
* [extractor/WatchESPN] Support free videos and BAM_DTC by [ischmidt20](https://github.com/ischmidt20)
* [extractor/animelab] Remove extractor by [gamer191](https://github.com/gamer191)
* [extractor/bloomberg] Change playback endpoint by [m4tu4g](https://github.com/m4tu4g)
* [extractor/ccc] Extract view_count by [vkorablin](https://github.com/vkorablin)
* [extractor/crunchyroll:beta] Fix extractor after API change by [Burve](https://github.com/Burve), [tejing1](https://github.com/tejing1)
* [extractor/curiositystream] Get `auth_token` from cookie by [mnn](https://github.com/mnn)
* [extractor/digitalconcerthall] Fix extractor by [ZhymabekRoman](https://github.com/ZhymabekRoman)
* [extractor/dropbox] Extract the correct `mountComponent`
* [extractor/dropout] Login is not mandatory
* [extractor/duboku] Fix for hostname change by [mozbugbox](https://github.com/mozbugbox)
* [extractor/espn] Add `WatchESPN` extractor by [ischmidt20](https://github.com/ischmidt20), [pukkandan](https://github.com/pukkandan)
* [extractor/expressen] Fix extractor by [aejdl](https://github.com/aejdl)
* [extractor/foxnews] Update embed extraction by [elyse0](https://github.com/elyse0)
* [extractor/ina] Fix extractor by [elyse0](https://github.com/elyse0)
* [extractor/iwara:user] Make paging better by [Lesmiscore](https://github.com/Lesmiscore)
* [extractor/jwplatform] Look for `data-video-jw-id`
* [extractor/lbry] Update livestream API by [flashdagger](https://github.com/flashdagger)
* [extractor/mediaset] Improve `_VALID_URL`
* [extractor/naver] Add `navernow` extractor by [ping](https://github.com/ping)
* [extractor/niconico:series] Fix extractor by [sqrtNOT](https://github.com/sqrtNOT)
* [extractor/npr] Use stream url from json-ld by [r5d](https://github.com/r5d)
* [extractor/pornhub] Extract `uploader_id` field by [Lesmiscore](https://github.com/Lesmiscore)
* [extractor/radiofrance] Add more radios by [bubbleguuum](https://github.com/bubbleguuum)
* [extractor/rumble] Detect JS embed
* [extractor/rumble] Extract subtitles by [fstirlitz](https://github.com/fstirlitz)
* [extractor/southpark] Add `southpark.lat` extractor by [darkxex](https://github.com/darkxex)
* [extractor/spotify:show] Fix extractor
* [extractor/tiktok] Detect embeds
* [extractor/tiktok] Extract `SIGI_STATE` by [dirkf](https://github.com/dirkf), [pukkandan](https://github.com/pukkandan), [sulyi](https://github.com/sulyi)
* [extractor/tver] Fix extractor by [Lesmiscore](https://github.com/Lesmiscore)
* [extractor/vevo] Fix extractor by [Lesmiscore](https://github.com/Lesmiscore)
* [extractor/yahoo:gyao] Fix extractor
* [extractor/zattoo] Fix live streams by [miseran](https://github.com/miseran)
* [extractor/zdf] Improve format sorting by [elyse0](https://github.com/elyse0)
### 2022.05.18
* Add support for SSL client certificate authentication by [coletdjnz](https://github.com/coletdjnz), [dirkf](https://github.com/dirkf)
@ -1156,7 +1281,7 @@
* [build] Automate more of the release process by [animelover1984](https://github.com/animelover1984), [pukkandan](https://github.com/pukkandan)
* [build] Fix sha256 by [nihil-admirari](https://github.com/nihil-admirari)
* [build] Bring back brew taps by [nao20010128nao](https://github.com/nao20010128nao)
* [build] Provide `--onedir` zip for windows by [pukkandan](https://github.com/pukkandan)
* [build] Provide `--onedir` zip for windows
* [cleanup,docs] Add deprecation warning in docs for some counterintuitive behaviour
* [cleanup] Fix line endings for `nebula.py` by [glenn-slayden](https://github.com/glenn-slayden)
* [cleanup] Improve `make clean-test` by [sulyi](https://github.com/sulyi)

View File

@ -9,7 +9,8 @@ tar: yt-dlp.tar.gz
# Keep this list in sync with MANIFEST.in
# intended use: when building a source distribution,
# make pypi-files && python setup.py sdist
pypi-files: AUTHORS Changelog.md LICENSE README.md README.txt supportedsites completions yt-dlp.1 devscripts/* test/*
pypi-files: AUTHORS Changelog.md LICENSE README.md README.txt supportedsites \
completions yt-dlp.1 requirements.txt setup.cfg devscripts/* test/*
.PHONY: all clean install test tar pypi-files completions ot offlinetest codetest supportedsites
@ -42,7 +43,7 @@ PYTHON ?= /usr/bin/env python3
SYSCONFDIR = $(shell if [ $(PREFIX) = /usr -o $(PREFIX) = /usr/local ]; then echo /etc; else echo $(PREFIX)/etc; fi)
# set markdown input format to "markdown-smart" for pandoc version 2 and to "markdown" for pandoc prior to version 2
MARKDOWN = $(shell if [ "$(pandoc -v | head -n1 | cut -d" " -f2 | head -c1)" = "2" ]; then echo markdown-smart; else echo markdown; fi)
MARKDOWN = $(shell if [ `pandoc -v | head -n1 | cut -d" " -f2 | head -c1` = "2" ]; then echo markdown-smart; else echo markdown; fi)
install: lazy-extractors yt-dlp yt-dlp.1 completions
mkdir -p $(DESTDIR)$(BINDIR)
@ -91,10 +92,10 @@ yt-dlp: yt_dlp/*.py yt_dlp/*/*.py
rm yt-dlp.zip
chmod a+x yt-dlp
README.md: yt_dlp/*.py yt_dlp/*/*.py
COLUMNS=80 $(PYTHON) yt_dlp/__main__.py --help | $(PYTHON) devscripts/make_readme.py
README.md: yt_dlp/*.py yt_dlp/*/*.py devscripts/make_readme.py
COLUMNS=80 $(PYTHON) yt_dlp/__main__.py --ignore-config --help | $(PYTHON) devscripts/make_readme.py
CONTRIBUTING.md: README.md
CONTRIBUTING.md: README.md devscripts/make_contributing.py
$(PYTHON) devscripts/make_contributing.py README.md CONTRIBUTING.md
issuetemplates: devscripts/make_issue_template.py .github/ISSUE_TEMPLATE_tmpl/1_broken_site.yml .github/ISSUE_TEMPLATE_tmpl/2_site_support_request.yml .github/ISSUE_TEMPLATE_tmpl/3_site_feature_request.yml .github/ISSUE_TEMPLATE_tmpl/4_bug_report.yml .github/ISSUE_TEMPLATE_tmpl/5_feature_request.yml yt_dlp/version.py
@ -111,7 +112,7 @@ supportedsites:
README.txt: README.md
pandoc -f $(MARKDOWN) -t plain README.md -o README.txt
yt-dlp.1: README.md
yt-dlp.1: README.md devscripts/prepare_manpage.py
$(PYTHON) devscripts/prepare_manpage.py yt-dlp.1.temp.md
pandoc -s -f $(MARKDOWN) -t man yt-dlp.1.temp.md -o yt-dlp.1
rm -f yt-dlp.1.temp.md
@ -128,7 +129,7 @@ completions/fish/yt-dlp.fish: yt_dlp/*.py yt_dlp/*/*.py devscripts/fish-completi
mkdir -p completions/fish
$(PYTHON) devscripts/fish-completion.py
_EXTRACTOR_FILES = $(shell find yt_dlp/extractor -iname '*.py' -and -not -iname 'lazy_extractors.py')
_EXTRACTOR_FILES = $(shell find yt_dlp/extractor -name '*.py' -and -not -name 'lazy_extractors.py')
yt_dlp/extractor/lazy_extractors.py: devscripts/make_lazy_extractors.py devscripts/lazy_load_template.py $(_EXTRACTOR_FILES)
$(PYTHON) devscripts/make_lazy_extractors.py $@
@ -147,7 +148,7 @@ yt-dlp.tar.gz: all
CONTRIBUTING.md Collaborators.md CONTRIBUTORS AUTHORS \
Makefile MANIFEST.in yt-dlp.1 README.txt completions \
setup.py setup.cfg yt-dlp yt_dlp requirements.txt \
devscripts test tox.ini pytest.ini
devscripts test
AUTHORS: .mailmap
git shortlog -s -n | cut -f2 | sort > AUTHORS

664
README.md

File diff suppressed because it is too large

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
import io
import optparse

View File

@ -53,7 +53,7 @@ def get_all_ies():
if os.path.exists(PLUGINS_DIRNAME):
os.rename(PLUGINS_DIRNAME, BLOCKED_DIRNAME)
try:
from yt_dlp.extractor import _ALL_CLASSES
from yt_dlp.extractor.extractors import _ALL_CLASSES
finally:
if os.path.exists(BLOCKED_DIRNAME):
os.rename(BLOCKED_DIRNAME, PLUGINS_DIRNAME)

View File

@ -2,6 +2,7 @@
# yt-dlp --help | make_readme.py
# This must be run in a console of correct width
import functools
import re
import sys
@ -10,21 +11,60 @@ README_FILE = 'README.md'
OPTIONS_START = 'General Options:'
OPTIONS_END = 'CONFIGURATION'
EPILOG_START = 'See full documentation'
ALLOWED_OVERSHOOT = 2
DISABLE_PATCH = object()
helptext = sys.stdin.read()
if isinstance(helptext, bytes):
helptext = helptext.decode()
def take_section(text, start=None, end=None, *, shift=0):
return text[
text.index(start) + shift if start else None:
text.index(end) + shift if end else None
]
start, end = helptext.index(f'\n {OPTIONS_START}'), helptext.index(f'\n{EPILOG_START}')
options = re.sub(r'(?m)^ (\w.+)$', r'## \1', helptext[start + 1: end + 1])
def apply_patch(text, patch):
return text if patch[0] is DISABLE_PATCH else re.sub(*patch, text)
options = take_section(sys.stdin.read(), f'\n {OPTIONS_START}', f'\n{EPILOG_START}', shift=1)
max_width = max(map(len, options.split('\n')))
switch_col_width = len(re.search(r'(?m)^\s{5,}', options).group())
delim = f'\n{" " * switch_col_width}'
PATCHES = (
( # Headings
r'(?m)^ (\w.+\n)( (?=\w))?',
r'## \1'
),
( # Do not split URLs
rf'({delim[:-1]})? (?P<label>\[\S+\] )?(?P<url>https?({delim})?:({delim})?/({delim})?/(({delim})?\S+)+)\s',
lambda mobj: ''.join((delim, mobj.group('label') or '', re.sub(r'\s+', '', mobj.group('url')), '\n'))
),
( # Do not split "words"
rf'(?m)({delim}\S+)+$',
lambda mobj: ''.join((delim, mobj.group(0).replace(delim, '')))
),
( # Allow overshooting last line
rf'(?m)^(?P<prev>.+)${delim}(?P<current>.+)$(?!{delim})',
lambda mobj: (mobj.group().replace(delim, ' ')
if len(mobj.group()) - len(delim) + 1 <= max_width + ALLOWED_OVERSHOOT
else mobj.group())
),
( # Avoid newline when a space is available b/w switch and description
DISABLE_PATCH, # This creates issues with prepare_manpage
r'(?m)^(\s{4}-.{%d})(%s)' % (switch_col_width - 6, delim),
r'\1 '
),
)
with open(README_FILE, encoding='utf-8') as f:
readme = f.read()
header = readme[:readme.index(f'## {OPTIONS_START}')]
footer = readme[readme.index(f'# {OPTIONS_END}'):]
with open(README_FILE, 'w', encoding='utf-8') as f:
for part in (header, options, footer):
f.write(part)
f.write(''.join((
take_section(readme, end=f'## {OPTIONS_START}'),
functools.reduce(apply_patch, PATCHES, options),
take_section(readme, f'# {OPTIONS_END}'),
)))
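The new `make_readme.py` applies its `PATCHES` in order by folding `apply_patch` over them with `functools.reduce`. A minimal sketch of that pattern follows. The `apply_patch` and `DISABLE_PATCH` definitions mirror the diff, while the patch tuples and the sample text are hypothetical and chosen only for illustration.

```python
import functools
import re

DISABLE_PATCH = object()  # sentinel: a patch starting with this is skipped


def apply_patch(text, patch):
    # A patch is (pattern, replacement); re.sub receives them positionally
    return text if patch[0] is DISABLE_PATCH else re.sub(*patch, text)


# Hypothetical patches, not the ones used by make_readme.py
PATCHES = (
    (r'(?m)^  (\w.+)$', r'## \1'),     # two-space-indented titles become headings
    (DISABLE_PATCH, r'foo', r'bar'),   # disabled, so "foo" is left alone
    (r'(?m)[ \t]+$', ''),              # strip trailing whitespace
)

text = '  General Options:\n    -h, --help    foo  \n'
result = functools.reduce(apply_patch, PATCHES, text)
print(result)
```

Because each patch receives the output of the previous one, the order of `PATCHES` matters, and prefixing a tuple with the sentinel is a cheap way to keep a patch in the source while turning it off.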

View File

@ -1,4 +1,4 @@
#!/bin/sh
#!/usr/bin/env sh
if [ -z $1 ]; then
test_set='test'

View File

@ -5,24 +5,7 @@ import sys
from PyInstaller.__main__ import run as run_pyinstaller
OS_NAME = platform.system()
if OS_NAME == 'Windows':
from PyInstaller.utils.win32.versioninfo import (
FixedFileInfo,
SetVersion,
StringFileInfo,
StringStruct,
StringTable,
VarFileInfo,
VarStruct,
VSVersionInfo,
)
elif OS_NAME == 'Darwin':
pass
else:
raise Exception(f'{OS_NAME} is not supported')
ARCH = platform.architecture()[0][:2]
OS_NAME, ARCH = sys.platform, platform.architecture()[0][:2]
def main():
@ -33,10 +16,7 @@ def main():
if not onedir and '-F' not in opts and '--onefile' not in opts:
opts.append('--onefile')
name = 'yt-dlp%s' % ('_macos' if OS_NAME == 'Darwin' else '_x86' if ARCH == '32' else '')
final_file = ''.join((
'dist/', f'{name}/' if onedir else '', name, '.exe' if OS_NAME == 'Windows' else ''))
name, final_file = exe(onedir)
print(f'Building yt-dlp v{version} {ARCH}bit for {OS_NAME} with options {opts}')
print('Remember to update the version using "devscripts/update-version.py"')
if not os.path.isfile('yt_dlp/extractor/lazy_extractors.py'):
@ -79,6 +59,21 @@ def read_version(fname):
return locals()['__version__']
def exe(onedir):
"""@returns (name, path)"""
name = '_'.join(filter(None, (
'yt-dlp',
{'win32': '', 'darwin': 'macos'}.get(OS_NAME, OS_NAME),
ARCH == '32' and 'x86'
)))
return name, ''.join(filter(None, (
'dist/',
onedir and f'{name}/',
name,
OS_NAME == 'win32' and '.exe'
)))
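The `exe` helper above builds the artifact name by joining only the truthy parts, using `filter(None, ...)` to drop empty strings and `False` values. The sketch below reproduces that logic but, as an assumption made for easier experimentation, takes the platform and architecture as parameters (`os_name`, `arch`) instead of reading the module-level `OS_NAME` and `ARCH`.

```python
def exe(onedir, os_name, arch):
    """Return (name, path); parameterized variant for illustration only."""
    name = '_'.join(filter(None, (
        'yt-dlp',
        # Windows gets no suffix, macOS gets "macos", anything else its own name
        {'win32': '', 'darwin': 'macos'}.get(os_name, os_name),
        arch == '32' and 'x86',  # False (dropped) unless building for 32-bit
    )))
    return name, ''.join(filter(None, (
        'dist/',
        onedir and f'{name}/',          # extra directory level for --onedir builds
        name,
        os_name == 'win32' and '.exe',  # extension only on Windows
    )))


print(exe(False, 'win32', '64'))
print(exe(True, 'darwin', '64'))
print(exe(False, 'linux', '32'))
```

The `condition and value` idiom yields either `False` or the value, which is why a single `filter(None, ...)` pass is enough to assemble both the name and the path.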
def version_to_list(version):
version_list = version.split('.')
return list(map(int, version_list)) + [0] * (4 - len(version_list))
@ -109,11 +104,22 @@ def pycryptodome_module():
def set_version_info(exe, version):
if OS_NAME == 'Windows':
if OS_NAME == 'win32':
windows_set_version(exe, version)
def windows_set_version(exe, version):
from PyInstaller.utils.win32.versioninfo import (
FixedFileInfo,
SetVersion,
StringFileInfo,
StringStruct,
StringTable,
VarFileInfo,
VarStruct,
VSVersionInfo,
)
version_list = version_to_list(version)
suffix = '_x86' if ARCH == '32' else ''
SetVersion(exe, VSVersionInfo(

View File

@ -1,4 +0,0 @@
[pytest]
addopts = -ra -v --strict-markers
markers =
download

View File

@ -1,6 +1,39 @@
[wheel]
universal = True
universal = true
[flake8]
exclude = devscripts/lazy_load_template.py,devscripts/make_issue_template.py,setup.py,build,.git,venv
exclude = build,venv,.tox,.git,.pytest_cache
ignore = E402,E501,E731,E741,W503
max_line_length = 120
per_file_ignores =
devscripts/lazy_load_template.py: F401
[tool:pytest]
addopts = -ra -v --strict-markers
markers =
download
[tox:tox]
skipsdist = true
envlist = py{36,37,38,39,310},pypy{36,37,38,39}
skip_missing_interpreters = true
[testenv] # tox
deps =
pytest
commands = pytest {posargs:"-m not download"}
passenv = HOME # For test_compat_expanduser
setenv =
# PYTHONWARNINGS = error # Catches PIP's warnings too
[isort]
py_version = 36
multi_line_output = VERTICAL_HANGING_INDENT
line_length = 80
reverse_relative = true
ensure_newline_before_comments = true
include_trailing_comma = true

View File

@ -36,7 +36,7 @@ REQUIREMENTS = read('requirements.txt').splitlines()
if sys.argv[1:2] == ['py2exe']:
import py2exe
import py2exe # noqa: F401
warnings.warn(
'py2exe builds do not support pycryptodomex and needs VC++14 to run. '
'The recommended way is to use "pyinst.py" to build using pyinstaller')
@ -140,6 +140,9 @@ setup(
'Programming Language :: Python :: 3.6',
'Programming Language :: Python :: 3.7',
'Programming Language :: Python :: 3.8',
'Programming Language :: Python :: 3.9',
'Programming Language :: Python :: 3.10',
'Programming Language :: Python :: 3.11',
'Programming Language :: Python :: Implementation',
'Programming Language :: Python :: Implementation :: CPython',
'Programming Language :: Python :: Implementation :: PyPy',

View File

@ -1,4 +1,6 @@
# Supported sites
- **0000studio:archive**
- **0000studio:clip**
- **17live**
- **17live:clip**
- **1tv**: Первый канал
@ -60,8 +62,6 @@
- **AmHistoryChannel**
- **anderetijden**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl
- **AnimalPlanet**
- **AnimeLab**: [<abbr title="netrc machine"><em>animelab</em></abbr>]
- **AnimeLabShows**: [<abbr title="netrc machine"><em>animelab</em></abbr>]
- **AnimeOnDemand**: [<abbr title="netrc machine"><em>animeondemand</em></abbr>]
- **ant1newsgr:article**: ant1news.gr articles
- **ant1newsgr:embed**: ant1news.gr embedded videos
@ -89,6 +89,7 @@
- **AsianCrush**
- **AsianCrushPlaylist**
- **AtresPlayer**: [<abbr title="netrc machine"><em>atresplayer</em></abbr>]
- **AtScaleConfEvent**
- **ATTTechChannel**
- **ATVAt**
- **AudiMedia**
@ -276,6 +277,8 @@
- **dailymotion**: [<abbr title="netrc machine"><em>dailymotion</em></abbr>]
- **dailymotion:playlist**: [<abbr title="netrc machine"><em>dailymotion</em></abbr>]
- **dailymotion:user**: [<abbr title="netrc machine"><em>dailymotion</em></abbr>]
- **DailyWire**
- **DailyWirePodcast**
- **damtomo:record**
- **damtomo:video**
- **daum.net**
@ -322,8 +325,8 @@
- **drtv**
- **drtv:live**
- **DTube**
- **duboku**: www.duboku.co
- **duboku:list**: www.duboku.co entire series
- **duboku**: www.duboku.io
- **duboku:list**: www.duboku.io entire series
- **Dumpert**
- **dvtv**: http://video.aktualne.cz/
- **dw**
@ -403,6 +406,8 @@
- **FranceTVSite**
- **Freesound**
- **freespeech.org**
- **freetv:series**
- **FreeTvMovies**
- **FrontendMasters**: [<abbr title="netrc machine"><em>frontendmasters</em></abbr>]
- **FrontendMastersCourse**: [<abbr title="netrc machine"><em>frontendmasters</em></abbr>]
- **FrontendMastersLesson**: [<abbr title="netrc machine"><em>frontendmasters</em></abbr>]
@ -452,6 +457,7 @@
- **google:podcasts**
- **google:podcasts:feed**
- **GoogleDrive**
- **GoogleDrive:Folder**
- **GoPro**
- **Goshgay**
- **GoToStage**
@ -535,6 +541,7 @@
- **Iwara**
- **iwara:playlist**
- **iwara:user**
- **Ixigua**
- **Izlesene**
- **Jable**
- **JablePlaylist**
@ -554,12 +561,14 @@
- **Ketnet**
- **khanacademy**
- **khanacademy:unit**
- **Kicker**
- **KickStarter**
- **KinjaEmbed**
- **KinoPoisk**
- **KonserthusetPlay**
- **Koo**
- **KrasView**: Красвью
- **KTH**
- **Ku6**
- **KUSI**
- **kuwo:album**: 酷我音乐 - 专辑
@ -675,6 +684,7 @@
- **miomio.tv**
- **mirrativ**
- **mirrativ:user**
- **MirrorCoUK**
- **MiTele**: mitele.es
- **mixch**
- **mixch:archive**
@ -740,6 +750,7 @@
- **NationalGeographicTV**
- **Naver**
- **Naver:live**
- **navernow**
- **NBA**
- **nba:watch**
- **nba:watch:collection**
@ -769,6 +780,8 @@
- **netease:singer**: 网易云音乐 - 歌手
- **netease:song**: 网易云音乐
- **NetPlus**: [<abbr title="netrc machine"><em>netplus</em></abbr>]
- **Netverse**
- **NetversePlaylist**
- **Netzkino**
- **Newgrounds**
- **Newgrounds:playlist**
@ -932,6 +945,7 @@
- **PlayPlusTV**: [<abbr title="netrc machine"><em>playplustv</em></abbr>]
- **PlayStuff**
- **PlaysTV**
- **PlaySuisse**
- **Playtvak**: Playtvak.cz, iDNES.cz and Lidovky.cz
- **Playvid**
- **PlayVids**
@ -942,7 +956,6 @@
- **Podchaser**
- **podomatic**
- **Pokemon**
- **PokemonSoundLibrary**
- **PokemonWatch**
- **PokerGo**: [<abbr title="netrc machine"><em>pokergo</em></abbr>]
- **PokerGoCollection**: [<abbr title="netrc machine"><em>pokergo</em></abbr>]
@ -1150,6 +1163,7 @@
- **southpark.cc.com**
- **southpark.cc.com:español**
- **southpark.de**
- **southpark.lat**
- **southpark.nl**
- **southparkstudios.dk**
- **SovietsCloset**
@ -1189,6 +1203,7 @@
- **StretchInternet**
- **Stripchat**
- **stv:player**
- **Substack**
- **SunPorno**
- **sverigesradio:episode**
- **sverigesradio:publication**
@ -1463,6 +1478,7 @@
- **washingtonpost:article**
- **wat.tv**
- **WatchBox**
- **WatchESPN**
- **WatchIndianPorn**: Watch Indian Porn
- **WDR**
- **wdr:mobile**: (**Currently broken**)
@ -1535,6 +1551,7 @@
- **YourPorn**
- **YourUpload**
- **youtube**: YouTube
- **youtube:clip**
- **youtube:favorites**: YouTube liked videos; ":ytfav" keyword (requires cookies)
- **youtube:history**: YouTube watch history; ":ythis" keyword (requires cookies)
- **youtube:music:search_url**: YouTube music search URLs with selectable sections (Eg: #songs)

View File

@ -44,7 +44,7 @@ def try_rm(filename):
raise
def report_warning(message):
def report_warning(message, *args, **kwargs):
'''
Print the message to stderr, it will be prefixed with 'WARNING:'
If stderr is a tty file the 'WARNING:' will be colored
@ -67,10 +67,10 @@ class FakeYDL(YoutubeDL):
super().__init__(params, auto_init=False)
self.result = []
def to_screen(self, s, skip_eol=None):
def to_screen(self, s, *args, **kwargs):
print(s)
def trouble(self, s, tb=None):
def trouble(self, s, *args, **kwargs):
raise Exception(s)
def download(self, x):
@ -80,10 +80,10 @@ class FakeYDL(YoutubeDL):
# Silence an expected warning matching a regex
old_report_warning = self.report_warning
def report_warning(self, message):
def report_warning(self, message, *args, **kwargs):
if re.match(regex, message):
return
old_report_warning(message)
old_report_warning(message, *args, **kwargs)
self.report_warning = types.MethodType(report_warning, self)
@ -301,9 +301,9 @@ def assertEqual(self, got, expected, msg=None):
def expect_warnings(ydl, warnings_re):
real_warning = ydl.report_warning
def _report_warning(w):
def _report_warning(w, *args, **kwargs):
if not any(re.search(w_re, w) for w_re in warnings_re):
real_warning(w)
real_warning(w, *args, **kwargs)
ydl.report_warning = _report_warning
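The signature changes in this file all follow one pattern: a test wrapper accepts `*args, **kwargs` and forwards them, so it keeps working when the wrapped method gains new parameters. A self-contained sketch of the `expect_warnings` wrapper is shown below. The `Reporter` class is a hypothetical stand-in for `YoutubeDL`, and its `only_once` parameter is assumed purely for the demonstration.

```python
import re


class Reporter:
    # Hypothetical stand-in for an object exposing report_warning
    def __init__(self):
        self.seen = []

    def report_warning(self, message, only_once=False):
        self.seen.append((message, only_once))


def expect_warnings(obj, warnings_re):
    real_warning = obj.report_warning

    def _report_warning(w, *args, **kwargs):
        # Swallow expected warnings; forward everything else unchanged
        if not any(re.search(w_re, w) for w_re in warnings_re):
            real_warning(w, *args, **kwargs)

    obj.report_warning = _report_warning


reporter = Reporter()
expect_warnings(reporter, [r'deprecated'])
reporter.report_warning('this option is deprecated')           # suppressed
reporter.report_warning('unexpected failure', only_once=True)  # forwarded
print(reporter.seen)
```

With the old single-argument wrapper, the second call would have raised a `TypeError` because of the unexpected keyword argument.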

View File

@ -502,6 +502,24 @@ class TestInfoExtractor(unittest.TestCase):
}],
})
# from https://0000.studio/
# with type attribute but without extension in URL
expect_dict(
self,
self.ie._parse_html5_media_entries(
'https://0000.studio',
r'''
<video src="https://d1ggyt9m8pwf3g.cloudfront.net/protected/ap-northeast-1:1864af40-28d5-492b-b739-b32314b1a527/archive/clip/838db6a7-8973-4cd6-840d-8517e4093c92"
controls="controls" type="video/mp4" preload="metadata" autoplay="autoplay" playsinline class="object-contain">
</video>
''', None)[0],
{
'formats': [{
'url': 'https://d1ggyt9m8pwf3g.cloudfront.net/protected/ap-northeast-1:1864af40-28d5-492b-b739-b32314b1a527/archive/clip/838db6a7-8973-4cd6-840d-8517e4093c92',
'ext': 'mp4',
}],
})
def test_extract_jwplayer_data_realworld(self):
# from http://www.suffolk.edu/sjc/
expect_dict(

View File

@ -23,6 +23,7 @@ from yt_dlp.postprocessor.common import PostProcessor
from yt_dlp.utils import (
ExtractorError,
LazyList,
OnDemandPagedList,
int_or_none,
match_filter_func,
)
@ -39,7 +40,7 @@ class YDL(FakeYDL):
def process_info(self, info_dict):
self.downloaded_info_dicts.append(info_dict.copy())
def to_screen(self, msg):
def to_screen(self, msg, *args, **kwargs):
self.msgs.append(msg)
def dl(self, *args, **kwargs):
@ -989,41 +990,79 @@ class TestYoutubeDL(unittest.TestCase):
self.assertEqual(res, [])
def test_playlist_items_selection(self):
entries = [{
'id': compat_str(i),
'title': compat_str(i),
INDICES, PAGE_SIZE = list(range(1, 11)), 3
def entry(i, evaluated):
evaluated.append(i)
return {
'id': str(i),
'title': str(i),
'url': TEST_URL,
} for i in range(1, 5)]
playlist = {
}
def pagedlist_entries(evaluated):
def page_func(n):
start = PAGE_SIZE * n
for i in INDICES[start: start + PAGE_SIZE]:
yield entry(i, evaluated)
return OnDemandPagedList(page_func, PAGE_SIZE)
def page_num(i):
return (i + PAGE_SIZE - 1) // PAGE_SIZE
def generator_entries(evaluated):
for i in INDICES:
yield entry(i, evaluated)
def list_entries(evaluated):
return list(generator_entries(evaluated))
def lazylist_entries(evaluated):
return LazyList(generator_entries(evaluated))
def get_downloaded_info_dicts(params, entries):
ydl = YDL(params)
ydl.process_ie_result({
'_type': 'playlist',
'id': 'test',
'entries': entries,
'extractor': 'test:playlist',
'extractor_key': 'test:playlist',
'webpage_url': 'http://example.com',
}
def get_downloaded_info_dicts(params):
ydl = YDL(params)
# make a deep copy because the dictionary and nested entries
# can be modified
ydl.process_ie_result(copy.deepcopy(playlist))
'entries': entries,
})
return ydl.downloaded_info_dicts
def test_selection(params, expected_ids):
results = [
(v['playlist_autonumber'] - 1, (int(v['id']), v['playlist_index']))
for v in get_downloaded_info_dicts(params)]
self.assertEqual(results, list(enumerate(zip(expected_ids, expected_ids))))
def test_selection(params, expected_ids, evaluate_all=False):
expected_ids = list(expected_ids)
if evaluate_all:
generator_eval = pagedlist_eval = INDICES
elif not expected_ids:
generator_eval = pagedlist_eval = []
else:
generator_eval = INDICES[0: max(expected_ids)]
pagedlist_eval = INDICES[PAGE_SIZE * page_num(min(expected_ids)) - PAGE_SIZE:
PAGE_SIZE * page_num(max(expected_ids))]
test_selection({}, [1, 2, 3, 4])
test_selection({'playlistend': 10}, [1, 2, 3, 4])
test_selection({'playlistend': 2}, [1, 2])
test_selection({'playliststart': 10}, [])
test_selection({'playliststart': 2}, [2, 3, 4])
test_selection({'playlist_items': '2-4'}, [2, 3, 4])
for name, func, expected_eval in (
('list', list_entries, INDICES),
('Generator', generator_entries, generator_eval),
# ('LazyList', lazylist_entries, generator_eval), # Generator and LazyList follow the exact same code path
('PagedList', pagedlist_entries, pagedlist_eval),
):
evaluated = []
entries = func(evaluated)
results = [(v['playlist_autonumber'] - 1, (int(v['id']), v['playlist_index']))
for v in get_downloaded_info_dicts(params, entries)]
self.assertEqual(results, list(enumerate(zip(expected_ids, expected_ids))), f'Entries of {name} for {params}')
self.assertEqual(sorted(evaluated), expected_eval, f'Evaluation of {name} for {params}')
test_selection({}, INDICES)
test_selection({'playlistend': 20}, INDICES, True)
test_selection({'playlistend': 2}, INDICES[:2])
test_selection({'playliststart': 11}, [], True)
test_selection({'playliststart': 2}, INDICES[1:])
test_selection({'playlist_items': '2-4'}, INDICES[1:4])
test_selection({'playlist_items': '2,4'}, [2, 4])
test_selection({'playlist_items': '10'}, [])
test_selection({'playlist_items': '20'}, [], True)
test_selection({'playlist_items': '0'}, [])
# Tests for https://github.com/ytdl-org/youtube-dl/issues/10591
@ -1032,11 +1071,33 @@ class TestYoutubeDL(unittest.TestCase):
# Tests for https://github.com/yt-dlp/yt-dlp/issues/720
# https://github.com/yt-dlp/yt-dlp/issues/302
test_selection({'playlistreverse': True}, [4, 3, 2, 1])
test_selection({'playliststart': 2, 'playlistreverse': True}, [4, 3, 2])
test_selection({'playlistreverse': True}, INDICES[::-1])
test_selection({'playliststart': 2, 'playlistreverse': True}, INDICES[:0:-1])
test_selection({'playlist_items': '2,4', 'playlistreverse': True}, [4, 2])
test_selection({'playlist_items': '4,2'}, [4, 2])
# Tests for --playlist-items start:end:step
test_selection({'playlist_items': ':'}, INDICES, True)
test_selection({'playlist_items': '::1'}, INDICES, True)
test_selection({'playlist_items': '::-1'}, INDICES[::-1], True)
test_selection({'playlist_items': ':6'}, INDICES[:6])
test_selection({'playlist_items': ':-6'}, INDICES[:-5], True)
test_selection({'playlist_items': '-1:6:-2'}, INDICES[:4:-2], True)
test_selection({'playlist_items': '9:-6:-2'}, INDICES[8:3:-2], True)
test_selection({'playlist_items': '1:inf:2'}, INDICES[::2], True)
test_selection({'playlist_items': '-2:inf'}, INDICES[-2:], True)
test_selection({'playlist_items': ':inf:-1'}, [], True)
test_selection({'playlist_items': '0-2:2'}, [2])
test_selection({'playlist_items': '1-:2'}, INDICES[::2], True)
test_selection({'playlist_items': '0--2:2'}, INDICES[1:-1:2], True)
test_selection({'playlist_items': '10::3'}, [10], True)
test_selection({'playlist_items': '-1::3'}, [10], True)
test_selection({'playlist_items': '11::3'}, [], True)
test_selection({'playlist_items': '-15::2'}, INDICES[1::2], True)
test_selection({'playlist_items': '-15::15'}, [], True)
def test_urlopen_no_file_protocol(self):
# see https://github.com/ytdl-org/youtube-dl/issues/8227
ydl = YDL()
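In `test_playlist_items_selection` above, the entries a paged list is expected to evaluate are derived from `page_num`, a ceiling division that maps a 1-based entry index to its 1-based page. The sketch below isolates that arithmetic. `expected_paged_eval` is a hypothetical helper name introduced here; the constants and the slice expression follow the test.

```python
INDICES, PAGE_SIZE = list(range(1, 11)), 3


def page_num(i):
    # 1-based page that contains the 1-based entry index i (ceiling division)
    return (i + PAGE_SIZE - 1) // PAGE_SIZE


def expected_paged_eval(expected_ids):
    # Every entry on the pages spanning the first to the last requested entry
    if not expected_ids:
        return []
    first_page = page_num(min(expected_ids))
    last_page = page_num(max(expected_ids))
    return INDICES[PAGE_SIZE * (first_page - 1): PAGE_SIZE * last_page]


print(expected_paged_eval([2, 4]))  # pages 1 and 2
print(expected_paged_eval([10]))    # page 4 only
```

Requesting items 2 and 4 therefore touches all six entries on the first two pages, whereas a generator would need to evaluate only entries 1 through 4.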

View File

@ -14,16 +14,16 @@ from yt_dlp.cookies import (
class Logger:
def debug(self, message):
def debug(self, message, *args, **kwargs):
print(f'[verbose] {message}')
def info(self, message):
def info(self, message, *args, **kwargs):
print(message)
def warning(self, message, only_once=False):
def warning(self, message, *args, **kwargs):
self.error(message)
def error(self, message):
def error(self, message, *args, **kwargs):
raise Exception(message)

View File

@ -43,7 +43,7 @@ class YoutubeDL(yt_dlp.YoutubeDL):
self.processed_info_dicts = []
super().__init__(*args, **kwargs)
def report_warning(self, message):
def report_warning(self, message, *args, **kwargs):
# Don't accept warnings during tests
raise ExtractorError(message)
@ -102,9 +102,10 @@ def generator(test_case, tname):
def print_skipping(reason):
print('Skipping %s: %s' % (test_case['name'], reason))
self.skipTest(reason)
if not ie.working():
print_skipping('IE marked as not _WORKING')
return
for tc in test_cases:
info_dict = tc.get('info_dict', {})
@ -118,11 +119,10 @@ def generator(test_case, tname):
if 'skip' in test_case:
print_skipping(test_case['skip'])
return
for other_ie in other_ies:
if not other_ie.working():
print_skipping('test depends on %sIE, marked as not WORKING' % other_ie.ie_key())
return
params = get_params(test_case.get('params', {}))
params['outtmpl'] = tname + '_' + params['outtmpl']


@ -38,6 +38,9 @@ class BaseTestSubtitles(unittest.TestCase):
self.DL = FakeYDL()
self.ie = self.IE()
self.DL.add_info_extractor(self.ie)
if not self.IE.working():
print('Skipping: %s marked as not _WORKING' % self.IE.ie_key())
self.skipTest('IE marked as not _WORKING')
def getInfoDict(self):
info_dict = self.DL.extract_info(self.url, download=False)
@ -57,6 +60,21 @@ class BaseTestSubtitles(unittest.TestCase):
@is_download_test
class TestYoutubeSubtitles(BaseTestSubtitles):
# Available subtitles for QRS8MkLhQmM:
# Language formats
# ru vtt, ttml, srv3, srv2, srv1, json3
# fr vtt, ttml, srv3, srv2, srv1, json3
# en vtt, ttml, srv3, srv2, srv1, json3
# nl vtt, ttml, srv3, srv2, srv1, json3
# de vtt, ttml, srv3, srv2, srv1, json3
# ko vtt, ttml, srv3, srv2, srv1, json3
# it vtt, ttml, srv3, srv2, srv1, json3
# zh-Hant vtt, ttml, srv3, srv2, srv1, json3
# hi vtt, ttml, srv3, srv2, srv1, json3
# pt-BR vtt, ttml, srv3, srv2, srv1, json3
# es-MX vtt, ttml, srv3, srv2, srv1, json3
# ja vtt, ttml, srv3, srv2, srv1, json3
# pl vtt, ttml, srv3, srv2, srv1, json3
url = 'QRS8MkLhQmM'
IE = YoutubeIE
@ -65,47 +83,60 @@ class TestYoutubeSubtitles(BaseTestSubtitles):
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(len(subtitles.keys()), 13)
self.assertEqual(md5(subtitles['en']), '688dd1ce0981683867e7fe6fde2a224b')
self.assertEqual(md5(subtitles['it']), '31324d30b8430b309f7f5979a504a769')
self.assertEqual(md5(subtitles['en']), 'ae1bd34126571a77aabd4d276b28044d')
self.assertEqual(md5(subtitles['it']), '0e0b667ba68411d88fd1c5f4f4eab2f9')
for lang in ['fr', 'de']:
self.assertTrue(subtitles.get(lang) is not None, 'Subtitles for \'%s\' not extracted' % lang)
def test_youtube_subtitles_ttml_format(self):
def _test_subtitles_format(self, fmt, md5_hash, lang='en'):
self.DL.params['writesubtitles'] = True
self.DL.params['subtitlesformat'] = 'ttml'
self.DL.params['subtitlesformat'] = fmt
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['en']), 'c97ddf1217390906fa9fbd34901f3da2')
self.assertEqual(md5(subtitles[lang]), md5_hash)
def test_youtube_subtitles_ttml_format(self):
self._test_subtitles_format('ttml', 'c97ddf1217390906fa9fbd34901f3da2')
def test_youtube_subtitles_vtt_format(self):
self.DL.params['writesubtitles'] = True
self.DL.params['subtitlesformat'] = 'vtt'
self._test_subtitles_format('vtt', 'ae1bd34126571a77aabd4d276b28044d')
def test_youtube_subtitles_json3_format(self):
self._test_subtitles_format('json3', '688dd1ce0981683867e7fe6fde2a224b')
def _test_automatic_captions(self, url, lang):
self.url = url
self.DL.params['writeautomaticsub'] = True
self.DL.params['subtitleslangs'] = [lang]
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['en']), 'ae1bd34126571a77aabd4d276b28044d')
self.assertTrue(subtitles[lang] is not None)
def test_youtube_automatic_captions(self):
self.url = '8YoUxe5ncPo'
self.DL.params['writeautomaticsub'] = True
self.DL.params['subtitleslangs'] = ['it']
subtitles = self.getSubtitles()
self.assertTrue(subtitles['it'] is not None)
def test_youtube_no_automatic_captions(self):
self.url = 'QRS8MkLhQmM'
self.DL.params['writeautomaticsub'] = True
subtitles = self.getSubtitles()
self.assertTrue(not subtitles)
# Available automatic captions for 8YoUxe5ncPo:
# Language formats (all in vtt, ttml, srv3, srv2, srv1, json3)
# gu, zh-Hans, zh-Hant, gd, ga, gl, lb, la, lo, tt, tr,
# lv, lt, tk, th, tg, te, fil, haw, yi, ceb, yo, de, da,
# el, eo, en, eu, et, es, ru, rw, ro, bn, be, bg, uk, jv,
# bs, ja, or, xh, co, ca, cy, cs, ps, pt, pa, vi, pl, hy,
# hr, ht, hu, hmn, hi, ha, mg, uz, ml, mn, mi, mk, ur,
# mt, ms, mr, ug, ta, my, af, sw, is, am,
# *it*, iw, sv, ar,
# su, zu, az, id, ig, nl, no, ne, ny, fr, ku, fy, fa, fi,
# ka, kk, sr, sq, ko, kn, km, st, sk, si, so, sn, sm, sl,
# ky, sd
# ...
self._test_automatic_captions('8YoUxe5ncPo', 'it')
@unittest.skip('Video unavailable')
def test_youtube_translated_subtitles(self):
# This video has a subtitles track, which can be translated
self.url = 'i0ZabxXmH4Y'
self.DL.params['writeautomaticsub'] = True
self.DL.params['subtitleslangs'] = ['it']
subtitles = self.getSubtitles()
self.assertTrue(subtitles['it'] is not None)
# This video has a subtitles track, which can be translated (#4555)
self._test_automatic_captions('Ky9eprVWzlI', 'it')
def test_youtube_nosubtitles(self):
self.DL.expect_warning('video doesn\'t have subtitles')
self.url = 'n5BB19UTcdA'
# Available automatic captions for 8YoUxe5ncPo:
# ...
# 8YoUxe5ncPo has no subtitles
self.url = '8YoUxe5ncPo'
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
@ -137,6 +168,7 @@ class TestDailymotionSubtitles(BaseTestSubtitles):
@is_download_test
@unittest.skip('IE broken')
class TestTedSubtitles(BaseTestSubtitles):
url = 'http://www.ted.com/talks/dan_dennett_on_our_consciousness.html'
IE = TedTalkIE
@ -162,12 +194,12 @@ class TestVimeoSubtitles(BaseTestSubtitles):
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(set(subtitles.keys()), {'de', 'en', 'es', 'fr'})
self.assertEqual(md5(subtitles['en']), '8062383cf4dec168fc40a088aa6d5888')
self.assertEqual(md5(subtitles['fr']), 'b6191146a6c5d3a452244d853fde6dc8')
self.assertEqual(md5(subtitles['en']), '386cbc9320b94e25cb364b97935e5dd1')
self.assertEqual(md5(subtitles['fr']), 'c9b69eef35bc6641c0d4da8a04f9dfac')
def test_nosubtitles(self):
self.DL.expect_warning('video doesn\'t have subtitles')
self.url = 'http://vimeo.com/56015672'
self.url = 'http://vimeo.com/68093876'
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
@ -175,6 +207,7 @@ class TestVimeoSubtitles(BaseTestSubtitles):
@is_download_test
@unittest.skip('IE broken')
class TestWallaSubtitles(BaseTestSubtitles):
url = 'http://vod.walla.co.il/movie/2705958/the-yes-men'
IE = WallaIE
@ -197,6 +230,7 @@ class TestWallaSubtitles(BaseTestSubtitles):
@is_download_test
@unittest.skip('IE broken')
class TestCeskaTelevizeSubtitles(BaseTestSubtitles):
url = 'http://www.ceskatelevize.cz/ivysilani/10600540290-u6-uzasny-svet-techniky'
IE = CeskaTelevizeIE
@ -219,6 +253,7 @@ class TestCeskaTelevizeSubtitles(BaseTestSubtitles):
@is_download_test
@unittest.skip('IE broken')
class TestLyndaSubtitles(BaseTestSubtitles):
url = 'http://www.lynda.com/Bootstrap-tutorials/Using-exercise-files/110885/114408-4.html'
IE = LyndaIE
@ -232,6 +267,7 @@ class TestLyndaSubtitles(BaseTestSubtitles):
@is_download_test
@unittest.skip('IE broken')
class TestNPOSubtitles(BaseTestSubtitles):
url = 'http://www.npo.nl/nos-journaal/28-08-2014/POW_00722860'
IE = NPOIE
@ -245,6 +281,7 @@ class TestNPOSubtitles(BaseTestSubtitles):
@is_download_test
@unittest.skip('IE broken')
class TestMTVSubtitles(BaseTestSubtitles):
url = 'http://www.cc.com/video-clips/p63lk0/adam-devine-s-house-party-chasing-white-swans'
IE = ComedyCentralIE
@ -269,8 +306,8 @@ class TestNRKSubtitles(BaseTestSubtitles):
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(set(subtitles.keys()), {'no'})
self.assertEqual(md5(subtitles['no']), '544fa917d3197fcbee64634559221cc2')
self.assertEqual(set(subtitles.keys()), {'nb-ttv'})
self.assertEqual(md5(subtitles['nb-ttv']), '67e06ff02d0deaf975e68f6cb8f6a149')
@is_download_test
@ -295,6 +332,7 @@ class TestRaiPlaySubtitles(BaseTestSubtitles):
@is_download_test
@unittest.skip('IE broken - DRM only')
class TestVikiSubtitles(BaseTestSubtitles):
url = 'http://www.viki.com/videos/1060846v-punch-episode-18'
IE = VikiIE
@ -323,6 +361,7 @@ class TestThePlatformSubtitles(BaseTestSubtitles):
@is_download_test
@unittest.skip('IE broken')
class TestThePlatformFeedSubtitles(BaseTestSubtitles):
url = 'http://feed.theplatform.com/f/7wvmTC/msnbc_video-p-test?form=json&pretty=true&range=-40&byGuid=n_hardball_5biden_140207'
IE = ThePlatformFeedIE
@ -360,7 +399,7 @@ class TestDemocracynowSubtitles(BaseTestSubtitles):
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(set(subtitles.keys()), {'en'})
self.assertEqual(md5(subtitles['en']), 'acaca989e24a9e45a6719c9b3d60815c')
self.assertEqual(md5(subtitles['en']), 'a3cc4c0b5eadd74d9974f1c1f5101045')
def test_subtitles_in_page(self):
self.url = 'http://www.democracynow.org/2015/7/3/this_flag_comes_down_today_bree'
@ -368,7 +407,7 @@ class TestDemocracynowSubtitles(BaseTestSubtitles):
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(set(subtitles.keys()), {'en'})
self.assertEqual(md5(subtitles['en']), 'acaca989e24a9e45a6719c9b3d60815c')
self.assertEqual(md5(subtitles['en']), 'a3cc4c0b5eadd74d9974f1c1f5101045')
@is_download_test

tox.ini

@ -1,16 +0,0 @@
[tox]
envlist = py26,py27,py33,py34,py35
# Needed?
[testenv]
deps =
nose
coverage
# We need a valid $HOME for test_compat_expanduser
passenv = HOME
defaultargs = test --exclude test_download.py --exclude test_age_restriction.py
--exclude test_subtitles.py --exclude test_write_annotations.py
--exclude test_youtube_lists.py --exclude test_iqiyi_sdk_interpreter.py
--exclude test_socks.py
commands = nosetests --verbose {posargs:{[testenv]defaultargs}} # --with-coverage --cover-package=yt_dlp --cover-html
# test.test_download:TestDownload.test_NowVideo


@ -1,2 +1,2 @@
#!/bin/sh
#!/usr/bin/env sh
exec "${PYTHON:-python3}" -bb -Werror -Xdev "$(dirname "$(realpath "$0")")/yt_dlp/__main__.py" "$@"


@ -27,19 +27,17 @@ from string import ascii_letters
from .cache import Cache
from .compat import (
HAS_LEGACY as compat_has_legacy,
compat_get_terminal_size,
compat_os_name,
compat_shlex_quote,
compat_str,
compat_urllib_error,
compat_urllib_request,
windows_enable_vt_mode,
)
from .cookies import load_cookies
from .downloader import FFmpegFD, get_suitable_downloader, shorten_protocol_name
from .downloader.rtmp import rtmpdump_version
from .extractor import _LAZY_LOADER
from .extractor import _PLUGIN_CLASSES as plugin_extractors
from .extractor import gen_extractor_classes, get_info_extractor
from .extractor.openload import PhantomJSwrapper
from .minicurses import format_text
@ -60,6 +58,7 @@ from .postprocessor import (
from .update import detect_variant
from .utils import (
DEFAULT_OUTTMPL,
IDENTITY,
LINK_TEMPLATES,
NO_DEFAULT,
NUMBER_RE,
@ -76,13 +75,13 @@ from .utils import (
ExtractorError,
GeoRestrictedError,
HEADRequest,
InAdvancePagedList,
ISO3166Utils,
LazyList,
MaxDownloadsReached,
Namespace,
PagedList,
PerRequestProxyHandler,
PlaylistEntries,
Popen,
PostProcessingError,
ReExtractInfo,
@ -142,6 +141,7 @@ from .utils import (
url_basename,
variadic,
version_tuple,
windows_enable_vt_mode,
write_json_file,
write_string,
)
@ -194,13 +194,6 @@ class YoutubeDL:
For compatibility, a single list is also accepted
print_to_file: A dict with keys WHEN (same as forceprint) mapped to
a list of tuples with (template, filename)
forceurl: Force printing final URL. (Deprecated)
forcetitle: Force printing title. (Deprecated)
forceid: Force printing ID. (Deprecated)
forcethumbnail: Force printing thumbnail URL. (Deprecated)
forcedescription: Force printing description. (Deprecated)
forcefilename: Force printing final filename. (Deprecated)
forceduration: Force printing duration. (Deprecated)
forcejson: Force printing info_dict as JSON.
dump_single_json: Force printing the info_dict of the whole playlist
(or video) as a single JSON line.
@ -250,11 +243,9 @@ class YoutubeDL:
and don't overwrite any file if False
For compatibility with youtube-dl,
"nooverwrites" may also be used instead
playliststart: Playlist item to start at.
playlistend: Playlist item to end at.
playlist_items: Specific indices of playlist to download.
playlistreverse: Download playlist items in reverse order.
playlistrandom: Download playlist items in random order.
lazy_playlist: Process playlist entries as they are received.
matchtitle: Download only matching titles.
rejecttitle: Reject downloads for matching titles.
logger: Log messages to a logging.Logger instance.
@ -277,9 +268,6 @@ class YoutubeDL:
writedesktoplink: Write a Linux internet shortcut file (.desktop)
writesubtitles: Write the video subtitles to a file
writeautomaticsub: Write the automatically generated subtitles to a file
allsubtitles: Deprecated - Use subtitleslangs = ['all']
Downloads all the subtitles of the video
(requires writesubtitles or writeautomaticsub)
listsubtitles: Lists all available subtitles for the video
subtitlesformat: The format code for subtitles
subtitleslangs: List of languages of the subtitles to download (can be regex).
@ -333,7 +321,6 @@ class YoutubeDL:
bidi_workaround: Work around buggy terminals without bidirectional text
support, using fribidi
debug_printtraffic:Print out sent and received HTTP traffic
include_ads: Download ads as well (deprecated)
default_search: Prepend this string if an input url is not valid.
'auto' for elaborate guessing
encoding: Use this encoding instead of the system-specified.
@ -349,10 +336,6 @@ class YoutubeDL:
* when: When to run the postprocessor. Allowed values are
the entries of utils.POSTPROCESS_WHEN
Assumed to be 'post_process' if not given
post_hooks: Deprecated - Register a custom postprocessor instead
A list of functions that get called as the final step
for each video file, after all postprocessors have been
called. The filename will be passed as the only argument.
progress_hooks: A list of functions that get called on download
progress, with a dictionary with the entries
* status: One of "downloading", "error", or "finished".
@ -397,8 +380,6 @@ class YoutubeDL:
- "detect_or_warn": check whether we can do anything
about it, warn otherwise (default)
source_address: Client-side IP address to bind to.
call_home: Boolean, true iff we are allowed to contact the
yt-dlp servers for debugging. (BROKEN)
sleep_interval_requests: Number of seconds to sleep between requests
during extraction
sleep_interval: Number of seconds to sleep before each download when
@ -433,17 +414,10 @@ class YoutubeDL:
geo_bypass_ip_block:
IP range in CIDR notation that will be used similarly to
geo_bypass_country
The following options determine which downloader is picked:
external_downloader: A dictionary of protocol keys and the executable of the
external downloader to use for it. The allowed protocols
are default|http|ftp|m3u8|dash|rtsp|rtmp|mms.
Set the value to 'native' to use the native downloader
hls_prefer_native: Deprecated - Use external_downloader = {'m3u8': 'native'}
or {'m3u8': 'ffmpeg'} instead.
Use the native HLS downloader instead of ffmpeg/avconv
if True, otherwise use ffmpeg/avconv if False, otherwise
use downloader suggested by extractor if None.
compat_opts: Compatibility options. See "Differences in default behavior".
The following options do not work when used through the API:
filename, abort-on-error, multistreams, no-live-chat, format-sort
@ -453,6 +427,16 @@ class YoutubeDL:
Allowed keys are 'download', 'postprocess',
'download-title' (console title) and 'postprocess-title'.
The template is mapped on a dictionary with keys 'progress' and 'info'
retry_sleep_functions: Dictionary of functions that take the number of attempts
as argument and return the time to sleep in seconds.
Allowed keys are 'http', 'fragment', 'file_access'
download_ranges: A function that gets called for every video with the signature
(info_dict, *, ydl) -> Iterable[Section].
Only the returned sections will be downloaded. Each Section contains:
* start_time: Start time of the section in seconds
* end_time: End time of the section in seconds
* title: Section title (Optional)
* index: Section number (Optional)
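As an illustration of the two options documented above, here is a hedged sketch of callables that match the described signatures. The names `backoff` and `first_minute`, the 30-second cap, and the representation of each Section as a plain dict are assumptions made for this example; only the option names, the allowed keys and the section fields come from the docstring.

```python
def backoff(attempt):
    """Sleep 1, 2, 4, ... seconds, capped at 30 (illustrative policy)."""
    return min(2 ** attempt, 30)


def first_minute(info_dict, *, ydl):
    """Request only the first 60 seconds of each video."""
    duration = info_dict.get('duration')
    end = 60 if duration is None else min(60, duration)
    # Each section is assumed here to be a dict with the documented fields
    yield {'start_time': 0, 'end_time': end, 'title': 'First minute', 'index': 1}


params = {
    # Keys are limited to the documented 'http', 'fragment', 'file_access'
    'retry_sleep_functions': {'http': backoff, 'fragment': backoff},
    'download_ranges': first_minute,
}
```

A dictionary like `params` would then be passed to the `YoutubeDL` constructor together with any other options.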
The following parameters are not used by YoutubeDL itself, they are used by
the downloader (see yt_dlp/downloader/common.py):
@ -462,8 +446,6 @@ class YoutubeDL:
external_downloader_args, concurrent_fragment_downloads.
The following options are used by the post processors:
prefer_ffmpeg: If False, use avconv instead of ffmpeg if both are available,
otherwise prefer ffmpeg. (avconv support is deprecated)
ffmpeg_location: Location of the ffmpeg/avconv binary; either the path
to the binary or its containing directory.
postprocessor_args: A dictionary of postprocessor/executable keys (in lower case)
@ -483,12 +465,54 @@ class YoutubeDL:
See "EXTRACTOR ARGUMENTS" for details.
Eg: {'youtube': {'skip': ['dash', 'hls']}}
mark_watched: Mark videos watched (even with --simulate). Only for YouTube
youtube_include_dash_manifest: Deprecated - Use extractor_args instead.
The following options are deprecated and may be removed in the future:
playliststart: - Use playlist_items
Playlist item to start at.
playlistend: - Use playlist_items
Playlist item to end at.
playlistreverse: - Use playlist_items
Download playlist items in reverse order.
forceurl: - Use forceprint
Force printing final URL.
forcetitle: - Use forceprint
Force printing title.
forceid: - Use forceprint
Force printing ID.
forcethumbnail: - Use forceprint
Force printing thumbnail URL.
forcedescription: - Use forceprint
Force printing description.
forcefilename: - Use forceprint
Force printing final filename.
forceduration: - Use forceprint
Force printing duration.
allsubtitles: - Use subtitleslangs = ['all']
Downloads all the subtitles of the video
(requires writesubtitles or writeautomaticsub)
include_ads: - Doesn't work
Download ads as well
call_home: - Not implemented
Boolean, true iff we are allowed to contact the
yt-dlp servers for debugging.
post_hooks: - Register a custom postprocessor
A list of functions that get called as the final step
for each video file, after all postprocessors have been
called. The filename will be passed as the only argument.
hls_prefer_native: - Use external_downloader = {'m3u8': 'native'} or {'m3u8': 'ffmpeg'}.
Use the native HLS downloader instead of ffmpeg/avconv
if True, otherwise use ffmpeg/avconv if False, otherwise
use downloader suggested by extractor if None.
prefer_ffmpeg: - avconv support is deprecated
If False, use avconv instead of ffmpeg if both are available,
otherwise prefer ffmpeg.
youtube_include_dash_manifest: - Use extractor_args
If True (default), DASH manifests and related
data will be downloaded and processed by extractor.
You can reduce network I/O by disabling it if you don't
care about DASH. (only for youtube)
youtube_include_hls_manifest: Deprecated - Use extractor_args instead.
youtube_include_hls_manifest: - Use extractor_args
If True (default), HLS manifests and related
data will be downloaded and processed by extractor.
You can reduce network I/O by disabling it if you don't
@ -555,12 +579,17 @@ class YoutubeDL:
)
self._allow_colors = Namespace(**{
type_: not self.params.get('no_color') and supports_terminal_sequences(stream)
for type_, stream in self._out_files if type_ != 'console'
for type_, stream in self._out_files.items_ if type_ != 'console'
})
if sys.version_info < (3, 6):
self.report_warning(
'Python version %d.%d is not supported! Please update to Python 3.6 or above' % sys.version_info[:2])
MIN_SUPPORTED, MIN_RECOMMENDED = (3, 6), (3, 7)
current_version = sys.version_info[:2]
if current_version < MIN_RECOMMENDED:
msg = 'Support for Python version %d.%d has been deprecated and will break in future versions of yt-dlp'
if current_version < MIN_SUPPORTED:
msg = 'Python version %d.%d is no longer supported'
self.deprecation_warning(
f'{msg}! Please update to Python %d.%d or above' % (*current_version, *MIN_RECOMMENDED))
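The two-tier version check above can be exercised in isolation. This sketch lifts the same comparison and message strings into a standalone function; `python_version_warning` is a hypothetical name, and it returns the text instead of calling `deprecation_warning` so the result is easy to inspect.

```python
MIN_SUPPORTED, MIN_RECOMMENDED = (3, 6), (3, 7)


def python_version_warning(current_version):
    """Return the warning text for a (major, minor) tuple, or None if no warning applies."""
    if current_version >= MIN_RECOMMENDED:
        return None
    msg = 'Support for Python version %d.%d has been deprecated and will break in future versions of yt-dlp'
    if current_version < MIN_SUPPORTED:
        msg = 'Python version %d.%d is no longer supported'
    # The f-string only splices msg in; the four %d placeholders are filled afterwards
    return f'{msg}! Please update to Python %d.%d or above' % (*current_version, *MIN_RECOMMENDED)


print(python_version_warning((3, 6)))
```

Versions below `MIN_SUPPORTED` get the stronger message, while versions between the two thresholds only get the deprecation notice.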
if self.params.get('allow_unplayable_formats'):
self.report_warning(
@ -588,7 +617,10 @@ class YoutubeDL:
for msg in self.params.get('_deprecation_warnings', []):
self.deprecation_warning(msg)
if 'list-formats' in self.params.get('compat_opts', []):
self.params['compat_opts'] = set(self.params.get('compat_opts', ()))
if not compat_has_legacy:
self.params['compat_opts'].add('no-compat-legacy')
if 'list-formats' in self.params['compat_opts']:
self.params['listformats_table'] = False
if 'overwrites' not in self.params and self.params.get('nooverwrites') is not None:
@ -643,7 +675,7 @@ class YoutubeDL:
'Set the LC_ALL environment variable to fix this.')
self.params['restrictfilenames'] = True
self.outtmpl_dict = self.parse_outtmpl()
self._parse_outtmpl()
# Creating format selector here allows us to catch syntax errors before the extraction
self.format_selector = (
@ -743,6 +775,7 @@ class YoutubeDL:
def add_post_processor(self, pp, when='post_process'):
"""Add a PostProcessor object to the end of the chain."""
assert when in POSTPROCESS_WHEN, f'Invalid when={when}'
self._pps[when].append(pp)
pp.set_downloader(self)
@ -785,9 +818,9 @@ class YoutubeDL:
"""Print message to stdout"""
if quiet is not None:
self.deprecation_warning('"YoutubeDL.to_stdout" no longer accepts the argument quiet. Use "YoutubeDL.to_screen" instead')
self._write_string(
'%s%s' % (self._bidi_workaround(message), ('' if skip_eol else '\n')),
self._out_files.out)
if skip_eol is not False:
self.deprecation_warning('"YoutubeDL.to_stdout" no longer accepts the argument skip_eol. Use "YoutubeDL.to_screen" instead')
self._write_string(f'{self._bidi_workaround(message)}\n', self._out_files.out)
def to_screen(self, message, skip_eol=False, quiet=None):
"""Print message to screen if not in quiet mode"""
@ -939,7 +972,7 @@ class YoutubeDL:
'''Log debug message or print message to stderr'''
if not self.params.get('verbose', False):
return
message = '[debug] %s' % message
message = f'[debug] {message}'
if self.params.get('logger'):
self.params['logger'].debug(message)
else:
@ -970,21 +1003,19 @@ class YoutubeDL:
self.report_warning(msg)
def parse_outtmpl(self):
outtmpl_dict = self.params.get('outtmpl', {})
if not isinstance(outtmpl_dict, dict):
outtmpl_dict = {'default': outtmpl_dict}
# Remove spaces in the default template
if self.params.get('restrictfilenames'):
self.deprecation_warning('"YoutubeDL.parse_outtmpl" is deprecated and may be removed in a future version')
self._parse_outtmpl()
return self.params['outtmpl']
def _parse_outtmpl(self):
sanitize = IDENTITY
if self.params.get('restrictfilenames'): # Remove spaces in the default template
sanitize = lambda x: x.replace(' - ', ' ').replace(' ', '-')
else:
sanitize = lambda x: x
outtmpl_dict.update({
k: sanitize(v) for k, v in DEFAULT_OUTTMPL.items()
if outtmpl_dict.get(k) is None})
for _, val in outtmpl_dict.items():
if isinstance(val, bytes):
self.report_warning('Parameter outtmpl is bytes, but should be a unicode string')
return outtmpl_dict
outtmpl = self.params.setdefault('outtmpl', {})
if not isinstance(outtmpl, dict):
self.params['outtmpl'] = outtmpl = {'default': outtmpl}
outtmpl.update({k: sanitize(v) for k, v in DEFAULT_OUTTMPL.items() if outtmpl.get(k) is None})
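The new `_parse_outtmpl` normalizes `params['outtmpl']` in place rather than returning a separate dict. A minimal sketch of that normalization on a plain dictionary follows; `normalize_outtmpl` is a hypothetical name, and the `DEFAULT_OUTTMPL` values are stand-ins, since the real defaults are not shown in this diff.

```python
# Stand-in for yt_dlp.utils.DEFAULT_OUTTMPL; the values are assumptions for illustration
DEFAULT_OUTTMPL = {
    'default': '%(title)s [%(id)s].%(ext)s',
    'chapter': '%(title)s - %(section_number)s.%(ext)s',
}


def normalize_outtmpl(params):
    """Coerce params['outtmpl'] to a dict and fill in any missing default templates."""
    sanitize = lambda x: x
    if params.get('restrictfilenames'):  # Remove spaces in the default templates
        sanitize = lambda x: x.replace(' - ', ' ').replace(' ', '-')

    outtmpl = params.setdefault('outtmpl', {})
    if not isinstance(outtmpl, dict):
        # A bare string is treated as the 'default' template
        params['outtmpl'] = outtmpl = {'default': outtmpl}
    outtmpl.update({k: sanitize(v) for k, v in DEFAULT_OUTTMPL.items() if outtmpl.get(k) is None})
    return params


print(normalize_outtmpl({'outtmpl': 'x.%(ext)s', 'restrictfilenames': True})['outtmpl'])
```

Only the filled-in defaults pass through `sanitize`; a template supplied by the caller is kept exactly as given.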
def get_output_path(self, dir_type='', filename=None):
paths = self.params.get('paths', {})
@ -1035,6 +1066,7 @@ class YoutubeDL:
def _copy_infodict(info_dict):
info_dict = dict(info_dict)
info_dict.pop('__postprocessors', None)
info_dict.pop('__pending_error', None)
return info_dict
def prepare_outtmpl(self, outtmpl, info_dict, sanitize=False):
@ -1132,7 +1164,7 @@ class YoutubeDL:
def filename_sanitizer(key, value, restricted=self.params.get('restrictfilenames')):
return sanitize_filename(str(value), restricted=restricted, is_id=(
bool(re.search(r'(^|[_.])id(\.|$)', key))
if 'filename-sanitization' in self.params.get('compat_opts', [])
if 'filename-sanitization' in self.params['compat_opts']
else NO_DEFAULT))
sanitizer = sanitize if callable(sanitize) else filename_sanitizer
@ -1221,7 +1253,7 @@ class YoutubeDL:
def _prepare_filename(self, info_dict, *, outtmpl=None, tmpl_type=None):
assert None in (outtmpl, tmpl_type), 'outtmpl and tmpl_type are mutually exclusive'
if outtmpl is None:
outtmpl = self.outtmpl_dict.get(tmpl_type or 'default', self.outtmpl_dict['default'])
outtmpl = self.params['outtmpl'].get(tmpl_type or 'default', self.params['outtmpl']['default'])
try:
outtmpl = self._outtmpl_expandpath(outtmpl)
filename = self.evaluate_outtmpl(outtmpl, info_dict, True)
@ -1387,7 +1419,7 @@ class YoutubeDL:
else:
self.report_error('no suitable InfoExtractor for URL %s' % url)
def __handle_extraction_exceptions(func):
def _handle_extraction_exceptions(func):
@functools.wraps(func)
def wrapper(self, *args, **kwargs):
while True:
@ -1460,7 +1492,7 @@ class YoutubeDL:
self.to_screen('')
raise
@__handle_extraction_exceptions
@_handle_extraction_exceptions
def __extract_info(self, url, ie, download, extra_info, process):
ie_result = ie.extract(url)
if ie_result is None: # Finished already (backwards compatibility; listformats and friends should be moved here)
@ -1526,6 +1558,7 @@ class YoutubeDL:
self.add_extra_info(info_copy, extra_info)
info_copy, _ = self.pre_process(info_copy)
self.__forced_printings(info_copy, self.prepare_filename(info_copy), incomplete=True)
self._raise_pending_errors(info_copy)
if self.params.get('force_write_download_archive', False):
self.record_download_archive(info_copy)
return ie_result
@ -1533,6 +1566,7 @@ class YoutubeDL:
if result_type == 'video':
self.add_extra_info(ie_result, extra_info)
ie_result = self.process_video_result(ie_result, download=download)
self._raise_pending_errors(ie_result)
additional_urls = (ie_result or {}).get('additional_urls')
if additional_urls:
# TODO: Improve MetadataParserPP to allow setting a list
@ -1567,9 +1601,13 @@ class YoutubeDL:
if not info:
return info
exempted_fields = {'_type', 'url', 'ie_key'}
if not ie_result.get('section_end') and ie_result.get('section_start') is None:
# For video clips, the id etc of the clip extractor should be used
exempted_fields |= {'id', 'extractor', 'extractor_key'}
new_result = info.copy()
new_result.update(filter_dict(ie_result, lambda k, v: (
v is not None and k not in {'_type', 'url', 'id', 'extractor', 'extractor_key', 'ie_key'})))
new_result.update(filter_dict(ie_result, lambda k, v: v is not None and k not in exempted_fields))
# Extracted info may not be a video result (i.e.
# info.get('_type', 'video') != video) but rather a URL or
@ -1641,112 +1679,31 @@ class YoutubeDL:
}
def __process_playlist(self, ie_result, download):
# We process each entry in the playlist
playlist = ie_result.get('title') or ie_result.get('id')
self.to_screen('[download] Downloading playlist: %s' % playlist)
"""Process each entry in the playlist"""
title = ie_result.get('title') or ie_result.get('id') or '<Untitled>'
self.to_screen(f'[download] Downloading playlist: {title}')
if 'entries' not in ie_result:
raise EntryNotInPlaylist('There are no entries')
all_entries = PlaylistEntries(self, ie_result)
entries = orderedSet(all_entries.get_requested_items(), lazy=True)
MissingEntry = object()
incomplete_entries = bool(ie_result.get('requested_entries'))
if incomplete_entries:
def fill_missing_entries(entries, indices):
ret = [MissingEntry] * max(indices)
for i, entry in zip(indices, entries):
ret[i - 1] = entry
return ret
ie_result['entries'] = fill_missing_entries(ie_result['entries'], ie_result['requested_entries'])
playlist_results = []
playliststart = self.params.get('playliststart', 1)
playlistend = self.params.get('playlistend')
# For backwards compatibility, interpret -1 as whole list
if playlistend == -1:
playlistend = None
playlistitems_str = self.params.get('playlist_items')
playlistitems = None
if playlistitems_str is not None:
def iter_playlistitems(format):
for string_segment in format.split(','):
if '-' in string_segment:
start, end = string_segment.split('-')
for item in range(int(start), int(end) + 1):
yield int(item)
lazy = self.params.get('lazy_playlist')
if lazy:
resolved_entries, n_entries = [], 'N/A'
ie_result['requested_entries'], ie_result['entries'] = None, None
else:
yield int(string_segment)
playlistitems = orderedSet(iter_playlistitems(playlistitems_str))
ie_entries = ie_result['entries']
if isinstance(ie_entries, list):
playlist_count = len(ie_entries)
msg = f'Collected {playlist_count} videos; downloading %d of them'
ie_result['playlist_count'] = ie_result.get('playlist_count') or playlist_count
def get_entry(i):
return ie_entries[i - 1]
else:
msg = 'Downloading %d videos'
if not isinstance(ie_entries, (PagedList, LazyList)):
ie_entries = LazyList(ie_entries)
elif isinstance(ie_entries, InAdvancePagedList):
if ie_entries._pagesize == 1:
playlist_count = ie_entries._pagecount
def get_entry(i):
return YoutubeDL.__handle_extraction_exceptions(
lambda self, i: ie_entries[i - 1]
)(self, i)
entries, broken = [], False
items = playlistitems if playlistitems is not None else itertools.count(playliststart)
for i in items:
if i == 0:
continue
if playlistitems is None and playlistend is not None and playlistend < i:
break
entry = None
try:
entry = get_entry(i)
if entry is MissingEntry:
raise EntryNotInPlaylist()
except (IndexError, EntryNotInPlaylist):
if incomplete_entries:
raise EntryNotInPlaylist(f'Entry {i} cannot be found')
elif not playlistitems:
break
entries.append(entry)
try:
if entry is not None:
# TODO: Add auto-generated fields
self._match_entry(entry, incomplete=True, silent=True)
except (ExistingVideoReached, RejectedVideoReached):
broken = True
break
ie_result['entries'] = entries
# Save playlist_index before re-ordering
entries = [
((playlistitems[i - 1] if playlistitems else i + playliststart - 1), entry)
for i, entry in enumerate(entries, 1)
if entry is not None]
n_entries = len(entries)
if not (ie_result.get('playlist_count') or broken or playlistitems or playlistend):
ie_result['playlist_count'] = n_entries
if not playlistitems and (playliststart != 1 or playlistend):
playlistitems = list(range(playliststart, playliststart + n_entries))
ie_result['requested_entries'] = playlistitems
entries = resolved_entries = list(entries)
n_entries = len(resolved_entries)
ie_result['requested_entries'], ie_result['entries'] = tuple(zip(*resolved_entries)) or ([], [])
if not ie_result.get('playlist_count'):
# Better to do this after potentially exhausting entries
ie_result['playlist_count'] = all_entries.get_full_count()
_infojson_written = False
write_playlist_files = self.params.get('allow_playlist_files', True)
if write_playlist_files and self.params.get('list_thumbnails'):
self.list_thumbnails(ie_result)
if write_playlist_files and not self.params.get('simulate'):
ie_copy = self._playlist_infodict(ie_result, n_entries=n_entries)
ie_copy = self._playlist_infodict(ie_result, n_entries=int_or_none(n_entries))
_infojson_written = self._write_info_json(
'playlist', ie_result, self.prepare_filename(ie_copy, 'pl_infojson'))
if _infojson_written is None:
@ -1757,33 +1714,41 @@ class YoutubeDL:
# TODO: This should be passed to ThumbnailsConvertor if necessary
self._write_thumbnails('playlist', ie_copy, self.prepare_filename(ie_copy, 'pl_thumbnail'))
if self.params.get('playlistreverse', False):
entries = entries[::-1]
if self.params.get('playlistrandom', False):
if lazy:
if self.params.get('playlistreverse') or self.params.get('playlistrandom'):
self.report_warning('playlistreverse and playlistrandom are not supported with lazy_playlist', only_once=True)
elif self.params.get('playlistreverse'):
entries.reverse()
elif self.params.get('playlistrandom'):
random.shuffle(entries)
x_forwarded_for = ie_result.get('__x_forwarded_for_ip')
self.to_screen(f'[{ie_result["extractor"]}] Playlist {title}: Downloading {n_entries} videos'
f'{format_field(ie_result, "playlist_count", " of %s")}')
self.to_screen(f'[{ie_result["extractor"]}] playlist {playlist}: {msg % n_entries}')
failures = 0
max_failures = self.params.get('skip_playlist_after_errors') or float('inf')
for i, entry_tuple in enumerate(entries, 1):
playlist_index, entry = entry_tuple
if 'playlist-index' in self.params.get('compat_opts', []):
playlist_index = playlistitems[i - 1] if playlistitems else i + playliststart - 1
for i, (playlist_index, entry) in enumerate(entries):
if lazy:
resolved_entries.append((playlist_index, entry))
# TODO: Add auto-generated fields
if self._match_entry(entry, incomplete=True) is not None:
continue
self.to_screen('[download] Downloading video %s of %s' % (
self._format_screen(i, self.Styles.ID), self._format_screen(n_entries, self.Styles.EMPHASIS)))
# This __x_forwarded_for_ip thing is a bit ugly but requires
# minimal changes
if x_forwarded_for:
entry['__x_forwarded_for_ip'] = x_forwarded_for
extra = {
'n_entries': n_entries,
'__last_playlist_index': max(playlistitems) if playlistitems else (playlistend or n_entries),
self._format_screen(i + 1, self.Styles.ID), self._format_screen(n_entries, self.Styles.EMPHASIS)))
entry['__x_forwarded_for_ip'] = ie_result.get('__x_forwarded_for_ip')
if not lazy and 'playlist-index' in self.params.get('compat_opts', []):
playlist_index = ie_result['requested_entries'][i]
entry_result = self.__process_iterable_entry(entry, download, {
'n_entries': int_or_none(n_entries),
'__last_playlist_index': max(ie_result['requested_entries'] or (0, 0)),
'playlist_count': ie_result.get('playlist_count'),
'playlist_index': playlist_index,
'playlist_autonumber': i,
'playlist': playlist,
'playlist_autonumber': i + 1,
'playlist': title,
'playlist_id': ie_result.get('id'),
'playlist_title': ie_result.get('title'),
'playlist_uploader': ie_result.get('uploader'),
@ -1793,20 +1758,17 @@ class YoutubeDL:
'webpage_url_basename': url_basename(ie_result['webpage_url']),
'webpage_url_domain': get_domain(ie_result['webpage_url']),
'extractor_key': ie_result['extractor_key'],
}
if self._match_entry(entry, incomplete=True) is not None:
continue
entry_result = self.__process_iterable_entry(entry, download, extra)
})
if not entry_result:
failures += 1
if failures >= max_failures:
self.report_error(
'Skipping the remaining entries in playlist "%s" since %d items failed extraction' % (playlist, failures))
f'Skipping the remaining entries in playlist "{title}" since {failures} items failed extraction')
break
playlist_results.append(entry_result)
ie_result['entries'] = playlist_results
resolved_entries[i] = (playlist_index, entry_result)
# Update with processed data
ie_result['requested_entries'], ie_result['entries'] = tuple(zip(*resolved_entries)) or ([], [])
# Write the updated info to json
if _infojson_written is True and self._write_info_json(
@ -1815,10 +1777,10 @@ class YoutubeDL:
return
ie_result = self.run_all_pps('playlist', ie_result)
self.to_screen(f'[download] Finished downloading playlist: {playlist}')
self.to_screen(f'[download] Finished downloading playlist: {title}')
return ie_result
@__handle_extraction_exceptions
@_handle_extraction_exceptions
def __process_iterable_entry(self, entry, download, extra_info):
return self.process_ie_result(
entry, download=download, extra_info=extra_info)
@ -1900,7 +1862,7 @@ class YoutubeDL:
temp_file.close()
try:
success, _ = self.dl(temp_file.name, f, test=True)
except (DownloadError, IOError, OSError, ValueError) + network_exceptions:
except (DownloadError, OSError, ValueError) + network_exceptions:
success = False
finally:
if os.path.exists(temp_file.name):
@ -1925,11 +1887,11 @@ class YoutubeDL:
and (
not can_merge()
or info_dict.get('is_live') and not self.params.get('live_from_start')
or self.outtmpl_dict['default'] == '-'))
or self.params['outtmpl']['default'] == '-'))
compat = (
prefer_best
or self.params.get('allow_multiple_audio_streams', False)
or 'format-spec' in self.params.get('compat_opts', []))
or 'format-spec' in self.params['compat_opts'])
return (
'best/bestvideo+bestaudio' if prefer_best
@ -2270,7 +2232,7 @@ class YoutubeDL:
def _calc_headers(self, info_dict):
res = merge_headers(self.params['http_headers'], info_dict.get('http_headers') or {})
cookies = self._calc_cookies(info_dict)
cookies = self._calc_cookies(info_dict['url'])
if cookies:
res['Cookie'] = cookies
@ -2281,8 +2243,8 @@ class YoutubeDL:
return res
def _calc_cookies(self, info_dict):
pr = sanitized_Request(info_dict['url'])
def _calc_cookies(self, url):
pr = sanitized_Request(url)
self.cookiejar.add_cookie_header(pr)
return pr.get_header('Cookie')
@ -2380,6 +2342,11 @@ class YoutubeDL:
if info_dict.get('%s_number' % field) is not None and not info_dict.get(field):
info_dict[field] = '%s %d' % (field.capitalize(), info_dict['%s_number' % field])
def _raise_pending_errors(self, info):
err = info.pop('__pending_error', None)
if err:
self.report_error(err, tb=False)
def process_video_result(self, info_dict, download=True):
assert info_dict.get('_type', 'video') == 'video'
self._num_videos += 1
@ -2411,6 +2378,8 @@ class YoutubeDL:
sanitize_string_field(info_dict, 'id')
sanitize_numeric_fields(info_dict)
if info_dict.get('section_end') and info_dict.get('section_start') is not None:
info_dict['duration'] = round(info_dict['section_end'] - info_dict['section_start'], 3)
if (info_dict.get('duration') or 0) <= 0 and info_dict.pop('duration', None):
self.report_warning('"duration" field is negative, there is an error in extractor')
@ -2538,7 +2507,7 @@ class YoutubeDL:
format['dynamic_range'] = 'SDR'
if (info_dict.get('duration') and format.get('tbr')
and not format.get('filesize') and not format.get('filesize_approx')):
format['filesize_approx'] = info_dict['duration'] * format['tbr'] * (1024 / 8)
format['filesize_approx'] = int(info_dict['duration'] * format['tbr'] * (1024 / 8))
# Add HTTP headers, so that external programs can use them from the
# json output
@ -2585,7 +2554,7 @@ class YoutubeDL:
if list_only:
# Without this printing, -F --print-json will not work
self.__forced_printings(info_dict, self.prepare_filename(info_dict), incomplete=True)
return
return info_dict
format_selector = self.format_selector
if format_selector is None:
@ -2626,20 +2595,40 @@ class YoutubeDL:
# Process what we can, even without any available formats.
formats_to_download = [{}]
best_format = formats_to_download[-1]
requested_ranges = self.params.get('download_ranges')
if requested_ranges:
requested_ranges = tuple(requested_ranges(info_dict, self))
best_format, downloaded_formats = formats_to_download[-1], []
if download:
if best_format:
self.to_screen(
f'[info] {info_dict["id"]}: Downloading {len(formats_to_download)} format(s): '
+ ', '.join([f['format_id'] for f in formats_to_download]))
def to_screen(*msg):
self.to_screen(f'[info] {info_dict["id"]}: {" ".join(", ".join(variadic(m)) for m in msg)}')
to_screen(f'Downloading {len(formats_to_download)} format(s):',
(f['format_id'] for f in formats_to_download))
if requested_ranges:
to_screen(f'Downloading {len(requested_ranges)} time ranges:',
(f'{int(c["start_time"])}-{int(c["end_time"])}' for c in requested_ranges))
max_downloads_reached = False
for i, fmt in enumerate(formats_to_download):
formats_to_download[i] = new_info = self._copy_infodict(info_dict)
for fmt, chapter in itertools.product(formats_to_download, requested_ranges or [{}]):
new_info = self._copy_infodict(info_dict)
new_info.update(fmt)
offset, duration = info_dict.get('section_start') or 0, info_dict.get('duration') or float('inf')
if chapter or offset:
new_info.update({
'section_start': offset + chapter.get('start_time', 0),
'section_end': offset + min(chapter.get('end_time', 0), duration),
'section_title': chapter.get('title'),
'section_number': chapter.get('index'),
})
downloaded_formats.append(new_info)
try:
self.process_info(new_info)
except MaxDownloadsReached:
max_downloads_reached = True
self._raise_pending_errors(new_info)
# Remove copied info
for key, val in tuple(new_info.items()):
if info_dict.get(key) == val:
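Note on the hunk above: each selected format is now paired with each requested time range via `itertools.product`, and the range is shifted by any `section_start` the extractor already set. The sketch below reproduces just that arithmetic on plain dicts. `plan_downloads` is a hypothetical name, and the real method also copies the info dict and calls `process_info`, which is omitted here.

```python
import itertools


def plan_downloads(formats, ranges, section_start=None, duration=None):
    """Pair every format with every range and compute absolute section bounds."""
    offset = section_start or 0
    duration = duration or float('inf')
    plans = []
    # `ranges or [{}]` keeps one pass per format when no range was requested
    for fmt, chapter in itertools.product(formats, ranges or [{}]):
        plan = dict(fmt)
        if chapter or offset:
            plan.update({
                'section_start': offset + chapter.get('start_time', 0),
                'section_end': offset + min(chapter.get('end_time', 0), duration),
                'section_title': chapter.get('title'),
            })
        plans.append(plan)
    return plans


plans = plan_downloads(
    [{'format_id': '18'}, {'format_id': '22'}],
    [{'start_time': 10, 'end_time': 20}, {'start_time': 30, 'end_time': 45}])
```

Two formats and two ranges give four planned downloads, which is why the diff switches `requested_downloads` from `formats_to_download` to the separately collected `downloaded_formats`.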
@ -2647,12 +2636,12 @@ class YoutubeDL:
if max_downloads_reached:
break
write_archive = {f.get('__write_download_archive', False) for f in formats_to_download}
write_archive = {f.get('__write_download_archive', False) for f in downloaded_formats}
assert write_archive.issubset({True, False, 'ignore'})
if True in write_archive and False not in write_archive:
self.record_download_archive(info_dict)
info_dict['requested_downloads'] = formats_to_download
info_dict['requested_downloads'] = downloaded_formats
info_dict = self.run_all_pps('after_video', info_dict)
if max_downloads_reached:
raise MaxDownloadsReached()
@ -2874,8 +2863,13 @@ class YoutubeDL:
# Forced printings
self.__forced_printings(info_dict, full_filename, incomplete=('format' not in info_dict))
def check_max_downloads():
if self._num_downloads >= float(self.params.get('max_downloads') or 'inf'):
raise MaxDownloadsReached()
if self.params.get('simulate'):
info_dict['__write_download_archive'] = self.params.get('force_write_download_archive')
check_max_downloads()
return
if full_filename is None:
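Note on the hunk above: `check_max_downloads` folds the "no limit" case into a single comparison by mapping a missing limit to infinity. A standalone sketch of the same comparison, with `limit_reached` as a hypothetical function name:

```python
def limit_reached(num_downloads, max_downloads=None):
    """True once the counter has hit the limit; None (or 0) means no limit."""
    return num_downloads >= float(max_downloads or 'inf')
```

Because the check is now a local helper, the diff can call it from the `simulate` branch as well as at the end of the method.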
@ -2979,12 +2973,8 @@ class YoutubeDL:
info_dict.clear()
info_dict.update(new_info)
try:
new_info, files_to_move = self.pre_process(info_dict, 'before_dl', files_to_move)
replace_info_dict(new_info)
except PostProcessingError as err:
self.report_error('Preprocessing: %s' % str(err))
return
if self.params.get('skip_download'):
info_dict['filepath'] = temp_filename
@ -3006,7 +2996,16 @@ class YoutubeDL:
info_dict['ext'] = os.path.splitext(file)[1][1:]
return file
success = True
fd, success = None, True
if info_dict.get('protocol') or info_dict.get('url'):
fd = get_suitable_downloader(info_dict, self.params, to_stdout=temp_filename == '-')
if fd is not FFmpegFD and (
info_dict.get('section_start') or info_dict.get('section_end')):
msg = ('This format cannot be partially downloaded' if FFmpegFD.available()
else 'You have requested downloading the video partially, but ffmpeg is not installed')
self.report_error(f'{msg}. Aborting')
return
if info_dict.get('requested_formats') is not None:
def compatible_formats(formats):
@ -3039,7 +3038,7 @@ class YoutubeDL:
and info_dict.get('thumbnails')
# check with type instead of pp_key, __name__, or isinstance
# since we don't want any custom PPs to trigger this
and any(type(pp) == EmbedThumbnailPP for pp in self._pps['post_process'])):
and any(type(pp) == EmbedThumbnailPP for pp in self._pps['post_process'])): # noqa: E721
info_dict['ext'] = 'mkv'
self.report_warning(
'webm doesn\'t support embedding a thumbnail, mkv will be used')
@ -3061,10 +3060,8 @@ class YoutubeDL:
dl_filename = existing_video_file(full_filename, temp_filename)
info_dict['__real_download'] = False
downloaded = []
merger = FFmpegMergerPP(self)
fd = get_suitable_downloader(info_dict, self.params, to_stdout=temp_filename == '-')
downloaded = []
if dl_filename is not None:
self.report_file_already_downloaded(dl_filename)
elif fd:
@ -3144,6 +3141,7 @@ class YoutubeDL:
self.report_error(f'content too short (expected {err.expected} bytes and served {err.downloaded})')
return
self._raise_pending_errors(info_dict)
if success and full_filename != '-':
def fixup():
@ -3213,15 +3211,10 @@ class YoutubeDL:
return
info_dict['__write_download_archive'] = True
assert info_dict is original_infodict # Make sure the info_dict was modified in-place
if self.params.get('force_write_download_archive'):
info_dict['__write_download_archive'] = True
# Make sure the info_dict was modified in-place
assert info_dict is original_infodict
max_downloads = self.params.get('max_downloads')
if max_downloads is not None and self._num_downloads >= int(max_downloads):
raise MaxDownloadsReached()
check_max_downloads()
def __download_wrapper(self, func):
@functools.wraps(func)
@ -3243,7 +3236,7 @@ class YoutubeDL:
def download(self, url_list):
"""Download a given list of URLs."""
url_list = variadic(url_list) # Passing a single URL is a common mistake
outtmpl = self.outtmpl_dict['default']
outtmpl = self.params['outtmpl']['default']
if (len(url_list) > 1
and outtmpl != '-'
and '%' not in outtmpl
@ -3364,7 +3357,12 @@ class YoutubeDL:
def pre_process(self, ie_info, key='pre_process', files_to_move=None):
info = dict(ie_info)
info['__files_to_move'] = files_to_move or {}
try:
info = self.run_all_pps(key, info)
except PostProcessingError as err:
msg = f'Preprocessing: {err}'
info.setdefault('__pending_error', msg)
self.report_error(msg, is_error=False)
return info, info.pop('__files_to_move', None)
def post_process(self, filename, info, files_to_move=None):
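Note on the hunk above: a preprocessing failure is no longer fatal at the point where it happens. `pre_process` stores the message under `__pending_error`, and `_raise_pending_errors` reports it later. The sketch below models that deferral with a plain list as the log. The `log` parameter, `failing_pp`, and the `'warning'`/`'error'` labels are illustrative assumptions; the exact effect of `report_error(..., is_error=False)` is not shown in the diff.

```python
class PostProcessingError(Exception):
    """Stand-in for the exception of the same name used in the diff."""


def pre_process(info, run_pps, log):
    info = dict(info)
    try:
        info = run_pps(info)
    except PostProcessingError as err:
        msg = f'Preprocessing: {err}'
        info.setdefault('__pending_error', msg)  # keep only the first failure
        log.append(('warning', msg))             # report now, keep going
    return info


def raise_pending_errors(info, log):
    err = info.pop('__pending_error', None)      # pop so it is raised once
    if err:
        log.append(('error', err))


def failing_pp(info):
    raise PostProcessingError('conversion failed')


log = []
info = pre_process({'id': 'abc'}, failing_pp, log)
raise_pending_errors(info, log)
```

Using `setdefault` means a second failure does not overwrite the first message, and `pop` ensures the deferred error is reported only once.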
@ -3599,10 +3597,14 @@ class YoutubeDL:
if not self.params.get('verbose'):
return
# These imports can be slow. So import them only as needed
from .extractor.extractors import _LAZY_LOADER
from .extractor.extractors import _PLUGIN_CLASSES as plugin_extractors
def get_encoding(stream):
ret = str(getattr(stream, 'encoding', 'missing (%s)' % type(stream).__name__))
if not supports_terminal_sequences(stream):
from .compat import WINDOWS_VT_MODE # Must be imported locally
from .utils import WINDOWS_VT_MODE # Must be imported locally
ret += ' (No VT)' if WINDOWS_VT_MODE is False else ' (No ANSI)'
return ret
@ -3611,7 +3613,7 @@ class YoutubeDL:
sys.getfilesystemencoding(),
self.get_encoding(),
', '.join(
f'{key} {get_encoding(stream)}' for key, stream in self._out_files
f'{key} {get_encoding(stream)}' for key, stream in self._out_files.items_
if stream is not None and key != 'console')
)
@ -3638,19 +3640,17 @@ class YoutubeDL:
write_debug('Plugins: %s' % [
'%s%s' % (klass.__name__, '' if klass.__name__ == name else f' as {name}')
for name, klass in itertools.chain(plugin_extractors.items(), plugin_postprocessors.items())])
if self.params.get('compat_opts'):
write_debug('Compatibility options: %s' % ', '.join(self.params.get('compat_opts')))
if self.params['compat_opts']:
write_debug('Compatibility options: %s' % ', '.join(self.params['compat_opts']))
if source == 'source':
try:
sp = Popen(
stdout, _, _ = Popen.run(
['git', 'rev-parse', '--short', 'HEAD'],
stdout=subprocess.PIPE, stderr=subprocess.PIPE,
cwd=os.path.dirname(os.path.abspath(__file__)))
out, err = sp.communicate_or_kill()
out = out.decode().strip()
if re.match('[0-9a-f]+', out):
write_debug('Git HEAD: %s' % out)
text=True, cwd=os.path.dirname(os.path.abspath(__file__)),
stdout=subprocess.PIPE, stderr=subprocess.PIPE)
if re.fullmatch('[0-9a-f]+', stdout.strip()):
write_debug(f'Git HEAD: {stdout.strip()}')
except Exception:
with contextlib.suppress(Exception):
sys.exc_clear()

View File

@ -4,14 +4,16 @@ f'You are using an unsupported version of Python. Only Python versions 3.6 and a
__license__ = 'Public Domain'
import itertools
import optparse
import os
import re
import sys
from .compat import compat_getpass, compat_os_name, compat_shlex_quote
from .compat import compat_getpass, compat_shlex_quote
from .cookies import SUPPORTED_BROWSERS, SUPPORTED_KEYRINGS
from .downloader import FileDownloader
from .extractor import GenericIE, list_extractor_classes
from .downloader.external import get_external_downloader
from .extractor import list_extractor_classes
from .extractor.adobepass import MSO_INFO
from .extractor.common import InfoExtractor
from .options import parseOpts
@ -24,7 +26,7 @@ from .postprocessor import (
MetadataFromFieldPP,
MetadataParserPP,
)
from .update import run_update
from .update import Updater
from .utils import (
NO_DEFAULT,
POSTPROCESS_WHEN,
@ -32,41 +34,47 @@ from .utils import (
DownloadCancelled,
DownloadError,
GeoUtils,
PlaylistEntries,
SameFileError,
decodeOption,
download_range_func,
expand_path,
float_or_none,
format_field,
int_or_none,
match_filter_func,
parse_duration,
preferredencoding,
read_batch_urls,
read_stdin,
render_table,
setproctitle,
std_headers,
traverse_obj,
variadic,
write_string,
)
from .YoutubeDL import YoutubeDL
def _exit(status=0, *args):
for msg in args:
sys.stderr.write(msg)
raise SystemExit(status)
def get_urls(urls, batchfile, verbose):
# Batch file verification
batch_urls = []
if batchfile is not None:
try:
if batchfile == '-':
write_string('Reading URLs from stdin - EOF (%s) to end:\n' % (
'Ctrl+Z' if compat_os_name == 'nt' else 'Ctrl+D'))
batchfd = sys.stdin
else:
batchfd = open(
expand_path(batchfile), encoding='utf-8', errors='ignore')
batch_urls = read_batch_urls(batchfd)
batch_urls = read_batch_urls(
read_stdin('URLs') if batchfile == '-'
else open(expand_path(batchfile), encoding='utf-8', errors='ignore'))
if verbose:
write_string('[debug] Batch file urls: ' + repr(batch_urls) + '\n')
except OSError:
sys.exit('ERROR: batch file %s could not be read' % batchfile)
_exit(f'ERROR: batch file {batchfile} could not be read')
_enc = preferredencoding()
return [
url.strip().decode(_enc, 'ignore') if isinstance(url, bytes) else url.strip()
@ -74,6 +82,10 @@ def get_urls(urls, batchfile, verbose):
def print_extractor_information(opts, urls):
# Importing GenericIE is currently slow since it imports other extractors
# TODO: Move this back to module level after generalization of embed detection
from .extractor.generic import GenericIE
out = ''
if opts.list_extractors:
urls = dict.fromkeys(urls, False)
@ -209,15 +221,11 @@ def validate_options(opts):
validate_regex('format sorting', f, InfoExtractor.FormatSort.regex)
# Postprocessor formats
validate_in('audio format', opts.audioformat, ['best'] + list(FFmpegExtractAudioPP.SUPPORTED_EXTS))
validate_regex('audio format', opts.audioformat, FFmpegExtractAudioPP.FORMAT_RE)
validate_in('subtitle format', opts.convertsubtitles, FFmpegSubtitlesConvertorPP.SUPPORTED_EXTS)
validate_in('thumbnail format', opts.convertthumbnails, FFmpegThumbnailsConvertorPP.SUPPORTED_EXTS)
if opts.recodevideo is not None:
opts.recodevideo = opts.recodevideo.replace(' ', '')
validate_regex('video recode format', opts.recodevideo, FFmpegVideoConvertorPP.FORMAT_RE)
if opts.remuxvideo is not None:
opts.remuxvideo = opts.remuxvideo.replace(' ', '')
validate_regex('video remux format', opts.remuxvideo, FFmpegVideoRemuxerPP.FORMAT_RE)
validate_regex('thumbnail format', opts.convertthumbnails, FFmpegThumbnailsConvertorPP.FORMAT_RE)
validate_regex('recode video format', opts.recodevideo, FFmpegVideoConvertorPP.FORMAT_RE)
validate_regex('remux video format', opts.remuxvideo, FFmpegVideoRemuxerPP.FORMAT_RE)
if opts.audioquality:
opts.audioquality = opts.audioquality.strip('k').strip('K')
# int_or_none prevents inf, nan
@ -239,6 +247,28 @@ def validate_options(opts):
opts.extractor_retries = parse_retries('extractor', opts.extractor_retries)
opts.file_access_retries = parse_retries('file access', opts.file_access_retries)
# Retry sleep function
def parse_sleep_func(expr):
NUMBER_RE = r'\d+(?:\.\d+)?'
op, start, limit, step, *_ = tuple(re.fullmatch(
rf'(?:(linear|exp)=)?({NUMBER_RE})(?::({NUMBER_RE})?)?(?::({NUMBER_RE}))?',
expr.strip()).groups()) + (None, None)
if op == 'exp':
return lambda n: min(float(start) * (float(step or 2) ** n), float(limit or 'inf'))
else:
default_step = start if op or limit else 0
return lambda n: min(float(start) + float(step or default_step) * n, float(limit or 'inf'))
for key, expr in tuple(opts.retry_sleep.items()):  # copy: entries may be deleted below
if not expr:
del opts.retry_sleep[key]
continue
try:
opts.retry_sleep[key] = parse_sleep_func(expr)
except AttributeError:
raise ValueError(f'invalid {key} retry sleep expression {expr!r}')
# Bytes
def parse_bytes(name, value):
if value is None:
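Note on the hunk above: the retry-sleep expression has the shape `[linear|exp=]START[:LIMIT][:STEP]` and is compiled into a function of the attempt number `n`. The sketch below lifts `parse_sleep_func` out of `validate_options` so it can be run on its own. The only additions are the explicit `None` check and the `ValueError` raised inside the function (the diff catches `AttributeError` at the call site instead).

```python
import re

NUMBER_RE = r'\d+(?:\.\d+)?'


def parse_sleep_func(expr):
    """Compile '[linear|exp=]START[:LIMIT][:STEP]' into f(n) -> seconds."""
    match = re.fullmatch(
        rf'(?:(linear|exp)=)?({NUMBER_RE})(?::({NUMBER_RE})?)?(?::({NUMBER_RE}))?',
        expr.strip())
    if match is None:
        raise ValueError(f'invalid retry sleep expression {expr!r}')
    op, start, limit, step = match.groups()
    if op == 'exp':
        # start * step**n, capped at limit (step defaults to 2)
        return lambda n: min(float(start) * (float(step or 2) ** n), float(limit or 'inf'))
    # A bare number means a constant delay; with an operator or a limit,
    # the step defaults to the start value
    default_step = start if op or limit else 0
    return lambda n: min(float(start) + float(step or default_step) * n, float(limit or 'inf'))


constant = parse_sleep_func('5')
linear = parse_sleep_func('linear=1::2')
exponential = parse_sleep_func('exp=1:20')
```

Here `linear(n)` grows as 1, 3, 5, ... with no cap, while `exponential(n)` doubles from 1 and is clamped at 20.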
@ -283,20 +313,25 @@ def validate_options(opts):
'Cannot download a video and extract audio into the same file! '
f'Use "{outtmpl_default}.%(ext)s" instead of "{outtmpl_default}" as the output template')
# Remove chapters
remove_chapters_patterns, opts.remove_ranges = [], []
for regex in opts.remove_chapters or []:
def parse_chapters(name, value):
chapters, ranges = [], []
for regex in value or []:
if regex.startswith('*'):
dur = list(map(parse_duration, regex[1:].split('-')))
for range in regex[1:].split(','):
dur = tuple(map(parse_duration, range.strip().split('-')))
if len(dur) == 2 and all(t is not None for t in dur):
opts.remove_ranges.append(tuple(dur))
ranges.append(dur)
else:
raise ValueError(f'invalid {name} time range "{regex}". Must be of the form *start-end')
continue
raise ValueError(f'invalid --remove-chapters time range "{regex}". Must be of the form *start-end')
try:
remove_chapters_patterns.append(re.compile(regex))
chapters.append(re.compile(regex))
except re.error as err:
raise ValueError(f'invalid --remove-chapters regex "{regex}" - {err}')
opts.remove_chapters = remove_chapters_patterns
raise ValueError(f'invalid {name} regex "{regex}" - {err}')
return chapters, ranges
opts.remove_chapters, opts.remove_ranges = parse_chapters('--remove-chapters', opts.remove_chapters)
opts.download_ranges = download_range_func(*parse_chapters('--download-sections', opts.download_ranges))
# Cookies from browser
if opts.cookiesfrombrowser:
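Note on the hunk above: `parse_chapters` is shared by `--remove-chapters` and `--download-sections`. Values starting with `*` are comma-separated `start-end` time ranges; anything else is compiled as a chapter-title regex. The sketch below is self-contained, so it replaces `parse_duration` with a simplified, hypothetical `parse_timestamp` that only understands `SS`, `MM:SS` and `HH:MM:SS`; the real helper accepts more formats.

```python
import re


def parse_timestamp(text):
    """Hypothetical stand-in for parse_duration: SS, MM:SS or HH:MM:SS."""
    parts = text.strip().split(':')
    if not 1 <= len(parts) <= 3:
        return None
    try:
        numbers = [float(p) for p in parts]
    except ValueError:
        return None
    seconds = 0.0
    for number in numbers:
        seconds = seconds * 60 + number
    return seconds


def parse_chapters(name, values):
    chapters, ranges = [], []
    for spec in values or []:
        if spec.startswith('*'):
            for part in spec[1:].split(','):
                dur = tuple(map(parse_timestamp, part.strip().split('-')))
                if len(dur) == 2 and all(t is not None for t in dur):
                    ranges.append(dur)
                else:
                    raise ValueError(
                        f'invalid {name} time range "{spec}". Must be of the form *start-end')
            continue
        try:
            chapters.append(re.compile(spec))
        except re.error as err:
            raise ValueError(f'invalid {name} regex "{spec}" - {err}')
    return chapters, ranges


chapters, ranges = parse_chapters('--download-sections', ['*10-20,1:00-1:30', 'intro'])
```

Passing the option name in as `name` is what lets a single parser produce the right error message for either flag.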
@ -340,6 +375,12 @@ def validate_options(opts):
opts.parse_metadata = list(itertools.chain(*map(metadataparser_actions, parse_metadata)))
# Other options
if opts.playlist_items is not None:
try:
tuple(PlaylistEntries.parse_playlist_items(opts.playlist_items))
except Exception as err:
raise ValueError(f'Invalid playlist-items {opts.playlist_items!r}: {err}')
geo_bypass_code = opts.geo_bypass_ip_block or opts.geo_bypass_country
if geo_bypass_code is not None:
try:
@ -360,6 +401,15 @@ def validate_options(opts):
if opts.no_sponsorblock:
opts.sponsorblock_mark = opts.sponsorblock_remove = set()
default_downloader = None
for proto, path in opts.external_downloader.items():
ed = get_external_downloader(path)
if ed is None:
raise ValueError(
f'No such {format_field(proto, None, "%s ", ignore="default")}external downloader "{path}"')
elif ed and proto == 'default':
default_downloader = ed.get_basename()
warnings, deprecation_warnings = [], []
# Common mistake: -f best
@ -370,13 +420,18 @@ def validate_options(opts):
'If you know what you are doing and want only the best pre-merged format, use "-f b" instead to suppress this warning')))
# --(postprocessor/downloader)-args without name
def report_args_compat(name, value, key1, key2=None):
def report_args_compat(name, value, key1, key2=None, where=None):
if key1 in value and key2 not in value:
warnings.append(f'{name} arguments given without specifying name. The arguments will be given to all {name}s')
warnings.append(f'{name.title()} arguments given without specifying name. '
f'The arguments will be given to {where or f"all {name}s"}')
return True
return False
report_args_compat('external downloader', opts.external_downloader_args, 'default')
if report_args_compat('external downloader', opts.external_downloader_args,
'default', where=default_downloader) and default_downloader:
# Compat with youtube-dl's behavior. See https://github.com/ytdl-org/youtube-dl/commit/49c5293014bc11ec8c009856cd63cffa6296c1e1
opts.external_downloader_args.setdefault(default_downloader, opts.external_downloader_args.pop('default'))
if report_args_compat('post-processor', opts.postprocessor_args, 'default-compat', 'default'):
opts.postprocessor_args['default'] = opts.postprocessor_args.pop('default-compat')
opts.postprocessor_args.setdefault('sponskrub', [])
@ -395,6 +450,9 @@ def validate_options(opts):
setattr(opts, opt1, default)
# Conflicting options
report_conflict('--playlist-reverse', 'playlist_reverse', '--playlist-random', 'playlist_random')
report_conflict('--playlist-reverse', 'playlist_reverse', '--lazy-playlist', 'lazy_playlist')
report_conflict('--playlist-random', 'playlist_random', '--lazy-playlist', 'lazy_playlist')
report_conflict('--dateafter', 'dateafter', '--date', 'date', default=None)
report_conflict('--datebefore', 'datebefore', '--date', 'date', default=None)
report_conflict('--exec-before-download', 'exec_before_dl_cmd',
@ -627,7 +685,7 @@ def parse_options(argv=None):
final_ext = (
opts.recodevideo if opts.recodevideo in FFmpegVideoConvertorPP.SUPPORTED_EXTS
else opts.remuxvideo if opts.remuxvideo in FFmpegVideoRemuxerPP.SUPPORTED_EXTS
else opts.audioformat if (opts.extractaudio and opts.audioformat != 'best')
else opts.audioformat if (opts.extractaudio and opts.audioformat in FFmpegExtractAudioPP.SUPPORTED_EXTS)
else None)
return parser, opts, urls, {
@ -686,6 +744,7 @@ def parse_options(argv=None):
'file_access_retries': opts.file_access_retries,
'fragment_retries': opts.fragment_retries,
'extractor_retries': opts.extractor_retries,
'retry_sleep_functions': opts.retry_sleep,
'skip_unavailable_fragments': opts.skip_unavailable_fragments,
'keep_fragments': opts.keep_fragments,
'concurrent_fragment_downloads': opts.concurrent_fragment_downloads,
@ -700,6 +759,7 @@ def parse_options(argv=None):
'playlistend': opts.playlistend,
'playlistreverse': opts.playlist_reverse,
'playlistrandom': opts.playlist_random,
'lazy_playlist': opts.lazy_playlist,
'noplaylist': opts.noplaylist,
'logtostderr': opts.outtmpl.get('default') == '-',
'consoletitle': opts.consoletitle,
@ -731,6 +791,7 @@ def parse_options(argv=None):
'verbose': opts.verbose,
'dump_intermediate_pages': opts.dump_intermediate_pages,
'write_pages': opts.write_pages,
'load_pages': opts.load_pages,
'test': opts.test,
'keepvideo': opts.keepvideo,
'min_filesize': opts.min_filesize,
@ -779,6 +840,8 @@ def parse_options(argv=None):
'max_sleep_interval': opts.max_sleep_interval,
'sleep_interval_subtitles': opts.sleep_interval_subtitles,
'external_downloader': opts.external_downloader,
'download_ranges': opts.download_ranges,
'force_keyframes_at_cuts': opts.force_keyframes_at_cuts,
'list_thumbnails': opts.list_thumbnails,
'playlist_items': opts.playlist_items,
'xattr_set_filesize': opts.xattr_set_filesize,
@ -810,62 +873,63 @@ def _real_main(argv=None):
if opts.dump_user_agent:
ua = traverse_obj(opts.headers, 'User-Agent', casesense=False, default=std_headers['User-Agent'])
write_string(f'{ua}\n', out=sys.stdout)
sys.exit(0)
return
if print_extractor_information(opts, all_urls):
sys.exit(0)
return
with YoutubeDL(ydl_opts) as ydl:
pre_process = opts.update_self or opts.rm_cachedir
actual_use = all_urls or opts.load_info_filename
# Remove cache dir
if opts.rm_cachedir:
ydl.cache.remove()
# Update version
if opts.update_self:
# If updater returns True, exit. Required for windows
if run_update(ydl):
if actual_use:
sys.exit('ERROR: The program must exit for the update to complete')
sys.exit()
updater = Updater(ydl)
if opts.update_self and updater.update() and actual_use:
if updater.cmd:
return updater.restart()
# This code is reachable only for zip variant in py < 3.10
# It makes sense to exit here, but the old behavior is to continue
ydl.report_warning('Restart yt-dlp to use the updated version')
# return 100, 'ERROR: The program must exit for the update to complete'
# Maybe do nothing
if not actual_use:
if opts.update_self or opts.rm_cachedir:
sys.exit()
if pre_process:
return ydl._download_retcode
ydl.warn_if_short_id(sys.argv[1:] if argv is None else argv)
parser.error(
'You must provide at least one URL.\n'
'Type yt-dlp --help to see a list of all options.')
parser.destroy()
try:
if opts.load_info_filename is not None:
retcode = ydl.download_with_info_file(expand_path(opts.load_info_filename))
return ydl.download_with_info_file(expand_path(opts.load_info_filename))
else:
retcode = ydl.download(all_urls)
return ydl.download(all_urls)
except DownloadCancelled:
ydl.to_screen('Aborting remaining downloads')
retcode = 101
sys.exit(retcode)
return 101
def main(argv=None):
try:
_real_main(argv)
_exit(*variadic(_real_main(argv)))
except DownloadError:
sys.exit(1)
_exit(1)
except SameFileError as e:
sys.exit(f'ERROR: {e}')
_exit(f'ERROR: {e}')
except KeyboardInterrupt:
sys.exit('\nERROR: Interrupted by user')
_exit('\nERROR: Interrupted by user')
except BrokenPipeError as e:
# https://docs.python.org/3/library/signal.html#note-on-sigpipe
devnull = os.open(os.devnull, os.O_WRONLY)
os.dup2(devnull, sys.stdout.fileno())
sys.exit(f'\nERROR: {e}')
_exit(f'\nERROR: {e}')
except optparse.OptParseError as e:
_exit(2, f'\n{e}')
from .extractor import gen_extractors, list_extractors
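Note on the changes to `_real_main` and `main` above: `_real_main` now returns a status (or nothing) instead of calling `sys.exit` itself, and `main` funnels every outcome through `_exit`. The sketch below shows that flow in isolation. `variadic` here is a simplified stand-in for the utility imported in the diff, and `run` and `exit_code_of` are hypothetical helpers added for illustration.

```python
import sys
from collections.abc import Iterable


def variadic(x):
    """Simplified stand-in: wrap anything that is not a real sequence in a tuple."""
    return x if isinstance(x, Iterable) and not isinstance(x, (str, bytes)) else (x,)


def _exit(status=0, *args):
    for msg in args:
        sys.stderr.write(msg)
    raise SystemExit(status)


def run(real_main):
    try:
        # real_main may return None, a status code, or (status, message)
        _exit(*variadic(real_main()))
    except KeyboardInterrupt:
        _exit('\nERROR: Interrupted by user')


def exit_code_of(real_main):
    """Run and capture the SystemExit code instead of terminating."""
    try:
        run(real_main)
    except SystemExit as exc:
        return exc.code
```

With a single exit point, callers that embed the program can intercept one `SystemExit` rather than several scattered `sys.exit` calls.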

View File

@ -1,6 +1,4 @@
import contextlib
import os
import subprocess
import sys
import warnings
import xml.etree.ElementTree as etree
@ -11,8 +9,13 @@ from .compat_utils import passthrough_module
# XXX: Implement this the same way as other DeprecationWarnings without circular import
passthrough_module(__name__, '._legacy', callback=lambda attr: warnings.warn(
try:
passthrough_module(__name__, '._legacy', callback=lambda attr: warnings.warn(
DeprecationWarning(f'{__name__}.{attr} is deprecated'), stacklevel=2))
HAS_LEGACY = True
except ModuleNotFoundError:
# Keep working even without _legacy module
HAS_LEGACY = False
del passthrough_module
@ -52,7 +55,7 @@ if compat_os_name == 'nt' and sys.version_info < (3, 8):
def compat_realpath(path):
while os.path.islink(path):
path = os.path.abspath(os.readlink(path))
return path
return os.path.realpath(path)
else:
compat_realpath = os.path.realpath
@ -74,17 +77,3 @@ if compat_os_name in ('nt', 'ce'):
return userhome + path[i:]
else:
compat_expanduser = os.path.expanduser
WINDOWS_VT_MODE = False if compat_os_name == 'nt' else None
def windows_enable_vt_mode(): # TODO: Do this the proper way https://bugs.python.org/issue30075
if compat_os_name != 'nt':
return
global WINDOWS_VT_MODE
startupinfo = subprocess.STARTUPINFO()
startupinfo.dwFlags |= subprocess.STARTF_USESHOWWINDOW
with contextlib.suppress(Exception):
subprocess.Popen('', shell=True, startupinfo=startupinfo).wait()
WINDOWS_VT_MODE = True

View File

@ -55,3 +55,10 @@ compat_xml_parse_error = etree.ParseError
compat_xpath = lambda xpath: xpath
compat_zip = zip
workaround_optparse_bug9161 = lambda: None
def __getattr__(name):
if name in ('WINDOWS_VT_MODE', 'windows_enable_vt_mode'):
from .. import utils
return getattr(utils, name)
raise AttributeError(name)
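Note on the hunk above: `WINDOWS_VT_MODE` and `windows_enable_vt_mode` move to `utils`, and the old import location keeps working through a module-level `__getattr__` (PEP 562, Python 3.7+). The sketch below shows the mechanism with two throwaway modules built at runtime; `new_home` and `legacy` are illustrative names only.

```python
import types

# Module that now owns the attribute
new_home = types.ModuleType('new_home')
new_home.WINDOWS_VT_MODE = None

# Module that used to own it and must stay import-compatible
legacy = types.ModuleType('legacy')


def _legacy_getattr(name):
    # Called only when normal attribute lookup on `legacy` fails
    if name in ('WINDOWS_VT_MODE',):
        return getattr(new_home, name)
    raise AttributeError(name)


legacy.__getattr__ = _legacy_getattr

before = legacy.WINDOWS_VT_MODE
new_home.WINDOWS_VT_MODE = True
after = legacy.WINDOWS_VT_MODE
```

Because the lookup is forwarded on every access rather than copied once, the legacy name always reflects the current value, which matters for a global like `WINDOWS_VT_MODE` that is reassigned at runtime.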

View File

@ -33,7 +33,7 @@ def _is_package(module):
def passthrough_module(parent, child, *, callback=lambda _: None):
parent_module = importlib.import_module(parent)
child_module = importlib.import_module(child, parent)
child_module = None # Import child module only as needed
class PassthroughModule(types.ModuleType):
def __getattr__(self, attr):
@ -41,6 +41,9 @@ def passthrough_module(parent, child, *, callback=lambda _: None):
with contextlib.suppress(ImportError):
return importlib.import_module(f'.{attr}', parent)
nonlocal child_module
child_module = child_module or importlib.import_module(child, parent)
ret = _NO_ATTRIBUTE
with contextlib.suppress(AttributeError):
ret = getattr(child_module, attr)

View File

@ -0,0 +1,26 @@
# flake8: noqa: F405
from functools import * # noqa: F403
from .compat_utils import passthrough_module
passthrough_module(__name__, 'functools')
del passthrough_module
try:
cache # >= 3.9
except NameError:
cache = lru_cache(maxsize=None)
try:
cached_property # >= 3.8
except NameError:
class cached_property:
def __init__(self, func):
update_wrapper(self, func)
self.func = func
def __get__(self, instance, _):
if instance is None:
return self
setattr(instance, self.func.__name__, self.func(instance))
return getattr(instance, self.func.__name__)
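Note on the new file above: the fallback `cached_property` is a non-data descriptor (it defines `__get__` but not `__set__`), so the value it stores on the instance shadows the descriptor on later lookups. The sketch below repeats the class from the diff and exercises it with a hypothetical `Report` class to show that the wrapped function runs once per instance.

```python
from functools import update_wrapper


class cached_property:
    def __init__(self, func):
        update_wrapper(self, func)
        self.func = func

    def __get__(self, instance, _):
        if instance is None:
            return self
        # Store the result under the same name; the instance attribute
        # then takes precedence over this descriptor
        setattr(instance, self.func.__name__, self.func(instance))
        return getattr(instance, self.func.__name__)


class Report:
    calls = 0

    @cached_property
    def total(self):
        Report.calls += 1
        return 42


report = Report()
first, second = report.total, report.total
```

After the first access, `report.__dict__` holds `total`, so the second access never reaches `__get__`.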

View File

@ -156,30 +156,16 @@ def _extract_firefox_cookies(profile, logger):
def _firefox_browser_dir():
if sys.platform in ('linux', 'linux2'):
return os.path.expanduser('~/.mozilla/firefox')
elif sys.platform == 'win32':
if sys.platform in ('cygwin', 'win32'):
return os.path.expandvars(R'%APPDATA%\Mozilla\Firefox\Profiles')
elif sys.platform == 'darwin':
return os.path.expanduser('~/Library/Application Support/Firefox')
else:
raise ValueError(f'unsupported platform: {sys.platform}')
return os.path.expanduser('~/.mozilla/firefox')
def _get_chromium_based_browser_settings(browser_name):
# https://chromium.googlesource.com/chromium/src/+/HEAD/docs/user_data_dir.md
if sys.platform in ('linux', 'linux2'):
config = _config_home()
browser_dir = {
'brave': os.path.join(config, 'BraveSoftware/Brave-Browser'),
'chrome': os.path.join(config, 'google-chrome'),
'chromium': os.path.join(config, 'chromium'),
'edge': os.path.join(config, 'microsoft-edge'),
'opera': os.path.join(config, 'opera'),
'vivaldi': os.path.join(config, 'vivaldi'),
}[browser_name]
elif sys.platform == 'win32':
if sys.platform in ('cygwin', 'win32'):
appdata_local = os.path.expandvars('%LOCALAPPDATA%')
appdata_roaming = os.path.expandvars('%APPDATA%')
browser_dir = {
@ -203,7 +189,15 @@ def _get_chromium_based_browser_settings(browser_name):
}[browser_name]
else:
raise ValueError(f'unsupported platform: {sys.platform}')
config = _config_home()
browser_dir = {
'brave': os.path.join(config, 'BraveSoftware/Brave-Browser'),
'chrome': os.path.join(config, 'google-chrome'),
'chromium': os.path.join(config, 'chromium'),
'edge': os.path.join(config, 'microsoft-edge'),
'opera': os.path.join(config, 'opera'),
'vivaldi': os.path.join(config, 'vivaldi'),
}[browser_name]
# Linux keyring names can be determined by snooping on dbus while opening the browser in KDE:
# dbus-monitor "interface='org.kde.KWallet'" "type=method_return"
@ -343,14 +337,11 @@ class ChromeCookieDecryptor:
def get_cookie_decryptor(browser_root, browser_keyring_name, logger, *, keyring=None):
if sys.platform in ('linux', 'linux2'):
return LinuxChromeCookieDecryptor(browser_keyring_name, logger, keyring=keyring)
elif sys.platform == 'darwin':
if sys.platform == 'darwin':
return MacChromeCookieDecryptor(browser_keyring_name, logger)
elif sys.platform == 'win32':
elif sys.platform in ('win32', 'cygwin'):
return WindowsChromeCookieDecryptor(browser_root, logger)
else:
raise NotImplementedError(f'Chrome cookie decryption is not supported on this platform: {sys.platform}')
return LinuxChromeCookieDecryptor(browser_keyring_name, logger, keyring=keyring)
class LinuxChromeCookieDecryptor(ChromeCookieDecryptor):
@ -718,21 +709,19 @@ def _get_kwallet_network_wallet(logger):
"""
default_wallet = 'kdewallet'
try:
proc = Popen([
stdout, _, returncode = Popen.run([
'dbus-send', '--session', '--print-reply=literal',
'--dest=org.kde.kwalletd5',
'/modules/kwalletd5',
'org.kde.KWallet.networkWallet'
], stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
], text=True, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
stdout, stderr = proc.communicate_or_kill()
if proc.returncode != 0:
if returncode:
logger.warning('failed to read NetworkWallet')
return default_wallet
else:
network_wallet = stdout.decode().strip()
logger.debug(f'NetworkWallet = "{network_wallet}"')
return network_wallet
logger.debug(f'NetworkWallet = "{stdout.strip()}"')
return stdout.strip()
except Exception as e:
logger.warning(f'exception while obtaining NetworkWallet: {e}')
return default_wallet
@ -750,17 +739,16 @@ def _get_kwallet_password(browser_keyring_name, logger):
network_wallet = _get_kwallet_network_wallet(logger)
try:
proc = Popen([
stdout, _, returncode = Popen.run([
'kwallet-query',
'--read-password', f'{browser_keyring_name} Safe Storage',
'--folder', f'{browser_keyring_name} Keys',
network_wallet
], stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
stdout, stderr = proc.communicate_or_kill()
if proc.returncode != 0:
logger.error(f'kwallet-query failed with return code {proc.returncode}. Please consult '
'the kwallet-query man page for details')
if returncode:
logger.error(f'kwallet-query failed with return code {returncode}. '
'Please consult the kwallet-query man page for details')
return b''
else:
if stdout.lower().startswith(b'failed to read'):
@ -775,9 +763,7 @@ def _get_kwallet_password(browser_keyring_name, logger):
return b''
else:
logger.debug('password found')
if stdout[-1:] == b'\n':
stdout = stdout[:-1]
return stdout
return stdout.rstrip(b'\n')
except Exception as e:
logger.warning(f'exception running kwallet-query: {error_to_str(e)}')
return b''
@ -824,17 +810,13 @@ def _get_linux_keyring_password(browser_keyring_name, keyring, logger):
def _get_mac_keyring_password(browser_keyring_name, logger):
logger.debug('using find-generic-password to obtain password from OSX keychain')
try:
proc = Popen(
stdout, _, _ = Popen.run(
['security', 'find-generic-password',
'-w', # write password to stdout
'-a', browser_keyring_name, # match 'account'
'-s', f'{browser_keyring_name} Safe Storage'], # match 'service'
stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
stdout, stderr = proc.communicate_or_kill()
if stdout[-1:] == b'\n':
stdout = stdout[:-1]
return stdout
return stdout.rstrip(b'\n')
except Exception as e:
logger.warning(f'exception running find-generic-password: {error_to_str(e)}')
return None
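
The replacement of the `stdout[-1:] == b'\n'` check with `stdout.rstrip(b'\n')` is not strictly equivalent: the old form drops at most one trailing newline, while `rstrip` drops all of them. A small sketch of the difference, using hypothetical helper names and made-up byte strings:

```python
def strip_one_newline(stdout):
    # Old behaviour: remove a single trailing newline, if present
    if stdout[-1:] == b'\n':
        stdout = stdout[:-1]
    return stdout


def strip_all_newlines(stdout):
    # New behaviour: remove every trailing newline
    return stdout.rstrip(b'\n')


same_old = strip_one_newline(b'secret\n')
same_new = strip_all_newlines(b'secret\n')
differs_old = strip_one_newline(b'secret\n\n')
differs_new = strip_all_newlines(b'secret\n\n')
```

For output that ends in exactly one newline, which is presumably the common case for these tools, both forms return the same bytes.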


@ -1,4 +1,3 @@
from ..compat import compat_str
from ..utils import NO_DEFAULT, determine_protocol
@ -85,13 +84,13 @@ def _get_suitable_downloader(info_dict, protocol, params, default):
if default is NO_DEFAULT:
default = HttpFD
# if (info_dict.get('start_time') or info_dict.get('end_time')) and not info_dict.get('requested_formats') and FFmpegFD.can_download(info_dict):
# return FFmpegFD
if (info_dict.get('section_start') or info_dict.get('section_end')) and FFmpegFD.can_download(info_dict):
return FFmpegFD
info_dict['protocol'] = protocol
downloaders = params.get('external_downloader')
external_downloader = (
downloaders if isinstance(downloaders, compat_str) or downloaders is None
downloaders if isinstance(downloaders, str) or downloaders is None
else downloaders.get(shorten_protocol_name(protocol, True), downloaders.get('default')))
if external_downloader is None:


@ -15,14 +15,18 @@ from ..utils import (
NUMBER_RE,
LockingUnsupportedError,
Namespace,
classproperty,
decodeArgument,
encodeFilename,
error_to_compat_str,
float_or_none,
format_bytes,
join_nonempty,
sanitize_open,
shell_quote,
timeconvert,
timetuple_from_msec,
try_call,
)
@ -41,6 +45,7 @@ class FileDownloader:
verbose: Print additional info to stdout.
quiet: Do not print messages to stdout.
ratelimit: Download speed limit, in bytes/sec.
continuedl: Attempt to continue downloads if possible
throttledratelimit: Assume the download is being throttled below this speed (bytes/sec)
retries: Number of times to retry for HTTP error 5xx
file_access_retries: Number of times to retry on file access error
@ -64,6 +69,7 @@ class FileDownloader:
useful for bypassing bandwidth throttling imposed by
a webserver (experimental)
progress_template: See YoutubeDL.py
retry_sleep_functions: See YoutubeDL.py
Subclasses of this one must re-define the real_download method.
"""
@ -98,12 +104,16 @@ class FileDownloader:
def to_screen(self, *args, **kargs):
self.ydl.to_screen(*args, quiet=self.params.get('quiet'), **kargs)
@property
def FD_NAME(self):
return re.sub(r'(?<!^)(?=[A-Z])', '_', type(self).__name__[:-2]).lower()
__to_screen = to_screen
@classproperty
def FD_NAME(cls):
return re.sub(r'(?<=[a-z])(?=[A-Z])', '_', cls.__name__[:-2]).lower()
@staticmethod
def format_seconds(seconds):
if seconds is None:
return ' Unknown'
time = timetuple_from_msec(seconds * 1000)
if time.hours > 99:
return '--:--:--'
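
The `FD_NAME` change swaps the regex that splits the class name into words. The two patterns can be compared in isolation; the sketch below copies them from the hunk, while the `fd_name` helper is a hypothetical wrapper, not the actual property.

```python
import re

OLD_PATTERN = r'(?<!^)(?=[A-Z])'      # before every capital except at the start
NEW_PATTERN = r'(?<=[a-z])(?=[A-Z])'  # only between a lowercase letter and a capital


def fd_name(class_name, pattern):
    # Drop the trailing "FD", separate the CamelCase words with "_", lowercase
    return re.sub(pattern, '_', class_name[:-2]).lower()


names = ('HttpFD', 'DashSegmentsFD', 'FFmpegFD')
old = [fd_name(n, OLD_PATTERN) for n in names]
new = [fd_name(n, NEW_PATTERN) for n in names]
```

The two patterns agree on ordinary CamelCase names and differ only where capitals are adjacent, as in `FFmpegFD`.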
@ -111,6 +121,8 @@ class FileDownloader:
return '%02d:%02d' % time[1:-1]
return '%02d:%02d:%02d' % time[:-1]
format_eta = format_seconds
@staticmethod
def calc_percent(byte_counter, data_len):
if data_len is None:
@ -119,11 +131,7 @@ class FileDownloader:
@staticmethod
def format_percent(percent):
if percent is None:
return '---.-%'
elif percent == 100:
return '100%'
return '%6s' % ('%3.1f%%' % percent)
return ' N/A%' if percent is None else f'{percent:>5.1f}%'
@staticmethod
def calc_eta(start, now, total, current):
@ -137,12 +145,6 @@ class FileDownloader:
rate = float(current) / dif
return int((float(total) - float(current)) / rate)
@staticmethod
def format_eta(eta):
if eta is None:
return '--:--'
return FileDownloader.format_seconds(eta)
@staticmethod
def calc_speed(start, now, bytes):
dif = now - start
@ -152,13 +154,11 @@ class FileDownloader:
@staticmethod
def format_speed(speed):
if speed is None:
return '%10s' % '---b/s'
return '%10s' % ('%s/s' % format_bytes(speed))
return ' Unknown B/s' if speed is None else f'{format_bytes(speed):>10s}/s'
@staticmethod
def format_retries(retries):
return 'inf' if retries == float('inf') else '%.0f' % retries
return 'inf' if retries == float('inf') else int(retries)
@staticmethod
def best_block_size(elapsed_time, bytes):
@ -232,6 +232,7 @@ class FileDownloader:
self.to_screen(
f'[download] Unable to {action} file due to file access error. '
f'Retrying (attempt {retry} of {self.format_retries(file_access_retries)}) ...')
if not self.sleep_retry('file_access', retry):
time.sleep(0.01)
return inner
return outer
@ -282,9 +283,9 @@ class FileDownloader:
elif self.ydl.params.get('logger'):
self._multiline = MultilineLogger(self.ydl.params['logger'], lines)
elif self.params.get('progress_with_newline'):
self._multiline = BreaklineStatusPrinter(self.ydl._out_files.screen, lines)
self._multiline = BreaklineStatusPrinter(self.ydl._out_files.out, lines)
else:
self._multiline = MultilinePrinter(self.ydl._out_files.screen, lines, not self.params.get('quiet'))
self._multiline = MultilinePrinter(self.ydl._out_files.out, lines, not self.params.get('quiet'))
self._multiline.allow_colors = self._multiline._HAVE_FULLCAP and not self.params.get('no_color')
def _finish_multiline_status(self):
@ -301,7 +302,7 @@ class FileDownloader:
)
def _report_progress_status(self, s, default_template):
for name, style in self.ProgressStyles:
for name, style in self.ProgressStyles.items_:
name = f'_{name}_str'
if name not in s:
continue
@ -325,63 +326,52 @@ class FileDownloader:
self._multiline.stream, self._multiline.allow_colors, *args, **kwargs)
def report_progress(self, s):
def with_fields(*tups, default=''):
for *fields, tmpl in tups:
if all(s.get(f) is not None for f in fields):
return tmpl
return default
if s['status'] == 'finished':
if self.params.get('noprogress'):
self.to_screen('[download] Download completed')
msg_template = '100%%'
if s.get('total_bytes') is not None:
s['_total_bytes_str'] = format_bytes(s['total_bytes'])
msg_template += ' of %(_total_bytes_str)s'
if s.get('elapsed') is not None:
s['_elapsed_str'] = self.format_seconds(s['elapsed'])
msg_template += ' in %(_elapsed_str)s'
s['_percent_str'] = self.format_percent(100)
self._report_progress_status(s, msg_template)
return
s.update({
'_total_bytes_str': format_bytes(s.get('total_bytes')),
'_elapsed_str': self.format_seconds(s.get('elapsed')),
'_percent_str': self.format_percent(100),
})
self._report_progress_status(s, join_nonempty(
'100%%',
with_fields(('total_bytes', 'of %(_total_bytes_str)s')),
with_fields(('elapsed', 'in %(_elapsed_str)s')),
delim=' '))
if s['status'] != 'downloading':
return
if s.get('eta') is not None:
s['_eta_str'] = self.format_eta(s['eta'])
else:
s['_eta_str'] = 'Unknown'
s.update({
'_eta_str': self.format_eta(s.get('eta')),
'_speed_str': self.format_speed(s.get('speed')),
'_percent_str': self.format_percent(try_call(
lambda: 100 * s['downloaded_bytes'] / s['total_bytes'],
lambda: 100 * s['downloaded_bytes'] / s['total_bytes_estimate'],
lambda: s['downloaded_bytes'] == 0 and 0)),
'_total_bytes_str': format_bytes(s.get('total_bytes')),
'_total_bytes_estimate_str': format_bytes(s.get('total_bytes_estimate')),
'_downloaded_bytes_str': format_bytes(s.get('downloaded_bytes')),
'_elapsed_str': self.format_seconds(s.get('elapsed')),
})
if s.get('total_bytes') and s.get('downloaded_bytes') is not None:
s['_percent_str'] = self.format_percent(100 * s['downloaded_bytes'] / s['total_bytes'])
elif s.get('total_bytes_estimate') and s.get('downloaded_bytes') is not None:
s['_percent_str'] = self.format_percent(100 * s['downloaded_bytes'] / s['total_bytes_estimate'])
else:
if s.get('downloaded_bytes') == 0:
s['_percent_str'] = self.format_percent(0)
else:
s['_percent_str'] = 'Unknown %'
msg_template = with_fields(
('total_bytes', '%(_percent_str)s of %(_total_bytes_str)s at %(_speed_str)s ETA %(_eta_str)s'),
('total_bytes_estimate', '%(_percent_str)s of ~%(_total_bytes_estimate_str)s at %(_speed_str)s ETA %(_eta_str)s'),
('downloaded_bytes', 'elapsed', '%(_downloaded_bytes_str)s at %(_speed_str)s (%(_elapsed_str)s)'),
('downloaded_bytes', '%(_downloaded_bytes_str)s at %(_speed_str)s'),
default='%(_percent_str)s at %(_speed_str)s ETA %(_eta_str)s')
if s.get('speed') is not None:
s['_speed_str'] = self.format_speed(s['speed'])
else:
s['_speed_str'] = 'Unknown speed'
if s.get('total_bytes') is not None:
s['_total_bytes_str'] = format_bytes(s['total_bytes'])
msg_template = '%(_percent_str)s of %(_total_bytes_str)s at %(_speed_str)s ETA %(_eta_str)s'
elif s.get('total_bytes_estimate') is not None:
s['_total_bytes_estimate_str'] = format_bytes(s['total_bytes_estimate'])
msg_template = '%(_percent_str)s of ~%(_total_bytes_estimate_str)s at %(_speed_str)s ETA %(_eta_str)s'
else:
if s.get('downloaded_bytes') is not None:
s['_downloaded_bytes_str'] = format_bytes(s['downloaded_bytes'])
if s.get('elapsed'):
s['_elapsed_str'] = self.format_seconds(s['elapsed'])
msg_template = '%(_downloaded_bytes_str)s at %(_speed_str)s (%(_elapsed_str)s)'
else:
msg_template = '%(_downloaded_bytes_str)s at %(_speed_str)s'
else:
msg_template = '%(_percent_str)s at %(_speed_str)s ETA %(_eta_str)s'
if s.get('fragment_index') and s.get('fragment_count'):
msg_template += ' (frag %(fragment_index)s/%(fragment_count)s)'
elif s.get('fragment_index'):
msg_template += ' (frag %(fragment_index)s)'
msg_template += with_fields(
('fragment_index', 'fragment_count', ' (frag %(fragment_index)s/%(fragment_count)s)'),
('fragment_index', ' (frag %(fragment_index)s)'))
self._report_progress_status(s, msg_template)
def report_resuming_byte(self, resume_len):
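
The `with_fields` helper introduced in this hunk replaces the nested `if`/`else` chain with a first-match lookup: each tuple lists the status fields it needs, followed by the template to use when all of them are present. A standalone sketch of the same idea, where `pick_template` takes the status dict as an explicit argument instead of closing over `s`, and the sample dicts are made up:

```python
def pick_template(s, *tups, default=''):
    # Each tuple is (*required_fields, template); the first tuple whose
    # fields are all present (not None) in the status dict wins
    for *fields, tmpl in tups:
        if all(s.get(f) is not None for f in fields):
            return tmpl
    return default


CANDIDATES = (
    ('total_bytes', '%(_percent_str)s of %(_total_bytes_str)s'),
    ('total_bytes_estimate', '%(_percent_str)s of ~%(_total_bytes_estimate_str)s'),
    ('downloaded_bytes', 'elapsed', '%(_downloaded_bytes_str)s (%(_elapsed_str)s)'),
    ('downloaded_bytes', '%(_downloaded_bytes_str)s'),
)

known_size = pick_template({'total_bytes': 5, 'downloaded_bytes': 1}, *CANDIDATES, default='?')
no_size = pick_template({'downloaded_bytes': 10, 'elapsed': 2}, *CANDIDATES, default='?')
nothing = pick_template({}, *CANDIDATES, default='?')
```

Order matters: more specific tuples must come before the more general ones they overlap with, as in the hunk.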
@ -390,14 +380,23 @@ class FileDownloader:
def report_retry(self, err, count, retries):
"""Report retry in case of HTTP error 5xx"""
self.to_screen(
self.__to_screen(
'[download] Got server HTTP error: %s. Retrying (attempt %d of %s) ...'
% (error_to_compat_str(err), count, self.format_retries(retries)))
self.sleep_retry('http', count)
def report_unable_to_resume(self):
"""Report it was impossible to resume download."""
self.to_screen('[download] Unable to resume')
def sleep_retry(self, retry_type, count):
sleep_func = self.params.get('retry_sleep_functions', {}).get(retry_type)
delay = float_or_none(sleep_func(n=count - 1)) if sleep_func else None
if delay:
self.__to_screen(f'Sleeping {delay:.2f} seconds ...')
time.sleep(delay)
return sleep_func is not None
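
As a rough illustration of how `sleep_retry` reads `retry_sleep_functions`, the sketch below reimplements it as a free function. It assumes the parameter is a mapping from retry type to a callable that accepts `n`; the injected `sleep` argument and the exponential backoff lambda are hypothetical, added so the example runs without actually waiting.

```python
import time


def sleep_retry(params, retry_type, count, sleep=time.sleep):
    """Standalone approximation of the method in the hunk (not the real API)."""
    sleep_func = params.get('retry_sleep_functions', {}).get(retry_type)
    raw = sleep_func(n=count - 1) if sleep_func else None
    delay = float(raw) if raw is not None else None
    if delay:
        sleep(delay)
    # True means a sleep function was configured for this retry type
    return sleep_func is not None


params = {'retry_sleep_functions': {'http': lambda n: 0.5 * 2 ** n}}
slept = []
configured = sleep_retry(params, 'http', 3, sleep=slept.append)
unconfigured = sleep_retry(params, 'file_access', 1, sleep=slept.append)
```

The boolean return value is what lets the file-access retry loop earlier in this file fall back to its fixed `time.sleep(0.01)` when no function is configured for that retry type.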
@staticmethod
def supports_manifest(manifest):
""" Whether the downloader can download the fragments from the manifest.


@ -1,7 +1,7 @@
import time
from . import get_suitable_downloader
from .fragment import FragmentFD
from ..downloader import get_suitable_downloader
from ..utils import urljoin
@ -73,6 +73,7 @@ class DashSegmentsFD(FragmentFD):
yield {
'frag_index': frag_index,
'fragment_count': fragment.get('fragment_count'),
'index': i,
'url': fragment_url,
}


@ -1,3 +1,4 @@
import enum
import os.path
import re
import subprocess
@ -5,7 +6,8 @@ import sys
import time
from .fragment import FragmentFD
from ..compat import compat_setenv, compat_str
from ..compat import functools # isort: split
from ..compat import compat_setenv
from ..postprocessor.ffmpeg import EXT_TO_OUT_FORMATS, FFmpegPostProcessor
from ..utils import (
Popen,
@ -24,9 +26,15 @@ from ..utils import (
)
class Features(enum.Enum):
TO_STDOUT = enum.auto()
MULTIPLE_FORMATS = enum.auto()
class ExternalFD(FragmentFD):
SUPPORTED_PROTOCOLS = ('http', 'https', 'ftp', 'ftps')
can_download_to_stdout = False
SUPPORTED_FEATURES = ()
_CAPTURE_STDERR = True
def real_download(self, filename, info_dict):
self.report_destination(filename)
@ -74,7 +82,7 @@ class ExternalFD(FragmentFD):
def EXE_NAME(cls):
return cls.get_basename()
@property
@functools.cached_property
def exe(self):
return self.EXE_NAME
@ -90,9 +98,11 @@ class ExternalFD(FragmentFD):
@classmethod
def supports(cls, info_dict):
return (
(cls.can_download_to_stdout or not info_dict.get('to_stdout'))
and info_dict['protocol'] in cls.SUPPORTED_PROTOCOLS)
return all((
not info_dict.get('to_stdout') or Features.TO_STDOUT in cls.SUPPORTED_FEATURES,
'+' not in info_dict['protocol'] or Features.MULTIPLE_FORMATS in cls.SUPPORTED_FEATURES,
all(proto in cls.SUPPORTED_PROTOCOLS for proto in info_dict['protocol'].split('+')),
))
@classmethod
def can_download(cls, info_dict, path=None):
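
The rewritten `supports` check combines three conditions: stdout output needs the `TO_STDOUT` feature, a `+`-joined protocol needs `MULTIPLE_FORMATS`, and every component protocol must be supported. A self-contained sketch of that logic as a plain function, with the class attributes passed as arguments and made-up `info_dict` values:

```python
import enum


class Features(enum.Enum):
    TO_STDOUT = enum.auto()
    MULTIPLE_FORMATS = enum.auto()


def supports(info_dict, supported_protocols, supported_features=()):
    return all((
        not info_dict.get('to_stdout') or Features.TO_STDOUT in supported_features,
        '+' not in info_dict['protocol'] or Features.MULTIPLE_FORMATS in supported_features,
        all(proto in supported_protocols for proto in info_dict['protocol'].split('+')),
    ))


HTTP = ('http', 'https', 'ftp', 'ftps')
ALL_FEATURES = (Features.TO_STDOUT, Features.MULTIPLE_FORMATS)

plain = supports({'protocol': 'https'}, HTTP)
merged_without_feature = supports({'protocol': 'https+https'}, HTTP)
merged_with_feature = supports({'protocol': 'https+m3u8'}, HTTP + ('m3u8',), ALL_FEATURES)
stdout_without_feature = supports({'protocol': 'https', 'to_stdout': True}, HTTP)
```

Moving the per-protocol `split('+')` check into the base class is what allows the separate `FFmpegFD.supports` override to be deleted later in this diff.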
@ -119,29 +129,31 @@ class ExternalFD(FragmentFD):
self._debug_cmd(cmd)
if 'fragments' not in info_dict:
p = Popen(cmd, stderr=subprocess.PIPE)
_, stderr = p.communicate_or_kill()
if p.returncode != 0:
self.to_stderr(stderr.decode('utf-8', 'replace'))
return p.returncode
_, stderr, returncode = Popen.run(
cmd, text=True, stderr=subprocess.PIPE if self._CAPTURE_STDERR else None)
if returncode and stderr:
self.to_stderr(stderr)
return returncode
fragment_retries = self.params.get('fragment_retries', 0)
skip_unavailable_fragments = self.params.get('skip_unavailable_fragments', True)
count = 0
while count <= fragment_retries:
p = Popen(cmd, stderr=subprocess.PIPE)
_, stderr = p.communicate_or_kill()
if p.returncode == 0:
_, stderr, returncode = Popen.run(cmd, text=True, stderr=subprocess.PIPE)
if not returncode:
break
# TODO: Decide whether to retry based on error code
# https://aria2.github.io/manual/en/html/aria2c.html#exit-status
self.to_stderr(stderr.decode('utf-8', 'replace'))
if stderr:
self.to_stderr(stderr)
count += 1
if count <= fragment_retries:
self.to_screen(
'[%s] Got error. Retrying fragments (attempt %d of %s)...'
% (self.get_basename(), count, self.format_retries(fragment_retries)))
self.sleep_retry('fragment', count)
if count > fragment_retries:
if not skip_unavailable_fragments:
self.report_error('Giving up after %s fragment retries' % fragment_retries)
@ -170,6 +182,7 @@ class ExternalFD(FragmentFD):
class CurlFD(ExternalFD):
AVAILABLE_OPT = '-V'
_CAPTURE_STDERR = False # curl writes the progress to stderr
def _make_cmd(self, tmpfilename, info_dict):
cmd = [self.exe, '--location', '-o', tmpfilename, '--compressed']
@ -194,16 +207,6 @@ class CurlFD(ExternalFD):
cmd += ['--', info_dict['url']]
return cmd
def _call_downloader(self, tmpfilename, info_dict):
cmd = [encodeArgument(a) for a in self._make_cmd(tmpfilename, info_dict)]
self._debug_cmd(cmd)
# curl writes the progress to stderr so don't capture it.
p = Popen(cmd)
p.communicate_or_kill()
return p.returncode
class AxelFD(ExternalFD):
AVAILABLE_OPT = '-V'
@ -322,7 +325,7 @@ class HttpieFD(ExternalFD):
class FFmpegFD(ExternalFD):
SUPPORTED_PROTOCOLS = ('http', 'https', 'ftp', 'ftps', 'm3u8', 'm3u8_native', 'rtsp', 'rtmp', 'rtmp_ffmpeg', 'mms', 'http_dash_segments')
can_download_to_stdout = True
SUPPORTED_FEATURES = (Features.TO_STDOUT, Features.MULTIPLE_FORMATS)
@classmethod
def available(cls, path=None):
@ -330,10 +333,6 @@ class FFmpegFD(ExternalFD):
# Fixme: This may be wrong when --ffmpeg-location is used
return FFmpegPostProcessor().available
@classmethod
def supports(cls, info_dict):
return all(proto in cls.SUPPORTED_PROTOCOLS for proto in info_dict['protocol'].split('+'))
def on_process_started(self, proc, stdin):
""" Override this in subclasses """
pass
@ -378,13 +377,6 @@ class FFmpegFD(ExternalFD):
# http://trac.ffmpeg.org/ticket/6125#comment:10
args += ['-seekable', '1' if seekable else '0']
# start_time = info_dict.get('start_time') or 0
# if start_time:
# args += ['-ss', compat_str(start_time)]
# end_time = info_dict.get('end_time')
# if end_time:
# args += ['-t', compat_str(end_time - start_time)]
http_headers = None
if info_dict.get('http_headers'):
youtubedl_headers = handle_youtubedl_headers(info_dict['http_headers'])
@ -442,25 +434,31 @@ class FFmpegFD(ExternalFD):
if isinstance(conn, list):
for entry in conn:
args += ['-rtmp_conn', entry]
elif isinstance(conn, compat_str):
elif isinstance(conn, str):
args += ['-rtmp_conn', conn]
start_time, end_time = info_dict.get('section_start') or 0, info_dict.get('section_end')
for i, url in enumerate(urls):
# We need to specify headers for each http input stream
# otherwise, it will only be applied to the first.
# https://github.com/yt-dlp/yt-dlp/issues/2696
if http_headers is not None and re.match(r'^https?://', url):
args += http_headers
if start_time:
args += ['-ss', str(start_time)]
if end_time:
args += ['-t', str(end_time - start_time)]
args += self._configuration_args((f'_i{i + 1}', '_i')) + ['-i', url]
if not (start_time or end_time) or not self.params.get('force_keyframes_at_cuts'):
args += ['-c', 'copy']
if info_dict.get('requested_formats') or protocol == 'http_dash_segments':
for (i, fmt) in enumerate(info_dict.get('requested_formats') or [info_dict]):
stream_number = fmt.get('manifest_stream_number', 0)
args.extend(['-map', f'{i}:{stream_number}'])
if self.params.get('test', False):
args += ['-fs', compat_str(self._TEST_FILE_SIZE)]
args += ['-fs', str(self._TEST_FILE_SIZE)]
ext = info_dict['ext']
if protocol in ('m3u8', 'm3u8_native'):
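
The section handling in this hunk places `-ss` and `-t` before each `-i`, so they act as input options, and passes `-t` a duration (end minus start) rather than an end timestamp. The sketch below isolates just that argument-building step; `section_args` is a hypothetical helper, the URL is a placeholder, and header and per-input configuration arguments are left out.

```python
def section_args(urls, section_start=None, section_end=None, force_keyframes_at_cuts=False):
    start_time, end_time = section_start or 0, section_end
    args = []
    for url in urls:
        if start_time:
            args += ['-ss', str(start_time)]
        if end_time:
            # ffmpeg's -t takes a duration, not an end position
            args += ['-t', str(end_time - start_time)]
        args += ['-i', url]
    # Stream copy is skipped only when a section is cut AND keyframes are forced
    if not (start_time or end_time) or not force_keyframes_at_cuts:
        args += ['-c', 'copy']
    return args


URL = 'https://example.com/video.m3u8'  # placeholder
whole = section_args([URL])
clip = section_args([URL], section_start=30, section_end=90)
clip_reencoded = section_args([URL], 30, 90, force_keyframes_at_cuts=True)
```

With `force_keyframes_at_cuts` set and a section requested, `-c copy` is omitted, which presumably lets ffmpeg re-encode so the cut does not have to fall on an existing keyframe.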
@ -495,7 +493,7 @@ class FFmpegFD(ExternalFD):
args.append(encodeFilename(ffpp._ffmpeg_filename_argument(tmpfilename), True))
self._debug_cmd(args)
proc = Popen(args, stdin=subprocess.PIPE, env=env)
with Popen(args, stdin=subprocess.PIPE, env=env) as proc:
if url in ('-', 'pipe:'):
self.on_process_started(proc, proc.stdin)
try:
@ -509,8 +507,7 @@ class FFmpegFD(ExternalFD):
if isinstance(e, KeyboardInterrupt) and sys.platform != 'win32' and url not in ('-', 'pipe:'):
proc.communicate_or_kill(b'q')
else:
proc.kill()
proc.wait()
proc.kill(timeout=None)
raise
return retval


@ -391,9 +391,10 @@ class F4mFD(FragmentFD):
query.append(info_dict['extra_param_to_segment_url'])
url_parsed = base_url_parsed._replace(path=base_url_parsed.path + name, query='&'.join(query))
try:
success, down_data = self._download_fragment(ctx, url_parsed.geturl(), info_dict)
success = self._download_fragment(ctx, url_parsed.geturl(), info_dict)
if not success:
return False
down_data = self._read_fragment(ctx)
reader = FlvReader(down_data)
while True:
try:


@ -23,11 +23,7 @@ class HttpQuietDownloader(HttpFD):
def to_screen(self, *args, **kargs):
pass
console_title = to_screen
def report_retry(self, err, count, retries):
super().to_screen(
f'[download] Got server HTTP error: {err}. Retrying (attempt {count} of {self.format_retries(retries)}) ...')
to_console_title = to_screen
class FragmentFD(FileDownloader):
@ -70,6 +66,7 @@ class FragmentFD(FileDownloader):
self.to_screen(
'\r[download] Got server HTTP error: %s. Retrying fragment %d (attempt %d of %s) ...'
% (error_to_compat_str(err), frag_index, count, self.format_retries(retries)))
self.sleep_retry('fragment', count)
def report_skip_fragment(self, frag_index, err=None):
err = f' {err};' if err else ''
@ -168,18 +165,11 @@ class FragmentFD(FileDownloader):
total_frags_str = 'unknown (live)'
self.to_screen(f'[{self.FD_NAME}] Total fragments: {total_frags_str}')
self.report_destination(ctx['filename'])
dl = HttpQuietDownloader(
self.ydl,
{
'continuedl': self.params.get('continuedl', True),
'quiet': self.params.get('quiet'),
dl = HttpQuietDownloader(self.ydl, {
**self.params,
'noprogress': True,
'ratelimit': self.params.get('ratelimit'),
'retries': self.params.get('retries', 0),
'nopart': self.params.get('nopart', False),
'test': False,
}
)
})
tmpfilename = self.temp_name(ctx['filename'])
open_mode = 'wb'
resume_len = 0
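
The new `HttpQuietDownloader` construction appears to pass through all of the parent's parameters and override only two of them. This relies on dict unpacking, where keys listed after `**` replace unpacked ones. A tiny sketch with made-up parameter values:

```python
# Hypothetical parent parameters, for illustration only
params = {'quiet': True, 'ratelimit': 1024, 'noprogress': False, 'test': True}

# Later keys win, so the two explicit entries override the unpacked ones
quiet_params = {**params, 'noprogress': True, 'test': False}
```

Compared with the old explicit whitelist, any option added to the parent's parameters later is forwarded automatically.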
@ -252,6 +242,9 @@ class FragmentFD(FileDownloader):
if s['status'] not in ('downloading', 'finished'):
return
if not total_frags and ctx.get('fragment_count'):
state['fragment_count'] = ctx['fragment_count']
if ctx_id is not None and s.get('ctx_id') != ctx_id:
return
@ -460,6 +453,7 @@ class FragmentFD(FileDownloader):
fatal, count = is_fatal(fragment.get('index') or (frag_index - 1)), 0
while count <= fragment_retries:
try:
ctx['fragment_count'] = fragment.get('fragment_count')
if self._download_fragment(ctx, fragment['url'], info_dict, headers):
break
return
@ -506,12 +500,20 @@ class FragmentFD(FileDownloader):
self.report_warning('The download speed shown is only of one thread. This is a known issue and patches are welcome')
with tpe or concurrent.futures.ThreadPoolExecutor(max_workers) as pool:
try:
for fragment, frag_index, frag_filename in pool.map(_download_fragment, fragments):
ctx['fragment_filename_sanitized'] = frag_filename
ctx['fragment_index'] = frag_index
result = append_fragment(decrypt_fragment(fragment, self._read_fragment(ctx)), frag_index, ctx)
if not result:
ctx.update({
'fragment_filename_sanitized': frag_filename,
'fragment_index': frag_index,
})
if not append_fragment(decrypt_fragment(fragment, self._read_fragment(ctx)), frag_index, ctx):
return False
except KeyboardInterrupt:
self._finish_multiline_status()
self.report_error(
'Interrupted by user. Waiting for all threads to shutdown...', is_error=False, tb=False)
pool.shutdown(wait=False)
raise
else:
for fragment in fragments:
if not interrupt_trigger[0]:


@ -2,12 +2,12 @@ import binascii
import io
import re
from . import get_suitable_downloader
from .external import FFmpegFD
from .fragment import FragmentFD
from .. import webvtt
from ..compat import compat_urlparse
from ..dependencies import Cryptodome_AES
from ..downloader import get_suitable_downloader
from ..utils import bug_reports_message, parse_m3u8_attributes, update_url_query


@ -136,16 +136,14 @@ class HttpFD(FileDownloader):
if has_range:
content_range = ctx.data.headers.get('Content-Range')
content_range_start, content_range_end, content_len = parse_http_range(content_range)
if content_range_start is not None and range_start == content_range_start:
# Content-Range is present and matches requested Range, resume is possible
accept_content_len = (
if range_start == content_range_start and (
# Non-chunked download
not ctx.chunk_size
# Chunked download and requested piece or
# its part is promised to be served
or content_range_end == range_end
or content_len < range_end)
if accept_content_len:
or content_len < range_end):
ctx.content_len = content_len
if content_len or req_end:
ctx.data_len = min(content_len or req_end, req_end or content_len) - (req_start or 0)


@ -1,8 +1,7 @@
import threading
from . import get_suitable_downloader
from .common import FileDownloader
from ..downloader import get_suitable_downloader
from ..extractor.niconico import NiconicoIE
from ..utils import sanitized_Request
@ -10,8 +9,9 @@ class NiconicoDmcFD(FileDownloader):
""" Downloading niconico douga from DMC with heartbeat """
def real_download(self, filename, info_dict):
self.to_screen('[%s] Downloading from DMC' % self.FD_NAME)
from ..extractor.niconico import NiconicoIE
self.to_screen('[%s] Downloading from DMC' % self.FD_NAME)
ie = NiconicoIE(self.ydl)
info_dict, heartbeat_info_dict = ie._get_heartbeat_info(info_dict)


@ -92,8 +92,7 @@ class RtmpFD(FileDownloader):
self.to_screen('')
return proc.wait()
except BaseException: # Including KeyboardInterrupt
proc.kill()
proc.wait()
proc.kill(timeout=None)
raise
url = info_dict['url']


@ -3,7 +3,6 @@ import time
from .fragment import FragmentFD
from ..compat import compat_urllib_error
from ..extractor.youtube import YoutubeBaseInfoExtractor as YT_BaseIE
from ..utils import RegexNotFoundError, dict_get, int_or_none, try_get
@ -26,7 +25,9 @@ class YoutubeLiveChatFD(FragmentFD):
'total_frags': None,
}
ie = YT_BaseIE(self.ydl)
from ..extractor.youtube import YoutubeBaseInfoExtractor
ie = YoutubeBaseInfoExtractor(self.ydl)
start_time = int(time.time() * 1000)


@ -1,32 +1,15 @@
import contextlib
import os
from ..compat.compat_utils import passthrough_module
from ..utils import load_plugins
_LAZY_LOADER = False
if not os.environ.get('YTDLP_NO_LAZY_EXTRACTORS'):
with contextlib.suppress(ImportError):
from .lazy_extractors import * # noqa: F403
from .lazy_extractors import _ALL_CLASSES
_LAZY_LOADER = True
if not _LAZY_LOADER:
from .extractors import * # noqa: F403
_ALL_CLASSES = [ # noqa: F811
klass
for name, klass in globals().items()
if name.endswith('IE') and name != 'GenericIE'
]
_ALL_CLASSES.append(GenericIE) # noqa: F405
_PLUGIN_CLASSES = load_plugins('extractor', 'IE', globals())
_ALL_CLASSES = list(_PLUGIN_CLASSES.values()) + _ALL_CLASSES
passthrough_module(__name__, '.extractors')
del passthrough_module
def gen_extractor_classes():
""" Return a list of supported extractors.
The order does matter; the first extractor matched is the one handling the URL.
"""
from .extractors import _ALL_CLASSES
return _ALL_CLASSES
@ -39,10 +22,12 @@ def gen_extractors():
def list_extractor_classes(age_limit=None):
"""Return a list of extractors that are suitable for the given age, sorted by extractor name"""
from .generic import GenericIE
yield from sorted(filter(
lambda ie: ie.is_suitable(age_limit) and ie != GenericIE, # noqa: F405
lambda ie: ie.is_suitable(age_limit) and ie != GenericIE,
gen_extractor_classes()), key=lambda ie: ie.IE_NAME.lower())
yield GenericIE # noqa: F405
yield GenericIE
def list_extractors(age_limit=None):
@ -52,4 +37,6 @@ def list_extractors(age_limit=None):
def get_info_extractor(ie_name):
"""Returns the info extractor class with the given ie_name"""
return globals()[ie_name + 'IE']
from . import extractors
return getattr(extractors, f'{ie_name}IE')

File diff suppressed because it is too large


@ -16,7 +16,7 @@ from ..compat import compat_urllib_parse_urlparse, compat_urllib_request
from ..utils import (
ExtractorError,
bytes_to_intlist,
decode_base,
decode_base_n,
int_or_none,
intlist_to_bytes,
request_to_url,
@ -123,7 +123,7 @@ class AbemaLicenseHandler(compat_urllib_request.BaseHandler):
'Content-Type': 'application/json',
})
res = decode_base(license_response['k'], self.STRTABLE)
res = decode_base_n(license_response['k'], table=self.STRTABLE)
encvideokey = bytes_to_intlist(struct.pack('>QQ', res >> 64, res & 0xffffffffffffffff))
h = hmac.new(


@ -1,270 +0,0 @@
from .common import InfoExtractor
from ..utils import (
ExtractorError,
urlencode_postdata,
int_or_none,
str_or_none,
determine_ext,
)
from ..compat import compat_HTTPError
class AnimeLabBaseIE(InfoExtractor):
_LOGIN_URL = 'https://www.animelab.com/login'
_NETRC_MACHINE = 'animelab'
_LOGGED_IN = False
def _is_logged_in(self, login_page=None):
if not self._LOGGED_IN:
if not login_page:
login_page = self._download_webpage(self._LOGIN_URL, None, 'Downloading login page')
AnimeLabBaseIE._LOGGED_IN = 'Sign In' not in login_page
return self._LOGGED_IN
def _perform_login(self, username, password):
if self._is_logged_in():
return
login_form = {
'email': username,
'password': password,
}
try:
response = self._download_webpage(
self._LOGIN_URL, None, 'Logging in', 'Wrong login info',
data=urlencode_postdata(login_form),
headers={'Content-Type': 'application/x-www-form-urlencoded'})
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 400:
raise ExtractorError('Unable to log in (wrong credentials?)', expected=True)
raise
if not self._is_logged_in(response):
raise ExtractorError('Unable to login (cannot verify if logged in)')
def _real_initialize(self):
if not self._is_logged_in():
self.raise_login_required('Login is required to access any AnimeLab content')
class AnimeLabIE(AnimeLabBaseIE):
_VALID_URL = r'https?://(?:www\.)?animelab\.com/player/(?P<id>[^/]+)'
_TEST = {
'url': 'https://www.animelab.com/player/fullmetal-alchemist-brotherhood-episode-42',
'md5': '05bde4b91a5d1ff46ef5b94df05b0f7f',
'info_dict': {
'id': '383',
'ext': 'mp4',
'display_id': 'fullmetal-alchemist-brotherhood-episode-42',
'title': 'Fullmetal Alchemist: Brotherhood - Episode 42 - Signs of a Counteroffensive',
'description': 'md5:103eb61dd0a56d3dfc5dbf748e5e83f4',
'series': 'Fullmetal Alchemist: Brotherhood',
'episode': 'Signs of a Counteroffensive',
'episode_number': 42,
'duration': 1469,
'season': 'Season 1',
'season_number': 1,
'season_id': '38',
},
'params': {
# Ensure the same video is downloaded whether the user is premium or not
'format': '[format_id=21711_yeshardsubbed_ja-JP][height=480]',
},
}
def _real_extract(self, url):
display_id = self._match_id(url)
# unfortunately we can get different URLs for the same formats
# e.g. if we are using a "free" account so no dubs available
# (so _remove_duplicate_formats is not effective)
# so we use a dictionary as a workaround
formats = {}
for language_option_url in ('https://www.animelab.com/player/%s/subtitles',
'https://www.animelab.com/player/%s/dubbed'):
actual_url = language_option_url % display_id
webpage = self._download_webpage(actual_url, display_id, 'Downloading URL ' + actual_url)
video_collection = self._parse_json(self._search_regex(r'new\s+?AnimeLabApp\.VideoCollection\s*?\((.*?)\);', webpage, 'AnimeLab VideoCollection'), display_id)
position = int_or_none(self._search_regex(r'playlistPosition\s*?=\s*?(\d+)', webpage, 'Playlist Position'))
raw_data = video_collection[position]['videoEntry']
video_id = str_or_none(raw_data['id'])
# create a title from many sources (while grabbing other info)
# TODO use more fallback sources to get some of these
series = raw_data.get('showTitle')
video_type = raw_data.get('videoEntryType', {}).get('name')
episode_number = raw_data.get('episodeNumber')
episode_name = raw_data.get('name')
title_parts = (series, video_type, episode_number, episode_name)
if None not in title_parts:
title = '%s - %s %s - %s' % title_parts
else:
title = episode_name
description = raw_data.get('synopsis') or self._og_search_description(webpage, default=None)
duration = int_or_none(raw_data.get('duration'))
thumbnail_data = raw_data.get('images', [])
thumbnails = []
for thumbnail in thumbnail_data:
for instance in thumbnail['imageInstances']:
image_data = instance.get('imageInfo', {})
thumbnails.append({
'id': str_or_none(image_data.get('id')),
'url': image_data.get('fullPath'),
'width': image_data.get('width'),
'height': image_data.get('height'),
})
season_data = raw_data.get('season', {}) or {}
season = str_or_none(season_data.get('name'))
season_number = int_or_none(season_data.get('seasonNumber'))
season_id = str_or_none(season_data.get('id'))
for video_data in raw_data['videoList']:
current_video_list = {}
current_video_list['language'] = video_data.get('language', {}).get('languageCode')
is_hardsubbed = video_data.get('hardSubbed')
for video_instance in video_data['videoInstances']:
httpurl = video_instance.get('httpUrl')
url = httpurl if httpurl else video_instance.get('rtmpUrl')
if url is None:
# this video format is unavailable to the user (not premium etc.)
continue
current_format = current_video_list.copy()
format_id_parts = []
format_id_parts.append(str_or_none(video_instance.get('id')))
if is_hardsubbed is not None:
if is_hardsubbed:
format_id_parts.append('yeshardsubbed')
else:
format_id_parts.append('nothardsubbed')
format_id_parts.append(current_format['language'])
format_id = '_'.join([x for x in format_id_parts if x is not None])
ext = determine_ext(url)
if ext == 'm3u8':
for format_ in self._extract_m3u8_formats(
url, video_id, m3u8_id=format_id, fatal=False):
formats[format_['format_id']] = format_
continue
elif ext == 'mpd':
for format_ in self._extract_mpd_formats(
url, video_id, mpd_id=format_id, fatal=False):
formats[format_['format_id']] = format_
continue
current_format['url'] = url
quality_data = video_instance.get('videoQuality')
if quality_data:
quality = quality_data.get('name') or quality_data.get('description')
else:
quality = None
height = None
if quality:
height = int_or_none(self._search_regex(r'(\d+)p?$', quality, 'Video format height', default=None))
if height is None:
self.report_warning('Could not get height of video')
else:
current_format['height'] = height
current_format['format_id'] = format_id
formats[current_format['format_id']] = current_format
formats = list(formats.values())
self._sort_formats(formats)
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': description,
'series': series,
'episode': episode_name,
'episode_number': int_or_none(episode_number),
'thumbnails': thumbnails,
'duration': duration,
'formats': formats,
'season': season,
'season_number': season_number,
'season_id': season_id,
}
class AnimeLabShowsIE(AnimeLabBaseIE):
_VALID_URL = r'https?://(?:www\.)?animelab\.com/shows/(?P<id>[^/]+)'
_TEST = {
'url': 'https://www.animelab.com/shows/attack-on-titan',
'info_dict': {
'id': '45',
'title': 'Attack on Titan',
'description': 'md5:989d95a2677e9309368d5cf39ba91469',
},
'playlist_count': 59,
'skip': 'All AnimeLab content requires authentication',
}
def _real_extract(self, url):
_BASE_URL = 'http://www.animelab.com'
_SHOWS_API_URL = '/api/videoentries/show/videos/'
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id, 'Downloading requested URL')
show_data_str = self._search_regex(r'({"id":.*}),\svideoEntry', webpage, 'AnimeLab show data')
show_data = self._parse_json(show_data_str, display_id)
show_id = str_or_none(show_data.get('id'))
title = show_data.get('name')
description = show_data.get('shortSynopsis') or show_data.get('longSynopsis')
entries = []
for season in show_data['seasons']:
season_id = season['id']
get_data = urlencode_postdata({
'seasonId': season_id,
'limit': 1000,
})
# despite using urlencode_postdata, we are sending a GET request
target_url = _BASE_URL + _SHOWS_API_URL + show_id + "?" + get_data.decode('utf-8')
response = self._download_webpage(
target_url,
None, 'Season id %s' % season_id)
season_data = self._parse_json(response, display_id)
for video_data in season_data['list']:
entries.append(self.url_result(
_BASE_URL + '/player/' + video_data['slug'], 'AnimeLab',
str_or_none(video_data.get('id')), video_data.get('name')
))
return {
'_type': 'playlist',
'id': show_id,
'title': title,
'description': description,
'entries': entries,
}
# TODO implement myqueue
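
The dictionary workaround used in AnimeLabIE._real_extract above can be sketched in isolation. This is a minimal illustration only: dedupe_formats and the example.invalid URLs are hypothetical and not part of yt-dlp.

```python
def dedupe_formats(candidates):
    """Keep one entry per format_id; a later entry overwrites an earlier one."""
    by_id = {}
    for fmt in candidates:
        by_id[fmt['format_id']] = fmt
    return list(by_id.values())


# Hypothetical sample data: the same format reached through two different URLs
candidates = [
    {'format_id': '21711_yeshardsubbed_ja-JP', 'url': 'https://example.invalid/a.mp4', 'height': 480},
    {'format_id': '21711_yeshardsubbed_ja-JP', 'url': 'https://example.invalid/b.mp4', 'height': 480},
    {'format_id': '21712_nothardsubbed_en-US', 'url': 'https://example.invalid/c.mp4', 'height': 720},
]
formats = dedupe_formats(candidates)
```

Keying on format_id rather than on the URL is what makes this work when the URLs differ for the same format.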

View File

@ -442,9 +442,10 @@ class YoutubeWebArchiveIE(InfoExtractor):
'only_matching': True
},
]
_YT_INITIAL_DATA_RE = r'(?:(?:(?:window\s*\[\s*["\']ytInitialData["\']\s*\]|ytInitialData)\s*=\s*({.+?})\s*;)|%s)' % YoutubeBaseInfoExtractor._YT_INITIAL_DATA_RE
_YT_INITIAL_PLAYER_RESPONSE_RE = r'(?:(?:(?:window\s*\[\s*["\']ytInitialPlayerResponse["\']\s*\]|ytInitialPlayerResponse)\s*=[(\s]*({.+?})[)\s]*;)|%s)' % YoutubeBaseInfoExtractor._YT_INITIAL_PLAYER_RESPONSE_RE
_YT_INITIAL_BOUNDARY_RE = r'(?:(?:var\s+meta|</script|\n)|%s)' % YoutubeBaseInfoExtractor._YT_INITIAL_BOUNDARY_RE
_YT_INITIAL_DATA_RE = YoutubeBaseInfoExtractor._YT_INITIAL_DATA_RE
_YT_INITIAL_PLAYER_RESPONSE_RE = fr'''(?x)
(?:window\s*\[\s*["\']ytInitialPlayerResponse["\']\s*\]|ytInitialPlayerResponse)\s*=[(\s]*|
{YoutubeBaseInfoExtractor._YT_INITIAL_PLAYER_RESPONSE_RE}'''
_YT_DEFAULT_THUMB_SERVERS = ['i.ytimg.com'] # thumbnails most likely archived on these servers
_YT_ALL_THUMB_SERVERS = orderedSet(
@ -474,11 +475,6 @@ class YoutubeWebArchiveIE(InfoExtractor):
elif not isinstance(res, list) or len(res) != 0:
self.report_warning('Error while parsing CDX API response' + bug_reports_message())
def _extract_yt_initial_variable(self, webpage, regex, video_id, name):
return self._parse_json(self._search_regex(
(fr'{regex}\s*{self._YT_INITIAL_BOUNDARY_RE}',
regex), webpage, name, default='{}'), video_id, fatal=False)
def _extract_webpage_title(self, webpage):
page_title = self._html_extract_title(webpage, default='')
# YouTube video pages appear to always have either 'YouTube -' as prefix or '- YouTube' as suffix.
@ -488,10 +484,11 @@ class YoutubeWebArchiveIE(InfoExtractor):
def _extract_metadata(self, video_id, webpage):
search_meta = ((lambda x: self._html_search_meta(x, webpage, default=None)) if webpage else (lambda x: None))
player_response = self._extract_yt_initial_variable(
webpage, self._YT_INITIAL_PLAYER_RESPONSE_RE, video_id, 'initial player response') or {}
initial_data = self._extract_yt_initial_variable(
webpage, self._YT_INITIAL_DATA_RE, video_id, 'initial player response') or {}
player_response = self._search_json(
self._YT_INITIAL_PLAYER_RESPONSE_RE, webpage, 'initial player response',
video_id, default={})
initial_data = self._search_json(
self._YT_INITIAL_DATA_RE, webpage, 'initial data', video_id, default={})
initial_data_video = traverse_obj(
initial_data, ('contents', 'twoColumnWatchNextResults', 'results', 'results', 'contents', ..., 'videoPrimaryInfoRenderer'),
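
The hunks above replace a boundary-regex approach with _search_json. The general idea (match a start pattern, then let a JSON decoder decide where the object ends) can be sketched with the standard library alone. This is not yt-dlp's implementation; search_json and the sample page are hypothetical.

```python
import json
import re


def search_json(start_pattern, text, default=None):
    """Find start_pattern, then decode the JSON object that follows it."""
    match = re.search(start_pattern + r'\s*(?=\{)', text)
    if not match:
        return default
    try:
        # raw_decode parses one JSON value and ignores whatever follows it
        obj, _end = json.JSONDecoder(strict=False).raw_decode(text, match.end())
    except ValueError:
        return default
    return obj


# Hypothetical page: the string value contains "};", which would cut a
# non-greedy regex such as ({.+?})\s*; short
page = 'var ytInitialData = {"contents": {"a": "x};y"}};var meta = 1;'
data = search_json(r'ytInitialData\s*=', page)
```

Because the decoder tracks string and brace nesting itself, no end-of-object boundary pattern is needed.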

View File

@ -90,7 +90,7 @@ class ArnesIE(InfoExtractor):
'timestamp': parse_iso8601(video.get('creationTime')),
'channel': channel.get('name'),
'channel_id': channel_id,
'channel_url': format_field(channel_id, template=f'{self._BASE_URL}/?channel=%s'),
'channel_url': format_field(channel_id, None, f'{self._BASE_URL}/?channel=%s'),
'duration': float_or_none(video.get('duration'), 1000),
'view_count': int_or_none(video.get('views')),
'tags': video.get('hashtags'),
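
The change above passes the template to format_field as the third positional argument, with None as the field. A simplified stand-in shows the intended behaviour: the template is applied only when a value is present. It is an assumption for illustration, not the real yt-dlp helper, which accepts more options.

```python
def format_field(obj, field=None, template='%s', default=''):
    """Simplified stand-in: apply template to a value, or return default."""
    value = obj if field is None else obj.get(field)
    if value is None or value == '':
        return default
    return template % value


# Hypothetical values mirroring the call shape used in the diff
with_id = format_field('abc', None, 'https://example.invalid/?channel=%s')
without_id = format_field(None, None, 'https://example.invalid/?channel=%s')
```

Returning an empty default instead of raising keeps optional metadata fields out of the way when the source value is missing.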

View File

@ -0,0 +1,34 @@
import re
from .common import InfoExtractor
class AtScaleConfEventIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?atscaleconference\.com/events/(?P<id>[^/&$?]+)'
_TESTS = [{
'url': 'https://atscaleconference.com/events/data-scale-spring-2022/',
'playlist_mincount': 13,
'info_dict': {
'id': 'data-scale-spring-2022',
'title': 'Data @Scale Spring 2022',
'description': 'md5:7d7ca1c42ac9c6d8a785092a1aea4b55'
},
}, {
'url': 'https://atscaleconference.com/events/video-scale-2021/',
'playlist_mincount': 14,
'info_dict': {
'id': 'video-scale-2021',
'title': 'Video @Scale 2021',
'description': 'md5:7d7ca1c42ac9c6d8a785092a1aea4b55'
},
}]
def _real_extract(self, url):
id = self._match_id(url)
webpage = self._download_webpage(url, id)
return self.playlist_from_matches(
re.findall(r'data-url\s*=\s*"(https?://(?:www\.)?atscaleconference\.com/videos/[^"]+)"', webpage),
ie='Generic', playlist_id=id,
title=self._og_search_title(webpage), description=self._og_search_description(webpage))
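
The playlist entries in AtScaleConfEventIE come from a single re.findall over the event page. The same pattern can be tried on its own; the sample HTML below is made up for illustration.

```python
import re

# Same pattern as in _real_extract above
VIDEO_URL_RE = r'data-url\s*=\s*"(https?://(?:www\.)?atscaleconference\.com/videos/[^"]+)"'

# Hypothetical markup: two video links and one non-video link
sample_html = '''
<div data-url="https://atscaleconference.com/videos/talk-one/"></div>
<div data-url = "https://www.atscaleconference.com/videos/talk-two/"></div>
<div data-url="https://atscaleconference.com/events/not-a-video/"></div>
'''
video_urls = re.findall(VIDEO_URL_RE, sample_html)
```

Only URLs under /videos/ are captured, so links back to event pages are not added to the playlist.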

View File

@ -41,7 +41,7 @@ class AWAANBaseIE(InfoExtractor):
'id': video_id,
'title': title,
'description': video_data.get('description_en') or video_data.get('description_ar'),
'thumbnail': format_field(img, template='http://admin.mangomolo.com/analytics/%s'),
'thumbnail': format_field(img, None, 'http://admin.mangomolo.com/analytics/%s'),
'duration': int_or_none(video_data.get('duration')),
'timestamp': parse_iso8601(video_data.get('create_time'), ' '),
'is_live': is_live,

View File

@ -24,7 +24,7 @@ class BellMediaIE(InfoExtractor):
)/.*?(?:\b(?:vid(?:eoid)?|clipId)=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6,})'''
_TESTS = [{
'url': 'https://www.bnnbloomberg.ca/video/david-cockfield-s-top-picks~1403070',
'md5': '36d3ef559cfe8af8efe15922cd3ce950',
'md5': '3e5b8e38370741d5089da79161646635',
'info_dict': {
'id': '1403070',
'ext': 'flv',
@ -32,6 +32,14 @@ class BellMediaIE(InfoExtractor):
'description': 'md5:810f7f8c6a83ad5b48677c3f8e5bb2c3',
'upload_date': '20180525',
'timestamp': 1527288600,
'season_id': 73997,
'season': '2018',
'thumbnail': 'http://images2.9c9media.com/image_asset/2018_5_25_baf30cbd-b28d-4a18-9903-4bb8713b00f5_PNG_956x536.jpg',
'tags': [],
'categories': ['ETFs'],
'season_number': 8,
'duration': 272.038,
'series': 'Market Call Tonight',
},
}, {
'url': 'http://www.thecomedynetwork.ca/video/player?vid=923582',

View File

@ -677,6 +677,11 @@ class BilibiliAudioIE(BilibiliAudioBaseIE):
'vcodec': 'none'
}]
for a_format in formats:
a_format.setdefault('http_headers', {}).update({
'Referer': url,
})
song = self._call_api('song/info', au_id)
title = song['title']
statistic = song.get('statistic') or {}
@ -784,7 +789,8 @@ class BiliIntlBaseIE(InfoExtractor):
def json2srt(self, json):
data = '\n\n'.join(
f'{i + 1}\n{srt_subtitles_timecode(line["from"])} --> {srt_subtitles_timecode(line["to"])}\n{line["content"]}'
for i, line in enumerate(json['body']) if line.get('content'))
for i, line in enumerate(traverse_obj(json, (
'body', lambda _, l: l['content'] and l['from'] and l['to']))))
return data
def _get_subtitles(self, *, ep_id=None, aid=None):
@ -947,12 +953,11 @@ class BiliIntlIE(BiliIntlBaseIE):
video_id = ep_id or aid
webpage = self._download_webpage(url, video_id)
# Bstation layout
initial_data = self._parse_json(self._search_regex(
r'window\.__INITIAL_(?:DATA|STATE)__\s*=\s*({.+?});', webpage,
'preload state', default='{}'), video_id, fatal=False) or {}
video_data = (
traverse_obj(initial_data, ('OgvVideo', 'epDetail'), expected_type=dict)
or traverse_obj(initial_data, ('UgcVideo', 'videoData'), expected_type=dict) or {})
initial_data = (
self._search_json(r'window\.__INITIAL_(?:DATA|STATE)__\s*=', webpage, 'preload state', video_id, default={})
or self._search_nuxt_data(webpage, video_id, '__initialState', fatal=False, traverse=None))
video_data = traverse_obj(
initial_data, ('OgvVideo', 'epDetail'), ('UgcVideo', 'videoData'), ('ugc', 'archive'), expected_type=dict)
if season_id and not video_data:
# Non-Bstation layout, read through episode list
@ -960,7 +965,7 @@ class BiliIntlIE(BiliIntlBaseIE):
video_data = traverse_obj(season_json,
('sections', ..., 'episodes', lambda _, v: str(v['episode_id']) == ep_id),
expected_type=dict, get_all=False)
return self._extract_video_info(video_data, ep_id=ep_id, aid=aid)
return self._extract_video_info(video_data or {}, ep_id=ep_id, aid=aid)
class BiliIntlSeriesIE(BiliIntlBaseIE):
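
The json2srt change above filters out incomplete cues before numbering them. A standalone sketch of that conversion follows, assuming the 'body', 'from', 'to' and 'content' keys shown in the diff. srt_timecode and json_to_srt are hypothetical names, not yt-dlp functions.

```python
def srt_timecode(seconds):
    """Format seconds as HH:MM:SS,mmm (the SRT timestamp layout)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f'{hours:02}:{minutes:02}:{secs:02},{millis:03}'


def json_to_srt(subtitle_json):
    # Truthiness filter as in the diff; note that it also skips a cue whose
    # 'from' is exactly 0
    cues = [
        line for line in subtitle_json.get('body') or []
        if line.get('content') and line.get('from') and line.get('to')
    ]
    return '\n\n'.join(
        f'{i}\n{srt_timecode(line["from"])} --> {srt_timecode(line["to"])}\n{line["content"]}'
        for i, line in enumerate(cues, start=1))


# Hypothetical subtitle payload; the middle cue has no content and is dropped
sample = {'body': [
    {'from': 1.5, 'to': 3.0, 'content': 'Hello'},
    {'from': 3.0, 'to': 4.25, 'content': ''},
    {'from': 4.25, 'to': 3661.001, 'content': 'Bye'},
]}
srt = json_to_srt(sample)
```

Filtering before enumerate keeps the cue numbers consecutive even when entries are skipped.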

View File

@ -7,13 +7,11 @@ class BloombergIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?bloomberg\.com/(?:[^/]+/)*(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'http://www.bloomberg.com/news/videos/b/aaeae121-5949-481e-a1ce-4562db6f5df2',
# The md5 checksum changes
'url': 'https://www.bloomberg.com/news/videos/2021-09-14/apple-unveils-the-new-iphone-13-stock-doesn-t-move-much-video',
'info_dict': {
'id': 'qurhIVlJSB6hzkVi229d8g',
'id': 'V8cFcYMxTHaMcEiiYVr39A',
'ext': 'flv',
'title': 'Shah\'s Presentation on Foreign-Exchange Strategies',
'description': 'md5:a8ba0302912d03d246979735c17d2761',
'title': 'Apple Unveils the New IPhone 13, Stock Doesn\'t Move Much',
},
'params': {
'format': 'best[format_id^=hds]',
@ -57,7 +55,7 @@ class BloombergIE(InfoExtractor):
title = re.sub(': Video$', '', self._og_search_title(webpage))
embed_info = self._download_json(
'http://www.bloomberg.com/api/embed?id=%s' % video_id, video_id)
'http://www.bloomberg.com/multimedia/api/embed?id=%s' % video_id, video_id)
formats = []
for stream in embed_info['streams']:
stream_url = stream.get('url')

View File

@ -75,6 +75,7 @@ class CCCIE(InfoExtractor):
'thumbnail': event_data.get('thumb_url'),
'timestamp': parse_iso8601(event_data.get('date')),
'duration': int_or_none(event_data.get('length')),
'view_count': int_or_none(event_data.get('view_count')),
'tags': event_data.get('tags'),
'formats': formats,
}

View File

@ -11,6 +11,7 @@ import sys
import time
import xml.etree.ElementTree
from ..compat import functools, re # isort: split
from ..compat import (
compat_cookiejar_Cookie,
compat_cookies_SimpleCookie,
@ -25,7 +26,6 @@ from ..compat import (
compat_urllib_parse_urlencode,
compat_urllib_request,
compat_urlparse,
re,
)
from ..downloader import FileDownloader
from ..downloader.f4m import get_base_url, remove_encrypted_media
@ -35,6 +35,7 @@ from ..utils import (
ExtractorError,
GeoRestrictedError,
GeoUtils,
LenientJSONDecoder,
RegexNotFoundError,
UnsupportedError,
age_restricted,
@ -384,6 +385,11 @@ class InfoExtractor:
release_year: Year (YYYY) when the album was released.
composer: Composer of the piece
The following fields should only be set for clips that should be cut from the original video:
section_start: Start time of the section in seconds
section_end: End time of the section in seconds
Unless mentioned otherwise, the fields should be Unicode strings.
Unless mentioned otherwise, None is equivalent to absence of information.
@ -610,8 +616,7 @@ class InfoExtractor:
if ip_block:
self._x_forwarded_for_ip = GeoUtils.random_ipv4(ip_block)
self._downloader.write_debug(
'[debug] Using fake IP %s as X-Forwarded-For' % self._x_forwarded_for_ip)
self.write_debug(f'Using fake IP {self._x_forwarded_for_ip} as X-Forwarded-For')
return
# Path 2: bypassing based on country code
@ -725,6 +730,13 @@ class InfoExtractor:
else:
return err.code in variadic(expected_status)
def _create_request(self, url_or_request, data=None, headers={}, query={}):
if isinstance(url_or_request, compat_urllib_request.Request):
return update_Request(url_or_request, data=data, headers=headers, query=query)
if query:
url_or_request = update_url_query(url_or_request, query)
return sanitized_Request(url_or_request, data, headers)
def _request_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True, data=None, headers={}, query={}, expected_status=None):
"""
Return the response handle.
@ -756,16 +768,8 @@ class InfoExtractor:
if 'X-Forwarded-For' not in headers:
headers['X-Forwarded-For'] = self._x_forwarded_for_ip
if isinstance(url_or_request, compat_urllib_request.Request):
url_or_request = update_Request(
url_or_request, data=data, headers=headers, query=query)
else:
if query:
url_or_request = update_url_query(url_or_request, query)
if data is not None or headers:
url_or_request = sanitized_Request(url_or_request, data, headers)
try:
return self._downloader.urlopen(url_or_request)
return self._downloader.urlopen(self._create_request(url_or_request, data, headers, query))
except network_exceptions as err:
if isinstance(err, compat_urllib_error.HTTPError):
if self.__can_accept_status_code(err, expected_status):
@ -788,12 +792,40 @@ class InfoExtractor:
self.report_warning(errmsg)
return False
def _download_webpage_handle(self, url_or_request, video_id, note=None, errnote=None, fatal=True, encoding=None, data=None, headers={}, query={}, expected_status=None):
def _download_webpage_handle(self, url_or_request, video_id, note=None, errnote=None, fatal=True,
encoding=None, data=None, headers={}, query={}, expected_status=None):
"""
Return a tuple (page content as string, URL handle).
See _download_webpage docstring for arguments specification.
Arguments:
url_or_request -- plain text URL as a string or
a compat_urllib_request.Request object
video_id -- Video/playlist/item identifier (string)
Keyword arguments:
note -- note printed before downloading (string)
errnote -- note printed in case of an error (string)
fatal -- flag denoting whether error should be considered fatal,
i.e. whether it should cause ExtractorError to be raised,
otherwise a warning will be reported and extraction continued
encoding -- encoding for a page content decoding, guessed automatically
when not explicitly specified
data -- POST data (bytes)
headers -- HTTP headers (dict)
query -- URL query (dict)
expected_status -- allows accepting failed HTTP requests (non 2xx
status code) by explicitly specifying a set of accepted status
codes. Can be any of the following entities:
- an integer type specifying an exact failed status code to
accept
- a list or a tuple of integer types specifying a list of
failed status codes to accept
- a callable accepting an actual failed status code and
returning True if it should be accepted
Note that this argument does not affect success status codes (2xx)
which are always accepted.
"""
# Strip hashes from the URL (#1038)
if isinstance(url_or_request, (compat_str, str)):
url_or_request = url_or_request.partition('#')[0]
@ -850,140 +882,48 @@ class InfoExtractor:
'Visit http://blocklist.rkn.gov.ru/ for a block reason.',
expected=True)
def _webpage_read_content(self, urlh, url_or_request, video_id, note=None, errnote=None, fatal=True, prefix=None, encoding=None):
content_type = urlh.headers.get('Content-Type', '')
webpage_bytes = urlh.read()
if prefix is not None:
webpage_bytes = prefix + webpage_bytes
if not encoding:
encoding = self._guess_encoding_from_content(content_type, webpage_bytes)
if self.get_param('dump_intermediate_pages', False):
self.to_screen('Dumping request to ' + urlh.geturl())
dump = base64.b64encode(webpage_bytes).decode('ascii')
self._downloader.to_screen(dump)
if self.get_param('write_pages', False):
basen = f'{video_id}_{urlh.geturl()}'
def _request_dump_filename(self, url, video_id):
basen = f'{video_id}_{url}'
trim_length = self.get_param('trim_file_name') or 240
if len(basen) > trim_length:
h = '___' + hashlib.md5(basen.encode('utf-8')).hexdigest()
basen = basen[:trim_length - len(h)] + h
raw_filename = basen + '.dump'
filename = sanitize_filename(raw_filename, restricted=True)
self.to_screen('Saving request to ' + filename)
filename = sanitize_filename(f'{basen}.dump', restricted=True)
# Working around MAX_PATH limitation on Windows (see
# http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx)
if compat_os_name == 'nt':
absfilepath = os.path.abspath(filename)
if len(absfilepath) > 259:
filename = '\\\\?\\' + absfilepath
filename = fR'\\?\{absfilepath}'
return filename
def __decode_webpage(self, webpage_bytes, encoding, headers):
if not encoding:
encoding = self._guess_encoding_from_content(headers.get('Content-Type', ''), webpage_bytes)
try:
return webpage_bytes.decode(encoding, 'replace')
except LookupError:
return webpage_bytes.decode('utf-8', 'replace')
def _webpage_read_content(self, urlh, url_or_request, video_id, note=None, errnote=None, fatal=True, prefix=None, encoding=None):
webpage_bytes = urlh.read()
if prefix is not None:
webpage_bytes = prefix + webpage_bytes
if self.get_param('dump_intermediate_pages', False):
self.to_screen('Dumping request to ' + urlh.geturl())
dump = base64.b64encode(webpage_bytes).decode('ascii')
self._downloader.to_screen(dump)
if self.get_param('write_pages'):
filename = self._request_dump_filename(urlh.geturl(), video_id)
self.to_screen(f'Saving request to {filename}')
with open(filename, 'wb') as outf:
outf.write(webpage_bytes)
try:
content = webpage_bytes.decode(encoding, 'replace')
except LookupError:
content = webpage_bytes.decode('utf-8', 'replace')
content = self.__decode_webpage(webpage_bytes, encoding, urlh.headers)
self.__check_blocked(content)
return content
def _download_webpage(
self, url_or_request, video_id, note=None, errnote=None,
fatal=True, tries=1, timeout=5, encoding=None, data=None,
headers={}, query={}, expected_status=None):
"""
Return the data of the page as a string.
Arguments:
url_or_request -- plain text URL as a string or
a compat_urllib_request.Request object
video_id -- Video/playlist/item identifier (string)
Keyword arguments:
note -- note printed before downloading (string)
errnote -- note printed in case of an error (string)
fatal -- flag denoting whether error should be considered fatal,
i.e. whether it should cause ExtractorError to be raised,
otherwise a warning will be reported and extraction continued
tries -- number of tries
timeout -- sleep interval between tries
encoding -- encoding for a page content decoding, guessed automatically
when not explicitly specified
data -- POST data (bytes)
headers -- HTTP headers (dict)
query -- URL query (dict)
expected_status -- allows accepting failed HTTP requests (non 2xx
status code) by explicitly specifying a set of accepted status
codes. Can be any of the following entities:
- an integer type specifying an exact failed status code to
accept
- a list or a tuple of integer types specifying a list of
failed status codes to accept
- a callable accepting an actual failed status code and
returning True if it should be accepted
Note that this argument does not affect success status codes (2xx)
which are always accepted.
"""
success = False
try_count = 0
while success is False:
try:
res = self._download_webpage_handle(
url_or_request, video_id, note, errnote, fatal,
encoding=encoding, data=data, headers=headers, query=query,
expected_status=expected_status)
success = True
except compat_http_client.IncompleteRead as e:
try_count += 1
if try_count >= tries:
raise e
self._sleep(timeout, video_id)
if res is False:
return res
else:
content, _ = res
return content
def _download_xml_handle(
self, url_or_request, video_id, note='Downloading XML',
errnote='Unable to download XML', transform_source=None,
fatal=True, encoding=None, data=None, headers={}, query={},
expected_status=None):
"""
Return a tuple (xml as an xml.etree.ElementTree.Element, URL handle).
See _download_webpage docstring for arguments specification.
"""
res = self._download_webpage_handle(
url_or_request, video_id, note, errnote, fatal=fatal,
encoding=encoding, data=data, headers=headers, query=query,
expected_status=expected_status)
if res is False:
return res
xml_string, urlh = res
return self._parse_xml(
xml_string, video_id, transform_source=transform_source,
fatal=fatal), urlh
def _download_xml(
self, url_or_request, video_id,
note='Downloading XML', errnote='Unable to download XML',
transform_source=None, fatal=True, encoding=None,
data=None, headers={}, query={}, expected_status=None):
"""
Return the xml as an xml.etree.ElementTree.Element.
See _download_webpage docstring for arguments specification.
"""
res = self._download_xml_handle(
url_or_request, video_id, note=note, errnote=errnote,
transform_source=transform_source, fatal=fatal, encoding=encoding,
data=data, headers=headers, query=query,
expected_status=expected_status)
return res if res is False else res[0]
def _parse_xml(self, xml_string, video_id, transform_source=None, fatal=True):
if transform_source:
xml_string = transform_source(xml_string)
@ -996,101 +936,126 @@ class InfoExtractor:
else:
self.report_warning(errmsg + str(ve))
def _download_json_handle(
self, url_or_request, video_id, note='Downloading JSON metadata',
errnote='Unable to download JSON metadata', transform_source=None,
fatal=True, encoding=None, data=None, headers={}, query={},
expected_status=None):
"""
Return a tuple (JSON object, URL handle).
See _download_webpage docstring for arguments specification.
"""
res = self._download_webpage_handle(
url_or_request, video_id, note, errnote, fatal=fatal,
encoding=encoding, data=data, headers=headers, query=query,
expected_status=expected_status)
if res is False:
return res
json_string, urlh = res
return self._parse_json(
json_string, video_id, transform_source=transform_source,
fatal=fatal), urlh
def _download_json(
self, url_or_request, video_id, note='Downloading JSON metadata',
errnote='Unable to download JSON metadata', transform_source=None,
fatal=True, encoding=None, data=None, headers={}, query={},
expected_status=None):
"""
Return the JSON object as a dict.
See _download_webpage docstring for arguments specification.
"""
res = self._download_json_handle(
url_or_request, video_id, note=note, errnote=errnote,
transform_source=transform_source, fatal=fatal, encoding=encoding,
data=data, headers=headers, query=query,
expected_status=expected_status)
return res if res is False else res[0]
def _parse_json(self, json_string, video_id, transform_source=None, fatal=True):
if transform_source:
json_string = transform_source(json_string)
def _parse_json(self, json_string, video_id, transform_source=None, fatal=True, **parser_kwargs):
try:
return json.loads(json_string, strict=False)
return json.loads(
json_string, cls=LenientJSONDecoder, strict=False, transform_source=transform_source, **parser_kwargs)
except ValueError as ve:
errmsg = '%s: Failed to parse JSON ' % video_id
errmsg = f'{video_id}: Failed to parse JSON'
if fatal:
raise ExtractorError(errmsg, cause=ve)
else:
self.report_warning(errmsg + str(ve))
self.report_warning(f'{errmsg}: {ve}')
def _parse_socket_response_as_json(self, data, video_id, transform_source=None, fatal=True):
return self._parse_json(
data[data.find('{'):data.rfind('}') + 1],
video_id, transform_source, fatal)
def _download_socket_json_handle(
self, url_or_request, video_id, note='Polling socket',
errnote='Unable to poll socket', transform_source=None,
fatal=True, encoding=None, data=None, headers={}, query={},
expected_status=None):
"""
Return a tuple (JSON object, URL handle).
def __create_download_methods(name, parser, note, errnote, return_value):
See _download_webpage docstring for arguments specification.
"""
def parse(ie, content, *args, **kwargs):
if parser is None:
return content
# parser is fetched by name so subclasses can override it
return getattr(ie, parser)(content, *args, **kwargs)
def download_handle(self, url_or_request, video_id, note=note, errnote=errnote, transform_source=None,
fatal=True, encoding=None, data=None, headers={}, query={}, expected_status=None):
res = self._download_webpage_handle(
url_or_request, video_id, note, errnote, fatal=fatal,
encoding=encoding, data=data, headers=headers, query=query,
expected_status=expected_status)
url_or_request, video_id, note=note, errnote=errnote, fatal=fatal, encoding=encoding,
data=data, headers=headers, query=query, expected_status=expected_status)
if res is False:
return res
webpage, urlh = res
return self._parse_socket_response_as_json(
webpage, video_id, transform_source=transform_source,
fatal=fatal), urlh
content, urlh = res
return parse(self, content, video_id, transform_source=transform_source, fatal=fatal), urlh
def _download_socket_json(
self, url_or_request, video_id, note='Polling socket',
errnote='Unable to poll socket', transform_source=None,
fatal=True, encoding=None, data=None, headers={}, query={},
expected_status=None):
"""
Return the JSON object as a dict.
See _download_webpage docstring for arguments specification.
"""
res = self._download_socket_json_handle(
url_or_request, video_id, note=note, errnote=errnote,
transform_source=transform_source, fatal=fatal, encoding=encoding,
data=data, headers=headers, query=query,
expected_status=expected_status)
def download_content(self, url_or_request, video_id, note=note, errnote=errnote, transform_source=None,
fatal=True, encoding=None, data=None, headers={}, query={}, expected_status=None):
if self.get_param('load_pages'):
url_or_request = self._create_request(url_or_request, data, headers, query)
filename = self._request_dump_filename(url_or_request.full_url, video_id)
self.to_screen(f'Loading request from {filename}')
try:
with open(filename, 'rb') as dumpf:
webpage_bytes = dumpf.read()
except OSError as e:
self.report_warning(f'Unable to load request from disk: {e}')
else:
content = self.__decode_webpage(webpage_bytes, encoding, url_or_request.headers)
return parse(self, content, video_id, transform_source, fatal)
kwargs = {
'note': note,
'errnote': errnote,
'transform_source': transform_source,
'fatal': fatal,
'encoding': encoding,
'data': data,
'headers': headers,
'query': query,
'expected_status': expected_status,
}
if parser is None:
kwargs.pop('transform_source')
# The method is fetched by name so subclasses can override _download_..._handle
res = getattr(self, download_handle.__name__)(url_or_request, video_id, **kwargs)
return res if res is False else res[0]
def impersonate(func, name, return_value):
func.__name__, func.__qualname__ = name, f'InfoExtractor.{name}'
func.__doc__ = f'''
@param transform_source Apply this transformation before parsing
@returns {return_value}
See _download_webpage_handle docstring for other arguments specification
'''
impersonate(download_handle, f'_download_{name}_handle', f'({return_value}, URL handle)')
impersonate(download_content, f'_download_{name}', f'{return_value}')
return download_handle, download_content
_download_xml_handle, _download_xml = __create_download_methods(
'xml', '_parse_xml', 'Downloading XML', 'Unable to download XML', 'xml as an xml.etree.ElementTree.Element')
_download_json_handle, _download_json = __create_download_methods(
'json', '_parse_json', 'Downloading JSON metadata', 'Unable to download JSON metadata', 'JSON object as a dict')
_download_socket_json_handle, _download_socket_json = __create_download_methods(
'socket_json', '_parse_socket_response_as_json', 'Polling socket', 'Unable to poll socket', 'JSON object as a dict')
__download_webpage = __create_download_methods('webpage', None, None, None, 'data of the page as a string')[1]
def _download_webpage(
self, url_or_request, video_id, note=None, errnote=None,
fatal=True, tries=1, timeout=NO_DEFAULT, *args, **kwargs):
"""
Return the data of the page as a string.
Keyword arguments:
tries -- number of tries
timeout -- sleep interval between tries
See _download_webpage_handle docstring for other arguments specification.
"""
R''' # NB: These are unused; should they be deprecated?
if tries != 1:
self._downloader.deprecation_warning('tries argument is deprecated in InfoExtractor._download_webpage')
if timeout is NO_DEFAULT:
timeout = 5
else:
self._downloader.deprecation_warning('timeout argument is deprecated in InfoExtractor._download_webpage')
'''
try_count = 0
while True:
try:
return self.__download_webpage(url_or_request, video_id, note, errnote, None, fatal, *args, **kwargs)
except compat_http_client.IncompleteRead as e:
try_count += 1
if try_count >= tries:
raise e
self._sleep(timeout, video_id)
def report_warning(self, msg, video_id=None, *args, only_once=False, **kwargs):
idstr = format_field(video_id, template='%s: ')
idstr = format_field(video_id, None, '%s: ')
msg = f'[{self.IE_NAME}] {idstr}{msg}'
if only_once:
if f'WARNING: {msg}' in self._printed_messages:
@ -1136,7 +1101,7 @@ class InfoExtractor:
self.get_param('ignore_no_formats_error') or self.get_param('wait_for_video')):
self.report_warning(msg)
return
msg += format_field(self._login_hint(method), template='. %s')
msg += format_field(self._login_hint(method), None, '. %s')
raise ExtractorError(msg, expected=True)
def raise_geo_restricted(
@ -1228,6 +1193,33 @@ class InfoExtractor:
self.report_warning('unable to extract %s' % _name + bug_reports_message())
return None
def _search_json(self, start_pattern, string, name, video_id, *, end_pattern='',
contains_pattern='(?s:.+)', fatal=True, default=NO_DEFAULT, **kwargs):
"""Searches string for the JSON object specified by start_pattern"""
# NB: end_pattern is only used to reduce the size of the initial match
if default is NO_DEFAULT:
default, has_default = {}, False
else:
fatal, has_default = False, True
json_string = self._search_regex(
rf'{start_pattern}\s*(?P<json>{{\s*{contains_pattern}\s*}})\s*{end_pattern}',
string, name, group='json', fatal=fatal, default=None if has_default else NO_DEFAULT)
if not json_string:
return default
_name = self._downloader._format_err(name, self._downloader.Styles.EMPHASIS)
try:
return self._parse_json(json_string, video_id, ignore_extra=True, **kwargs)
except ExtractorError as e:
if fatal:
raise ExtractorError(
f'Unable to extract {_name} - Failed to parse JSON', cause=e.cause, video_id=video_id)
elif not has_default:
self.report_warning(
f'Unable to extract {_name} - Failed to parse JSON: {e}', video_id=video_id)
return default
def _html_search_regex(self, pattern, string, name, default=NO_DEFAULT, fatal=True, flags=0, group=None):
"""
Like _search_regex, but strips HTML tags and unescapes entities.
@ -1451,6 +1443,10 @@ class InfoExtractor:
'ViewAction': 'view',
}
def is_type(e, *expected_types):
type = variadic(traverse_obj(e, '@type'))
return any(x in type for x in expected_types)
def extract_interaction_type(e):
interaction_type = e.get('interactionType')
if isinstance(interaction_type, dict):
@ -1464,9 +1460,7 @@ class InfoExtractor:
if not isinstance(interaction_statistic, list):
return
for is_e in interaction_statistic:
if not isinstance(is_e, dict):
continue
if is_e.get('@type') != 'InteractionCounter':
if not is_type(is_e, 'InteractionCounter'):
continue
interaction_type = extract_interaction_type(is_e)
if not interaction_type:
@ -1503,10 +1497,10 @@ class InfoExtractor:
info['chapters'] = chapters
def extract_video_object(e):
assert e['@type'] == 'VideoObject'
assert is_type(e, 'VideoObject')
author = e.get('author')
info.update({
'url': url_or_none(e.get('contentUrl')),
'url': traverse_obj(e, 'contentUrl', 'embedUrl', expected_type=url_or_none),
'title': unescapeHTML(e.get('name')),
'description': unescapeHTML(e.get('description')),
'thumbnails': [{'url': url}
@ -1519,7 +1513,7 @@ class InfoExtractor:
# however some websites are using 'Text' type instead.
# 1. https://schema.org/VideoObject
'uploader': author.get('name') if isinstance(author, dict) else author if isinstance(author, compat_str) else None,
'filesize': float_or_none(e.get('contentSize')),
'filesize': int_or_none(float_or_none(e.get('contentSize'))),
'tbr': int_or_none(e.get('bitrate')),
'width': int_or_none(e.get('width')),
'height': int_or_none(e.get('height')),
@ -1535,13 +1529,12 @@ class InfoExtractor:
if at_top_level and set(e.keys()) == {'@context', '@graph'}:
traverse_json_ld(variadic(e['@graph'], allowed_types=(dict,)), at_top_level=False)
break
item_type = e.get('@type')
if expected_type is not None and expected_type != item_type:
if expected_type is not None and not is_type(e, expected_type):
continue
rating = traverse_obj(e, ('aggregateRating', 'ratingValue'), expected_type=float_or_none)
if rating is not None:
info['average_rating'] = rating
if item_type in ('TVEpisode', 'Episode'):
if is_type(e, 'TVEpisode', 'Episode'):
episode_name = unescapeHTML(e.get('name'))
info.update({
'episode': episode_name,
@ -1551,37 +1544,39 @@ class InfoExtractor:
if not info.get('title') and episode_name:
info['title'] = episode_name
part_of_season = e.get('partOfSeason')
if isinstance(part_of_season, dict) and part_of_season.get('@type') in ('TVSeason', 'Season', 'CreativeWorkSeason'):
if is_type(part_of_season, 'TVSeason', 'Season', 'CreativeWorkSeason'):
info.update({
'season': unescapeHTML(part_of_season.get('name')),
'season_number': int_or_none(part_of_season.get('seasonNumber')),
})
part_of_series = e.get('partOfSeries') or e.get('partOfTVSeries')
if isinstance(part_of_series, dict) and part_of_series.get('@type') in ('TVSeries', 'Series', 'CreativeWorkSeries'):
if is_type(part_of_series, 'TVSeries', 'Series', 'CreativeWorkSeries'):
info['series'] = unescapeHTML(part_of_series.get('name'))
elif item_type == 'Movie':
elif is_type(e, 'Movie'):
info.update({
'title': unescapeHTML(e.get('name')),
'description': unescapeHTML(e.get('description')),
'duration': parse_duration(e.get('duration')),
'timestamp': unified_timestamp(e.get('dateCreated')),
})
elif item_type in ('Article', 'NewsArticle'):
elif is_type(e, 'Article', 'NewsArticle'):
info.update({
'timestamp': parse_iso8601(e.get('datePublished')),
'title': unescapeHTML(e.get('headline')),
'description': unescapeHTML(e.get('articleBody') or e.get('description')),
})
if traverse_obj(e, ('video', 0, '@type')) == 'VideoObject':
if is_type(traverse_obj(e, ('video', 0)), 'VideoObject'):
extract_video_object(e['video'][0])
elif item_type == 'VideoObject':
elif is_type(traverse_obj(e, ('subjectOf', 0)), 'VideoObject'):
extract_video_object(e['subjectOf'][0])
elif is_type(e, 'VideoObject'):
extract_video_object(e)
if expected_type is None:
continue
else:
break
video = e.get('video')
if isinstance(video, dict) and video.get('@type') == 'VideoObject':
if is_type(video, 'VideoObject'):
extract_video_object(video)
if expected_type is None:
continue
@ -1598,15 +1593,13 @@ class InfoExtractor:
webpage, 'next.js data', fatal=fatal, **kw),
video_id, transform_source=transform_source, fatal=fatal)
def _search_nuxt_data(self, webpage, video_id, context_name='__NUXT__'):
''' Parses Nuxt.js metadata. This works as long as the function __NUXT__ invokes is a pure function. '''
# not all website do this, but it can be changed
# https://stackoverflow.com/questions/67463109/how-to-change-or-hide-nuxt-and-nuxt-keyword-in-page-source
def _search_nuxt_data(self, webpage, video_id, context_name='__NUXT__', *, fatal=True, traverse=('data', 0)):
"""Parses Nuxt.js metadata. This works as long as the function __NUXT__ invokes is a pure function"""
rectx = re.escape(context_name)
FUNCTION_RE = r'\(function\((?P<arg_keys>.*?)\){return\s+(?P<js>{.*?})\s*;?\s*}\((?P<arg_vals>.*?)\)'
js, arg_keys, arg_vals = self._search_regex(
(r'<script>window\.%s=\(function\((?P<arg_keys>.*?)\)\{return\s(?P<js>\{.*?\})\}\((?P<arg_vals>.+?)\)\);?</script>' % rectx,
r'%s\(.*?\(function\((?P<arg_keys>.*?)\)\{return\s(?P<js>\{.*?\})\}\((?P<arg_vals>.*?)\)' % rectx),
webpage, context_name, group=['js', 'arg_keys', 'arg_vals'])
(rf'<script>\s*window\.{rectx}={FUNCTION_RE}\s*\)\s*;?\s*</script>', rf'{rectx}\(.*?{FUNCTION_RE}'),
webpage, context_name, group=('js', 'arg_keys', 'arg_vals'), fatal=fatal)
args = dict(zip(arg_keys.split(','), arg_vals.split(',')))
@ -1614,7 +1607,8 @@ class InfoExtractor:
if val in ('undefined', 'void 0'):
args[key] = 'null'
return self._parse_json(js_to_json(js, args), video_id)['data'][0]
ret = self._parse_json(js, video_id, transform_source=functools.partial(js_to_json, vars=args), fatal=fatal)
return traverse_obj(ret, traverse) or {}
@staticmethod
def _hidden_inputs(html):
@ -3190,7 +3184,8 @@ class InfoExtractor:
return f
return {}
def _media_formats(src, cur_media_type, type_info={}):
def _media_formats(src, cur_media_type, type_info=None):
type_info = type_info or {}
full_url = absolute_url(src)
ext = type_info.get('ext') or determine_ext(full_url)
if ext == 'm3u8':
@ -3208,6 +3203,7 @@ class InfoExtractor:
formats = [{
'url': full_url,
'vcodec': 'none' if cur_media_type == 'audio' else None,
'ext': ext,
}]
return is_plain_url, formats
@ -3234,7 +3230,8 @@ class InfoExtractor:
media_attributes = extract_attributes(media_tag)
src = strip_or_none(media_attributes.get('src'))
if src:
_, formats = _media_formats(src, media_type)
f = parse_content_type(media_attributes.get('type'))
_, formats = _media_formats(src, media_type, f)
media_info['formats'].extend(formats)
media_info['thumbnail'] = absolute_url(media_attributes.get('poster'))
if media_content:
@ -3602,9 +3599,7 @@ class InfoExtractor:
def _get_cookies(self, url):
""" Return a compat_cookies_SimpleCookie with the cookies for the url """
req = sanitized_Request(url)
self._downloader.cookiejar.add_cookie_header(req)
return compat_cookies_SimpleCookie(req.get_header('Cookie'))
return compat_cookies_SimpleCookie(self._downloader._calc_cookies(url))
def _apply_first_set_cookie_header(self, url_handle, cookie):
"""
@ -3748,7 +3743,7 @@ class InfoExtractor:
def _get_automatic_captions(self, *args, **kwargs):
raise NotImplementedError('This method must be implemented by subclasses')
@property
@functools.cached_property
def _cookies_passed(self):
"""Whether cookies have been passed to YoutubeDL"""
return self.get_param('cookiefile') is not None or self.get_param('cookiesfrombrowser') is not None
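
For readers following the JSON-LD hunks above, here is a minimal standalone sketch of what the new `is_type` helper appears to do: treat `@type` as either a single string or a list of strings, and report whether any of the expected types is declared. It deliberately avoids yt-dlp's `variadic` and `traverse_obj`, so the name `is_type_sketch` and its normalization logic are assumptions made for illustration, not the code from the commit.

```python
def is_type_sketch(e, *expected_types):
    """Return True if the JSON-LD node `e` declares any of `expected_types`."""
    # Non-dict nodes (None, strings, lists) carry no '@type' to inspect.
    if not isinstance(e, dict):
        return False
    declared = e.get('@type')
    # JSON-LD allows '@type' to be one string or a list of strings.
    if not isinstance(declared, (list, tuple)):
        declared = [declared]
    return any(t in declared for t in expected_types)


single = {'@type': 'VideoObject'}
multiple = {'@type': ['Article', 'NewsArticle']}

print(is_type_sketch(single, 'VideoObject'))
print(is_type_sketch(multiple, 'TVEpisode', 'NewsArticle'))
print(is_type_sketch(None, 'VideoObject'))
```

Because non-dict input returns False, callers such as the `partOfSeason` and `video` checks no longer need a separate `isinstance(..., dict)` guard, which matches the simplification visible in the hunks.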


@ -728,11 +728,12 @@ class CrunchyrollBetaBaseIE(CrunchyrollBaseIE):
headers={
'Authorization': auth_response['token_type'] + ' ' + auth_response['access_token']
})
bucket = policy_response['cms']['bucket']
cms = traverse_obj(policy_response, 'cms_beta', 'cms')
bucket = cms['bucket']
params = {
'Policy': policy_response['cms']['policy'],
'Signature': policy_response['cms']['signature'],
'Key-Pair-Id': policy_response['cms']['key_pair_id']
'Policy': cms['policy'],
'Signature': cms['signature'],
'Key-Pair-Id': cms['key_pair_id']
}
locale = traverse_obj(initial_state, ('localization', 'locale'))
if locale:


@ -23,6 +23,11 @@ class CuriosityStreamBaseIE(InfoExtractor):
def _call_api(self, path, video_id, query=None):
headers = {}
if not self._auth_token:
auth_cookie = self._get_cookies('https://curiositystream.com').get('auth_token')
if auth_cookie:
self.write_debug('Obtained auth_token cookie')
self._auth_token = auth_cookie.value
if self._auth_token:
headers['X-Auth-Token'] = self._auth_token
result = self._download_json(


@ -5,13 +5,15 @@ import re
from .common import InfoExtractor
from ..compat import compat_HTTPError
from ..utils import (
ExtractorError,
OnDemandPagedList,
age_restricted,
clean_html,
ExtractorError,
int_or_none,
OnDemandPagedList,
traverse_obj,
try_get,
unescapeHTML,
unsmuggle_url,
urlencode_postdata,
)
@ -220,6 +222,7 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
return urls
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url)
video_id, playlist_id = self._match_valid_url(url).groups()
if playlist_id:
@ -252,7 +255,7 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
metadata = self._download_json(
'https://www.dailymotion.com/player/metadata/video/' + xid,
xid, 'Downloading metadata JSON',
query={'app': 'com.dailymotion.neon'})
query=traverse_obj(smuggled_data, 'query') or {'app': 'com.dailymotion.neon'})
error = metadata.get('error')
if error:


@ -0,0 +1,114 @@
from .common import InfoExtractor
from ..utils import (
determine_ext,
float_or_none,
join_nonempty,
traverse_obj,
url_or_none,
)
class DailyWireBaseIE(InfoExtractor):
_JSON_PATH = {
'episode': ('props', 'pageProps', 'episodeData', 'episode'),
'videos': ('props', 'pageProps', 'videoData', 'video'),
'podcasts': ('props', 'pageProps', 'episode'),
}
def _get_json(self, url):
sites_type, slug = self._match_valid_url(url).group('sites_type', 'id')
json_data = self._search_nextjs_data(self._download_webpage(url, slug), slug)
return slug, traverse_obj(json_data, self._JSON_PATH[sites_type])
class DailyWireIE(DailyWireBaseIE):
_VALID_URL = r'https?://(?:www\.)dailywire(?:\.com)/(?P<sites_type>episode|videos)/(?P<id>[\w-]+)'
_TESTS = [{
'url': 'https://www.dailywire.com/episode/1-fauci',
'info_dict': {
'id': 'ckzsl50xnqpy30850in3v4bu7',
'ext': 'mp4',
'display_id': '1-fauci',
'title': '1. Fauci',
'description': 'md5:9df630347ef85081b7e97dd30bc22853',
'thumbnail': 'https://daily-wire-production.imgix.net/episodes/ckzsl50xnqpy30850in3v4bu7/ckzsl50xnqpy30850in3v4bu7-1648237399554.jpg',
'creator': 'Caroline Roberts',
'series_id': 'ckzplm0a097fn0826r2vc3j7h',
'series': 'China: The Enemy Within',
}
}, {
'url': 'https://www.dailywire.com/episode/ep-124-bill-maher',
'info_dict': {
'id': 'cl0ngbaalplc80894sfdo9edf',
'ext': 'mp3',
'display_id': 'ep-124-bill-maher',
'title': 'Ep. 124 - Bill Maher',
'thumbnail': 'https://daily-wire-production.imgix.net/episodes/cl0ngbaalplc80894sfdo9edf/cl0ngbaalplc80894sfdo9edf-1647065568518.jpg',
'creator': 'Caroline Roberts',
'description': 'md5:adb0de584bcfa9c41374999d9e324e98',
'series_id': 'cjzvep7270hp00786l9hwccob',
'series': 'The Sunday Special',
}
}, {
'url': 'https://www.dailywire.com/videos/the-hyperions',
'only_matching': True,
}]
def _real_extract(self, url):
slug, episode_info = self._get_json(url)
urls = traverse_obj(
episode_info, (('segments', 'videoUrl'), ..., ('video', 'audio')), expected_type=url_or_none)
formats, subtitles = [], {}
for url in urls:
if determine_ext(url) != 'm3u8':
formats.append({'url': url})
continue
format_, subs_ = self._extract_m3u8_formats_and_subtitles(url, slug)
formats.extend(format_)
self._merge_subtitles(subs_, target=subtitles)
self._sort_formats(formats)
return {
'id': episode_info['id'],
'display_id': slug,
'title': traverse_obj(episode_info, 'title', 'name'),
'description': episode_info.get('description'),
'creator': join_nonempty(('createdBy', 'firstName'), ('createdBy', 'lastName'), from_dict=episode_info, delim=' '),
'duration': float_or_none(episode_info.get('duration')),
'is_live': episode_info.get('isLive'),
'thumbnail': traverse_obj(episode_info, 'thumbnail', 'image', expected_type=url_or_none),
'formats': formats,
'subtitles': subtitles,
'series_id': traverse_obj(episode_info, ('show', 'id')),
'series': traverse_obj(episode_info, ('show', 'name')),
}
class DailyWirePodcastIE(DailyWireBaseIE):
_VALID_URL = r'https?://(?:www\.)dailywire(?:\.com)/(?P<sites_type>podcasts)/(?P<podcaster>[\w-]+/(?P<id>[\w-]+))'
_TESTS = [{
'url': 'https://www.dailywire.com/podcasts/morning-wire/get-ready-for-recession-6-15-22',
'info_dict': {
'id': 'cl4f01d0w8pbe0a98ydd0cfn1',
'ext': 'm4a',
'display_id': 'get-ready-for-recession-6-15-22',
'title': 'Get Ready for Recession | 6.15.22',
'description': 'md5:c4afbadda4e1c38a4496f6d62be55634',
'thumbnail': 'https://daily-wire-production.imgix.net/podcasts/ckx4otgd71jm508699tzb6hf4-1639506575562.jpg',
'duration': 900.117667,
}
}]
def _real_extract(self, url):
slug, episode_info = self._get_json(url)
audio_id = traverse_obj(episode_info, 'audioMuxPlaybackId', 'VUsAipTrBVSgzw73SpC2DAJD401TYYwEp')
return {
'id': episode_info['id'],
'url': f'https://stream.media.dailywire.com/{audio_id}/audio.m4a',
'display_id': slug,
'title': episode_info.get('title'),
'duration': float_or_none(episode_info.get('duration')),
'thumbnail': episode_info.get('thumbnail'),
'description': episode_info.get('description'),
}
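
The new extractor routes each URL through a named `sites_type` group that selects an entry in `_JSON_PATH`. The sketch below reproduces only that dispatch step with the standard `re` module; the pattern and paths are copied from the diff above, while `split_url` is a hypothetical helper that does not exist in yt-dlp.

```python
import re

# Pattern copied from DailyWireIE._VALID_URL in the diff above.
VALID_URL = r'https?://(?:www\.)dailywire(?:\.com)/(?P<sites_type>episode|videos)/(?P<id>[\w-]+)'

# Paths copied from DailyWireBaseIE._JSON_PATH (only the two DailyWireIE matches).
JSON_PATH = {
    'episode': ('props', 'pageProps', 'episodeData', 'episode'),
    'videos': ('props', 'pageProps', 'videoData', 'video'),
}


def split_url(url):
    """Hypothetical helper: return (sites_type, slug, json_path), or None if no match."""
    mobj = re.match(VALID_URL, url)
    if mobj is None:
        return None
    sites_type, slug = mobj.group('sites_type', 'id')
    return sites_type, slug, JSON_PATH[sites_type]


result = split_url('https://www.dailywire.com/episode/1-fauci')
print(result)
```

As written, the `(?:www\.)` group is not optional, so a URL without the `www.` prefix does not match this pattern.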


@ -86,7 +86,7 @@ class DigitalConcertHallIE(InfoExtractor):
})
m3u8_url = traverse_obj(
stream_info, ('channel', lambda x: x.startswith('vod_mixed'), 'stream', 0, 'url'), get_all=False)
stream_info, ('channel', lambda k, _: k.startswith('vod_mixed'), 'stream', 0, 'url'), get_all=False)
formats = self._extract_m3u8_formats(m3u8_url, video_id, 'mp4', 'm3u8_native', fatal=False)
self._sort_formats(formats)


@ -53,8 +53,8 @@ class DropboxIE(InfoExtractor):
else:
raise ExtractorError('Password protected video, use --video-password <password>', expected=True)
json_string = self._html_search_regex(r'InitReact\.mountComponent\(.*?,\s*(\{.+\})\s*?\)', webpage, 'Info JSON')
info_json = self._parse_json(json_string, video_id).get('props')
info_json = self._search_json(r'InitReact\.mountComponent\(.*?,', webpage, 'mountComponent', video_id,
contains_pattern=r'.+?"preview".+?', end_pattern=r'\)')['props']
transcode_url = traverse_obj(info_json, ((None, 'preview'), 'file', 'preview', 'content', 'transcode_url'), get_all=False)
formats, subtitles = self._extract_m3u8_formats_and_subtitles(transcode_url, video_id)


@ -1,8 +1,8 @@
from .common import InfoExtractor
from .vimeo import VHXEmbedIE
from ..utils import (
clean_html,
ExtractorError,
clean_html,
get_element_by_class,
get_element_by_id,
get_elements_by_class,
@ -96,11 +96,12 @@ class DropoutIE(InfoExtractor):
def _login(self, display_id):
username, password = self._get_login_info()
if not (username and password):
self.raise_login_required(method='password')
if not username:
return True
response = self._download_webpage(
self._LOGIN_URL, display_id, note='Logging in', data=urlencode_postdata({
self._LOGIN_URL, display_id, note='Logging in', fatal=False,
data=urlencode_postdata({
'email': username,
'password': password,
'authenticity_token': self._get_authenticity_token(display_id),
@ -110,19 +111,25 @@ class DropoutIE(InfoExtractor):
user_has_subscription = self._search_regex(
r'user_has_subscription:\s*["\'](.+?)["\']', response, 'subscription status', default='none')
if user_has_subscription.lower() == 'true':
return response
return
elif user_has_subscription.lower() == 'false':
raise ExtractorError('Account is not subscribed')
return 'Account is not subscribed'
else:
raise ExtractorError('Incorrect username/password')
return 'Incorrect username/password'
def _real_extract(self, url):
display_id = self._match_id(url)
login_err, webpage = False, ''
try:
self._login(display_id)
webpage = self._download_webpage(url, display_id, note='Downloading video webpage')
login_err = self._login(display_id)
webpage = self._download_webpage(url, display_id)
finally:
if not login_err:
self._download_webpage('https://www.dropout.tv/logout', display_id, note='Logging out', fatal=False)
elif '<div id="watch-unauthorized"' in webpage:
if login_err is True:
self.raise_login_required(method='password')
raise ExtractorError(login_err, expected=True)
embed_url = self._search_regex(r'embed_url:\s*["\'](.+?)["\']', webpage, 'embed url')
thumbnail = self._og_search_thumbnail(webpage)


@ -51,31 +51,39 @@ def _get_element_by_tag_and_attrib(html, tag=None, attribute=None, value=None, e
class DubokuIE(InfoExtractor):
IE_NAME = 'duboku'
IE_DESC = 'www.duboku.co'
IE_DESC = 'www.duboku.io'
_VALID_URL = r'(?:https?://[^/]+\.duboku\.co/vodplay/)(?P<id>[0-9]+-[0-9-]+)\.html.*'
_VALID_URL = r'(?:https?://[^/]+\.duboku\.io/vodplay/)(?P<id>[0-9]+-[0-9-]+)\.html.*'
_TESTS = [{
'url': 'https://www.duboku.co/vodplay/1575-1-1.html',
'url': 'https://w.duboku.io/vodplay/1575-1-1.html',
'info_dict': {
'id': '1575-1-1',
'ext': 'ts',
'ext': 'mp4',
'series': '白色月光',
'title': 'contains:白色月光',
'season_number': 1,
'episode_number': 1,
'season': 'Season 1',
'episode_id': '1',
'season_id': '1',
'episode': 'Episode 1',
},
'params': {
'skip_download': 'm3u8 download',
},
}, {
'url': 'https://www.duboku.co/vodplay/1588-1-1.html',
'url': 'https://w.duboku.io/vodplay/1588-1-1.html',
'info_dict': {
'id': '1588-1-1',
'ext': 'ts',
'ext': 'mp4',
'series': '亲爱的自己',
'title': 'contains:预告片',
'title': 'contains:第1集',
'season_number': 1,
'episode_number': 1,
'episode': 'Episode 1',
'season': 'Season 1',
'episode_id': '1',
'season_id': '1',
},
'params': {
'skip_download': 'm3u8 download',
@ -91,7 +99,7 @@ class DubokuIE(InfoExtractor):
season_id = temp[1]
episode_id = temp[2]
webpage_url = 'https://www.duboku.co/vodplay/%s.html' % video_id
webpage_url = 'https://w.duboku.io/vodplay/%s.html' % video_id
webpage_html = self._download_webpage(webpage_url, video_id)
# extract video url
@ -124,12 +132,13 @@ class DubokuIE(InfoExtractor):
data_from = player_data.get('from')
# if it is an embedded iframe, maybe it's an external source
headers = {'Referer': webpage_url}
if data_from == 'iframe':
# use _type url_transparent to retain the meaningful details
# of the video.
return {
'_type': 'url_transparent',
'url': smuggle_url(data_url, {'http_headers': {'Referer': webpage_url}}),
'url': smuggle_url(data_url, {'http_headers': headers}),
'id': video_id,
'title': title,
'series': series_title,
@ -139,7 +148,7 @@ class DubokuIE(InfoExtractor):
'episode_id': episode_id,
}
formats = self._extract_m3u8_formats(data_url, video_id, 'mp4')
formats = self._extract_m3u8_formats(data_url, video_id, 'mp4', headers=headers)
return {
'id': video_id,
@ -150,36 +159,29 @@ class DubokuIE(InfoExtractor):
'episode_number': int_or_none(episode_id),
'episode_id': episode_id,
'formats': formats,
'http_headers': {'Referer': 'https://www.duboku.co/static/player/videojs.html'}
'http_headers': headers
}
class DubokuPlaylistIE(InfoExtractor):
IE_NAME = 'duboku:list'
IE_DESC = 'www.duboku.co entire series'
IE_DESC = 'www.duboku.io entire series'
_VALID_URL = r'(?:https?://[^/]+\.duboku\.co/voddetail/)(?P<id>[0-9]+)\.html.*'
_VALID_URL = r'(?:https?://[^/]+\.duboku\.io/voddetail/)(?P<id>[0-9]+)\.html.*'
_TESTS = [{
'url': 'https://www.duboku.co/voddetail/1575.html',
'url': 'https://w.duboku.io/voddetail/1575.html',
'info_dict': {
'id': 'startswith:1575',
'title': '白色月光',
},
'playlist_count': 12,
}, {
'url': 'https://www.duboku.co/voddetail/1554.html',
'url': 'https://w.duboku.io/voddetail/1554.html',
'info_dict': {
'id': 'startswith:1554',
'title': '以家人之名',
},
'playlist_mincount': 30,
}, {
'url': 'https://www.duboku.co/voddetail/1554.html#playlist2',
'info_dict': {
'id': '1554#playlist2',
'title': '以家人之名',
},
'playlist_mincount': 27,
}]
def _real_extract(self, url):
@ -189,7 +191,7 @@ class DubokuPlaylistIE(InfoExtractor):
series_id = mobj.group('id')
fragment = compat_urlparse.urlparse(url).fragment
webpage_url = 'https://www.duboku.co/voddetail/%s.html' % series_id
webpage_url = 'https://w.duboku.io/voddetail/%s.html' % series_id
webpage_html = self._download_webpage(webpage_url, series_id)
# extract title
@ -234,6 +236,6 @@ class DubokuPlaylistIE(InfoExtractor):
# return url results
return self.playlist_result([
self.url_result(
compat_urlparse.urljoin('https://www.duboku.co', x['href']),
compat_urlparse.urljoin('https://w.duboku.io', x['href']),
ie=DubokuIE.ie_key(), video_title=x.get('title'))
for x in playlist], series_id + '#' + playlist_id, title)


@ -1,8 +1,11 @@
import base64
import json
import re
import urllib
from .common import InfoExtractor
from .adobepass import AdobePassIE
from .once import OnceIE
from ..compat import compat_str
from ..utils import (
determine_ext,
dict_get,
@ -24,7 +27,6 @@ class ESPNIE(OnceIE):
(?:
(?:
video/(?:clip|iframe/twitter)|
watch/player
)
(?:
.*?\?.*?\bid=|
@ -47,6 +49,8 @@ class ESPNIE(OnceIE):
'description': 'md5:39370c2e016cb4ecf498ffe75bef7f0f',
'timestamp': 1390936111,
'upload_date': '20140128',
'duration': 1302,
'thumbnail': r're:https://.+\.jpg',
},
'params': {
'skip_download': True,
@ -71,15 +75,6 @@ class ESPNIE(OnceIE):
}, {
'url': 'https://cdn.espn.go.com/video/clip/_/id/19771774',
'only_matching': True,
}, {
'url': 'http://www.espn.com/watch/player?id=19141491',
'only_matching': True,
}, {
'url': 'http://www.espn.com/watch/player?bucketId=257&id=19505875',
'only_matching': True,
}, {
'url': 'http://www.espn.com/watch/player/_/id/19141491',
'only_matching': True,
}, {
'url': 'http://www.espn.com/video/clip?id=10365079',
'only_matching': True,
@ -98,7 +93,13 @@ class ESPNIE(OnceIE):
}, {
'url': 'http://www.espn.com/espnw/video/26066627/arkansas-gibson-completes-hr-cycle-four-innings',
'only_matching': True,
}]
}, {
'url': 'http://www.espn.com/watch/player?id=19141491',
'only_matching': True,
}, {
'url': 'http://www.espn.com/watch/player?bucketId=257&id=19505875',
'only_matching': True,
}, ]
def _real_extract(self, url):
video_id = self._match_id(url)
@ -116,7 +117,7 @@ class ESPNIE(OnceIE):
for source_id, source in source.items():
if source_id == 'alert':
continue
elif isinstance(source, compat_str):
elif isinstance(source, str):
extract_source(source, base_source_id)
elif isinstance(source, dict):
traverse_source(
@ -196,7 +197,7 @@ class ESPNArticleIE(InfoExtractor):
@classmethod
def suitable(cls, url):
return False if ESPNIE.suitable(url) else super(ESPNArticleIE, cls).suitable(url)
return False if (ESPNIE.suitable(url) or WatchESPNIE.suitable(url)) else super(ESPNArticleIE, cls).suitable(url)
def _real_extract(self, url):
video_id = self._match_id(url)
@ -277,3 +278,119 @@ class ESPNCricInfoIE(InfoExtractor):
'formats': formats,
'subtitles': subtitles,
}
class WatchESPNIE(AdobePassIE):
_VALID_URL = r'https://www.espn.com/watch/player/_/id/(?P<id>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})'
_TESTS = [{
'url': 'https://www.espn.com/watch/player/_/id/ba7d17da-453b-4697-bf92-76a99f61642b',
'info_dict': {
'id': 'ba7d17da-453b-4697-bf92-76a99f61642b',
'ext': 'mp4',
'title': 'Serbia vs. Turkey',
'thumbnail': 'https://artwork.api.espn.com/artwork/collections/media/ba7d17da-453b-4697-bf92-76a99f61642b/default?width=640&apikey=1ngjw23osgcis1i1vbj96lmfqs',
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.espn.com/watch/player/_/id/4e9b5bd1-4ceb-4482-9d28-1dd5f30d2f34',
'info_dict': {
'id': '4e9b5bd1-4ceb-4482-9d28-1dd5f30d2f34',
'ext': 'mp4',
'title': 'Real Madrid vs. Real Betis (LaLiga)',
'thumbnail': 'https://s.secure.espncdn.com/stitcher/artwork/collections/media/bd1f3d12-0654-47d9-852e-71b85ea695c7/16x9.jpg?timestamp=202201112217&showBadge=true&cb=12&package=ESPN_PLUS',
},
'params': {
'skip_download': True,
},
}]
_API_KEY = 'ZXNwbiZicm93c2VyJjEuMC4w.ptUt7QxsteaRruuPmGZFaJByOoqKvDP2a5YkInHrc7c'
def _call_bamgrid_api(self, path, video_id, payload=None, headers={}):
if 'Authorization' not in headers:
headers['Authorization'] = f'Bearer {self._API_KEY}'
parse = urllib.parse.urlencode if path == 'token' else json.dumps
return self._download_json(
f'https://espn.api.edge.bamgrid.com/{path}', video_id, headers=headers, data=parse(payload).encode())
def _real_extract(self, url):
video_id = self._match_id(url)
video_data = self._download_json(
f'https://watch-cdn.product.api.espn.com/api/product/v3/watchespn/web/playback/event?id={video_id}',
video_id)['playbackState']
# ESPN+ subscription required, through cookies
if 'DTC' in video_data.get('sourceId'):
cookie = self._get_cookies(url).get('ESPN-ONESITE.WEB-PROD.token')
if not cookie:
self.raise_login_required(method='cookies')
assertion = self._call_bamgrid_api(
'devices', video_id,
headers={'Content-Type': 'application/json; charset=UTF-8'},
payload={
'deviceFamily': 'android',
'applicationRuntime': 'android',
'deviceProfile': 'tv',
'attributes': {},
})['assertion']
token = self._call_bamgrid_api(
'token', video_id, payload={
'subject_token': assertion,
'subject_token_type': 'urn:bamtech:params:oauth:token-type:device',
'platform': 'android',
'grant_type': 'urn:ietf:params:oauth:grant-type:token-exchange'
})['access_token']
assertion = self._call_bamgrid_api(
'accounts/grant', video_id, payload={'id_token': cookie.value.split('|')[1]},
headers={
'Authorization': token,
'Content-Type': 'application/json; charset=UTF-8'
})['assertion']
token = self._call_bamgrid_api(
'token', video_id, payload={
'subject_token': assertion,
'subject_token_type': 'urn:bamtech:params:oauth:token-type:account',
'platform': 'android',
'grant_type': 'urn:ietf:params:oauth:grant-type:token-exchange'
})['access_token']
playback = self._download_json(
video_data['videoHref'].format(scenario='browser~ssai'), video_id,
headers={
'Accept': 'application/vnd.media-service+json; version=5',
'Authorization': token
})
m3u8_url, headers = playback['stream']['complete'][0]['url'], {'authorization': token}
# No login required
elif video_data.get('sourceId') == 'ESPN_FREE':
asset = self._download_json(
f'https://watch.auth.api.espn.com/video/auth/media/{video_id}/asset?apikey=uiqlbgzdwuru14v627vdusswb',
video_id)
m3u8_url, headers = asset['stream'], {}
# TV Provider required
else:
resource = self._get_mvpd_resource('ESPN', video_data['name'], video_id, None)
auth = self._extract_mvpd_auth(url, video_id, 'ESPN', resource).encode()
asset = self._download_json(
f'https://watch.auth.api.espn.com/video/auth/media/{video_id}/asset?apikey=uiqlbgzdwuru14v627vdusswb',
video_id, data=f'adobeToken={urllib.parse.quote_plus(base64.b64encode(auth))}&drmSupport=HLS'.encode())
m3u8_url, headers = asset['stream'], {}
formats, subtitles = self._extract_m3u8_formats_and_subtitles(m3u8_url, video_id, 'mp4', m3u8_id='hls')
self._sort_formats(formats)
return {
'id': video_id,
'title': video_data.get('name'),
'formats': formats,
'subtitles': subtitles,
'thumbnail': video_data.get('posterHref'),
'http_headers': headers,
}
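
The common.py hunk earlier in this diff replaces `type_info={}` with `type_info=None`, while `_call_bamgrid_api` above still declares `headers={}` and writes to it. The following is a minimal sketch of why that distinction can matter, using hypothetical function names and a placeholder key rather than the extractor's real code.

```python
API_KEY = 'placeholder-key'  # hypothetical value, not the extractor's key


def build_headers_shared(headers={}):
    # The default dict is created once at definition time,
    # so this write persists across later calls.
    if 'Authorization' not in headers:
        headers['Authorization'] = f'Bearer {API_KEY}'
    return headers


def build_headers_fresh(headers=None):
    # Copy the caller's dict (or start empty) so nothing outside is mutated.
    headers = dict(headers or {})
    headers.setdefault('Authorization', f'Bearer {API_KEY}')
    return headers


shared_a = build_headers_shared()
shared_b = build_headers_shared()
fresh_a = build_headers_fresh()
fresh_b = build_headers_fresh()

print(shared_a is shared_b)
print(fresh_a is fresh_b)
```

Whether the shared default causes a visible problem depends on how the function is called; the `None` default simply removes the possibility.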


@ -19,9 +19,10 @@ class ExpressenIE(InfoExtractor):
'''
_TESTS = [{
'url': 'https://www.expressen.se/tv/ledare/ledarsnack/ledarsnack-om-arbetslosheten-bland-kvinnor-i-speciellt-utsatta-omraden/',
'md5': '2fbbe3ca14392a6b1b36941858d33a45',
'md5': 'deb2ca62e7b1dcd19fa18ba37523f66e',
'info_dict': {
'id': '8690962',
'id': 'ba90f5a9-78d1-4511-aa02-c177b9c99136',
'display_id': 'ledarsnack-om-arbetslosheten-bland-kvinnor-i-speciellt-utsatta-omraden',
'ext': 'mp4',
'title': 'Ledarsnack: Om arbetslösheten bland kvinnor i speciellt utsatta områden',
'description': 'md5:f38c81ff69f3de4d269bbda012fcbbba',
@ -64,7 +65,7 @@ class ExpressenIE(InfoExtractor):
display_id, transform_source=unescapeHTML)
info = extract_data('video-tracking-info')
video_id = info['videoId']
video_id = info['contentId']
data = extract_data('article-data')
stream = data['stream']

File diff suppressed because it is too large


@ -1,9 +1,7 @@
import re
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
)
from ..compat import compat_parse_qs
from ..dependencies import websockets
from ..utils import (
ExtractorError,
@ -209,7 +207,7 @@ class FC2LiveIE(InfoExtractor):
'User-Agent': self.get_param('http_headers')['User-Agent'],
})
self.write_debug('[debug] Sending HLS server request')
self.write_debug('Sending HLS server request')
while True:
recv = ws.recv()
@ -231,13 +229,10 @@ class FC2LiveIE(InfoExtractor):
if not data or not isinstance(data, dict):
continue
if data.get('name') == '_response_' and data.get('id') == 1:
self.write_debug('[debug] Goodbye.')
self.write_debug('Goodbye')
playlist_data = data
break
elif self._downloader.params.get('verbose', False):
if len(recv) > 100:
recv = recv[:100] + '...'
self.to_screen('[debug] Server said: %s' % recv)
self.write_debug('Server said: %s%s' % (recv[:100], '...' if len(recv) > 100 else ''))
if not playlist_data:
raise ExtractorError('Unable to fetch HLS playlist info via WebSocket')


@ -94,7 +94,7 @@ class FlickrIE(InfoExtractor):
owner = video_info.get('owner', {})
uploader_id = owner.get('nsid')
uploader_path = owner.get('path_alias') or uploader_id
uploader_url = format_field(uploader_path, template='https://www.flickr.com/photos/%s/')
uploader_url = format_field(uploader_path, None, 'https://www.flickr.com/photos/%s/')
return {
'id': video_id,


@ -0,0 +1,107 @@
from .common import InfoExtractor
from ..utils import traverse_obj, unified_timestamp
class FourZeroStudioArchiveIE(InfoExtractor):
_VALID_URL = r'https?://0000\.studio/(?P<uploader_id>[^/]+)/broadcasts/(?P<id>[^/]+)/archive'
IE_NAME = '0000studio:archive'
_TESTS = [{
'url': 'https://0000.studio/mumeijiten/broadcasts/1290f433-fce0-4909-a24a-5f7df09665dc/archive',
'info_dict': {
'id': '1290f433-fce0-4909-a24a-5f7df09665dc',
'title': 'noteで『canape』様へのファンレターを執筆します。数秘術その2',
'timestamp': 1653802534,
'release_timestamp': 1653796604,
'thumbnails': 'count:1',
'comments': 'count:7',
'uploader': '『中崎雄心』の執務室。',
'uploader_id': 'mumeijiten',
}
}]
def _real_extract(self, url):
video_id, uploader_id = self._match_valid_url(url).group('id', 'uploader_id')
webpage = self._download_webpage(url, video_id)
nuxt_data = self._search_nuxt_data(webpage, video_id, traverse=None)
pcb = traverse_obj(nuxt_data, ('ssrRefs', lambda _, v: v['__typename'] == 'PublicCreatorBroadcast'), get_all=False)
uploader_internal_id = traverse_obj(nuxt_data, (
'ssrRefs', lambda _, v: v['__typename'] == 'PublicUser', 'id'), get_all=False)
formats, subs = self._extract_m3u8_formats_and_subtitles(pcb['archiveUrl'], video_id, ext='mp4')
self._sort_formats(formats)
return {
'id': video_id,
'title': pcb.get('title'),
'age_limit': 18 if pcb.get('isAdult') else None,
'timestamp': unified_timestamp(pcb.get('finishTime')),
'release_timestamp': unified_timestamp(pcb.get('createdAt')),
'thumbnails': [{
'url': pcb['thumbnailUrl'],
'ext': 'png',
}] if pcb.get('thumbnailUrl') else None,
'formats': formats,
'subtitles': subs,
'comments': [{
'author': c.get('username'),
'author_id': c.get('postedUserId'),
'author_thumbnail': c.get('userThumbnailUrl'),
'id': c.get('id'),
'text': c.get('body'),
'timestamp': unified_timestamp(c.get('createdAt')),
'like_count': c.get('likeCount'),
'is_favorited': c.get('isLikedByOwner'),
'author_is_uploader': c.get('postedUserId') == uploader_internal_id,
} for c in traverse_obj(nuxt_data, (
'ssrRefs', ..., lambda _, v: v['__typename'] == 'PublicCreatorBroadcastComment')) or []],
'uploader_id': uploader_id,
'uploader': traverse_obj(nuxt_data, (
'ssrRefs', lambda _, v: v['__typename'] == 'PublicUser', 'username'), get_all=False),
}
class FourZeroStudioClipIE(InfoExtractor):
_VALID_URL = r'https?://0000\.studio/(?P<uploader_id>[^/]+)/archive-clip/(?P<id>[^/]+)'
IE_NAME = '0000studio:clip'
_TESTS = [{
'url': 'https://0000.studio/soeji/archive-clip/e46b0278-24cd-40a8-92e1-b8fc2b21f34f',
'info_dict': {
'id': 'e46b0278-24cd-40a8-92e1-b8fc2b21f34f',
'title': 'わたベーさんからイラスト差し入れいただきました。ありがとうございました!',
'timestamp': 1652109105,
'like_count': 1,
'uploader': 'ソエジマケイタ',
'uploader_id': 'soeji',
}
}]
def _real_extract(self, url):
video_id, uploader_id = self._match_valid_url(url).group('id', 'uploader_id')
webpage = self._download_webpage(url, video_id)
nuxt_data = self._search_nuxt_data(webpage, video_id, traverse=None)
clip_info = traverse_obj(nuxt_data, ('ssrRefs', lambda _, v: v['__typename'] == 'PublicCreatorArchivedClip'), get_all=False)
info = next((
m for m in self._parse_html5_media_entries(url, webpage, video_id)
if 'mp4' in traverse_obj(m, ('formats', ..., 'ext'))
), None)
if not info:
self.report_warning('Failed to find a desired media element. Falling back to using NUXT data.')
info = {
'formats': [{
'ext': 'mp4',
'url': url,
} for url in clip_info.get('mediaFiles') or [] if url],
}
return {
**info,
'id': video_id,
'title': clip_info.get('clipComment'),
'timestamp': unified_timestamp(clip_info.get('createdAt')),
'like_count': clip_info.get('likeCount'),
'uploader_id': uploader_id,
'uploader': traverse_obj(nuxt_data, (
'ssrRefs', lambda _, v: v['__typename'] == 'PublicUser', 'username'), get_all=False),
}


@ -59,10 +59,13 @@ class FoxNewsIE(AMPIE):
@staticmethod
def _extract_urls(webpage):
return [
mobj.group('url')
f'https://video.foxnews.com/v/video-embed.html?video_id={mobj.group("video_id")}'
for mobj in re.finditer(
r'<(?:amp-)?iframe[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//video\.foxnews\.com/v/video-embed\.html?.*?\bvideo_id=\d+.*?)\1',
webpage)]
r'''(?x)
<(?:script|(?:amp-)?iframe)[^>]+\bsrc=["\']
(?:https?:)?//video\.foxnews\.com/v/(?:video-embed\.html|embed\.js)\?
(?:[^>"\']+&)?(?:video_)?id=(?P<video_id>\d+)
''', webpage)]
def _real_extract(self, url):
host, video_id = self._match_valid_url(url).groups()
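
Note on the hunk above: the new pattern captures only the numeric ID and rebuilds a canonical embed URL, so `<script>` embeds using `embed.js?id=...` and iframes using `video-embed.html?video_id=...` both normalise to the same form. The sketch below runs the same pattern outside the extractor. The sample HTML and the names `EMBED_RE`, `extract_embed_urls` and `SAMPLE_HTML` are made up for illustration.

```python
import re

# Pattern copied from the hunk above; (?x) lets it span several lines.
EMBED_RE = re.compile(r'''(?x)
    <(?:script|(?:amp-)?iframe)[^>]+\bsrc=["\']
    (?:https?:)?//video\.foxnews\.com/v/(?:video-embed\.html|embed\.js)\?
    (?:[^>"\']+&)?(?:video_)?id=(?P<video_id>\d+)
''')


def extract_embed_urls(webpage):
    # Rebuild one canonical URL per match instead of returning the raw src.
    return [
        f'https://video.foxnews.com/v/video-embed.html?video_id={mobj.group("video_id")}'
        for mobj in EMBED_RE.finditer(webpage)]


# Hypothetical page fragment covering the three embed styles.
SAMPLE_HTML = '''
<iframe src="https://video.foxnews.com/v/video-embed.html?video_id=111&autoplay=0"></iframe>
<script type="text/javascript" src="//video.foxnews.com/v/embed.js?id=222&w=466&h=263"></script>
<amp-iframe width="500" src="https://video.foxnews.com/v/video-embed.html?autoplay=1&video_id=333"></amp-iframe>
'''

urls = extract_embed_urls(SAMPLE_HTML)
```

The optional `(?:[^>"\']+&)?` group is what lets the ID appear after other query parameters, as in the third sample.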


@ -1,125 +0,0 @@
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
extract_attributes,
int_or_none,
traverse_obj,
unified_strdate,
)
class FranceCultureIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?franceculture\.fr/emissions/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{
# playlist
'url': 'https://www.franceculture.fr/emissions/serie/hasta-dente',
'playlist_count': 12,
'info_dict': {
'id': 'hasta-dente',
'title': 'Hasta Dente',
'description': 'md5:57479af50648d14e9bb649e6b1f8f911',
'thumbnail': r're:^https?://.*\.jpg$',
'upload_date': '20201024',
},
'playlist': [{
'info_dict': {
'id': '3c1c2e55-41a0-11e5-9fe0-005056a87c89',
'ext': 'mp3',
'title': 'Jeudi, vous avez dit bizarre ?',
'description': 'md5:47cf1e00cc21c86b0210279996a812c6',
'duration': 604,
'upload_date': '20201024',
'thumbnail': r're:^https?://.*\.jpg$',
'timestamp': 1603576680
},
},
],
}, {
'url': 'https://www.franceculture.fr/emissions/carnet-nomade/rendez-vous-au-pays-des-geeks',
'info_dict': {
'id': 'rendez-vous-au-pays-des-geeks',
'display_id': 'rendez-vous-au-pays-des-geeks',
'ext': 'mp3',
'title': 'Rendez-vous au pays des geeks',
'thumbnail': r're:^https?://.*\.jpg$',
'upload_date': '20140301',
'vcodec': 'none',
'duration': 3569,
},
}, {
# no thumbnail
'url': 'https://www.franceculture.fr/emissions/la-recherche-montre-en-main/la-recherche-montre-en-main-du-mercredi-10-octobre-2018',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
info = {
'id': display_id,
'title': self._html_search_regex(
r'(?s)<h1[^>]*itemprop="[^"]*name[^"]*"[^>]*>(.+?)</h1>',
webpage, 'title', default=self._og_search_title(webpage)),
'description': self._html_search_regex(
r'(?s)<div[^>]+class="excerpt"[^>]*>(.*?)</div>', webpage, 'description', default=None),
'thumbnail': self._og_search_thumbnail(webpage),
'uploader': self._html_search_regex(
r'(?s)<span class="author">(.*?)</span>', webpage, 'uploader', default=None),
'upload_date': unified_strdate(self._html_search_regex(
r'(?s)class="teaser-text-date".*?(\d{2}/\d{2}/\d{4})', webpage, 'date', default=None)),
}
playlist_data = self._search_regex(
r'''(?sx)
<section[^>]+data-xiti-place="[^"]*?liste_episodes[^"?]*?"[^>]*>
(.*?)
</section>
''',
webpage, 'playlist data', fatal=False, default=None)
if playlist_data:
entries = []
for item, item_description in re.findall(
r'(?s)(<button[^<]*class="[^"]*replay-button[^>]*>).*?<p[^>]*class="[^"]*teaser-text-chapo[^>]*>(.*?)</p>',
playlist_data):
item_attributes = extract_attributes(item)
entries.append({
'id': item_attributes.get('data-emission-uuid'),
'url': item_attributes.get('data-url'),
'title': item_attributes.get('data-diffusion-title'),
'duration': int_or_none(traverse_obj(item_attributes, 'data-duration-seconds', 'data-duration-seconds')),
'description': item_description,
'timestamp': int_or_none(item_attributes.get('data-start-time')),
'thumbnail': info['thumbnail'],
'uploader': info['uploader'],
})
return {
'_type': 'playlist',
'entries': entries,
**info
}
video_data = extract_attributes(self._search_regex(
r'''(?sx)
(?:
</h1>|
<div[^>]+class="[^"]*?(?:title-zone-diffusion|heading-zone-(?:wrapper|player-button))[^"]*?"[^>]*>
).*?
(<button[^>]+data-(?:url|asset-source)="[^"]+"[^>]+>)
''',
webpage, 'video data'))
video_url = traverse_obj(video_data, 'data-url', 'data-asset-source')
ext = determine_ext(video_url.lower())
return {
'display_id': display_id,
'url': video_url,
'ext': ext,
'vcodec': 'none' if ext == 'mp3' else None,
'duration': int_or_none(video_data.get('data-duration')),
**info
}

141
yt_dlp/extractor/freetv.py Normal file

@ -0,0 +1,141 @@
import itertools
import re
from .common import InfoExtractor
from ..utils import int_or_none, traverse_obj, urlencode_postdata
class FreeTvBaseIE(InfoExtractor):
def _get_api_response(self, content_id, resource_type, postdata):
return self._download_json(
'https://www.freetv.com/wordpress/wp-admin/admin-ajax.php',
content_id, data=urlencode_postdata(postdata),
note=f'Downloading {content_id} {resource_type} JSON')['data']
class FreeTvMoviesIE(FreeTvBaseIE):
_VALID_URL = r'https?://(?:www\.)?freetv\.com/peliculas/(?P<id>[^/]+)'
_TESTS = [{
'url': 'https://www.freetv.com/peliculas/atrapame-si-puedes/',
'md5': 'dc62d5abf0514726640077cd1591aa92',
'info_dict': {
'id': '428021',
'title': 'Atrápame Si Puedes',
'description': 'md5:ca63bc00898aeb2f64ec87c6d3a5b982',
'ext': 'mp4',
}
}, {
'url': 'https://www.freetv.com/peliculas/monstruoso/',
'md5': '509c15c68de41cb708d1f92d071f20aa',
'info_dict': {
'id': '377652',
'title': 'Monstruoso',
'description': 'md5:333fc19ee327b457b980e54a911ea4a3',
'ext': 'mp4',
}
}]
def _extract_video(self, content_id, action='olyott_video_play'):
api_response = self._get_api_response(content_id, 'video', {
'action': action,
'contentID': content_id,
})
video_id, video_url = api_response['displayMeta']['contentID'], api_response['displayMeta']['streamURLVideo']
formats, subtitles = self._extract_m3u8_formats_and_subtitles(video_url, video_id, 'mp4')
self._sort_formats(formats)
return {
'id': video_id,
'title': traverse_obj(api_response, ('displayMeta', 'title')),
'description': traverse_obj(api_response, ('displayMeta', 'desc')),
'formats': formats,
'subtitles': subtitles,
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
return self._extract_video(
self._search_regex((
r'class=["\'][^>]+postid-(?P<video_id>\d+)',
r'<link[^>]+freetv.com/\?p=(?P<video_id>\d+)',
r'<div[^>]+data-params=["\'][^>]+post_id=(?P<video_id>\d+)',
), webpage, 'video id', group='video_id'))
class FreeTvIE(FreeTvBaseIE):
IE_NAME = 'freetv:series'
_VALID_URL = r'https?://(?:www\.)?freetv\.com/series/(?P<id>[^/]+)'
_TESTS = [{
'url': 'https://www.freetv.com/series/el-detective-l/',
'info_dict': {
'id': 'el-detective-l',
'title': 'El Detective L',
'description': 'md5:f9f1143bc33e9856ecbfcbfb97a759be'
},
'playlist_count': 24,
}, {
'url': 'https://www.freetv.com/series/esmeraldas/',
'info_dict': {
'id': 'esmeraldas',
'title': 'Esmeraldas',
'description': 'md5:43d7ec45bd931d8268a4f5afaf4c77bf'
},
'playlist_count': 62,
}, {
'url': 'https://www.freetv.com/series/las-aventuras-de-leonardo/',
'info_dict': {
'id': 'las-aventuras-de-leonardo',
'title': 'Las Aventuras de Leonardo',
'description': 'md5:0c47130846c141120a382aca059288f6'
},
'playlist_count': 13,
},
]
def _extract_series_season(self, season_id, series_title):
episodes = self._get_api_response(season_id, 'series', {
'contentID': season_id,
'action': 'olyott_get_dynamic_series_content',
'type': 'list',
'perPage': '1000',
})['1']
for episode in episodes:
video_id = str(episode['contentID'])
formats, subtitles = self._extract_m3u8_formats_and_subtitles(episode['streamURL'], video_id, 'mp4')
self._sort_formats(formats)
yield {
'id': video_id,
'title': episode.get('fullTitle'),
'description': episode.get('description'),
'formats': formats,
'subtitles': subtitles,
'thumbnail': episode.get('thumbnail'),
'series': series_title,
'series_id': traverse_obj(episode, ('contentMeta', 'displayMeta', 'seriesID')),
'season_id': traverse_obj(episode, ('contentMeta', 'displayMeta', 'seasonID')),
'season_number': traverse_obj(
episode, ('contentMeta', 'displayMeta', 'seasonNum'), expected_type=int_or_none),
'episode_number': traverse_obj(
episode, ('contentMeta', 'displayMeta', 'episodeNum'), expected_type=int_or_none),
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
title = self._html_search_regex(
r'<h1[^>]+class=["\']synopis[^>]>(?P<title>[^<]+)', webpage, 'title', group='title', fatal=False)
description = self._html_search_regex(
r'<div[^>]+class=["\']+synopis content[^>]><p>(?P<description>[^<]+)',
webpage, 'description', group='description', fatal=False)
return self.playlist_result(
itertools.chain.from_iterable(
self._extract_series_season(season_id, title)
for season_id in re.findall(r'<option[^>]+value=["\'](\d+)["\']', webpage)),
display_id, title, description)
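
Note on the file above: `_real_extract` passes `playlist_result` a lazy chain of per-season generators, so a season's API request should only be made once iteration reaches it. The sketch below shows that laziness with a stand-in generator. `season_entries` and the fake IDs are assumptions for illustration and do not call the real API.

```python
import itertools

requested = []  # records which seasons were actually "fetched"


def season_entries(season_id):
    # Hypothetical stand-in for a per-season API call; yields two fake episodes.
    requested.append(season_id)
    for episode_number in (1, 2):
        yield {'id': f'{season_id}-{episode_number}', 'season_id': season_id}


season_ids = ['101', '102']
entries = itertools.chain.from_iterable(season_entries(s) for s in season_ids)

first = next(entries)
requested_after_first = list(requested)  # only the first season has run so far
rest = list(entries)
```

Because nothing runs until an entry is consumed, stopping early would skip the remaining seasons entirely.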


@ -69,11 +69,13 @@ from .spankwire import SpankwireIE
from .sportbox import SportBoxIE
from .spotify import SpotifyBaseIE
from .springboardplatform import SpringboardPlatformIE
from .substack import SubstackIE
from .svt import SVTIE
from .teachable import TeachableIE
from .ted import TedEmbedIE
from .theplatform import ThePlatformIE
from .threeqsdn import ThreeQSDNIE
from .tiktok import TikTokIE
from .tnaflix import TNAFlixNetworkEmbedIE
from .tube8 import Tube8IE
from .tunein import TuneInBaseIE
@ -2541,7 +2543,104 @@ class GenericIE(InfoExtractor):
'timestamp': 1652833414,
'age_limit': 0,
}
},
{
'url': 'https://www.mollymovieclub.com/p/interstellar?s=r#details',
'md5': '198bde8bed23d0b23c70725c83c9b6d9',
'info_dict': {
'id': '53602801',
'ext': 'mpga',
'title': 'Interstellar',
'description': 'Listen now | Episode One',
'thumbnail': 'md5:c30d9c83f738e16d8551d7219d321538',
'uploader': 'Molly Movie Club',
'uploader_id': '839621',
},
},
{
'url': 'https://www.blockedandreported.org/p/episode-117-lets-talk-about-depp?s=r',
'md5': 'c0cc44ee7415daeed13c26e5b56d6aa0',
'info_dict': {
'id': '57962052',
'ext': 'mpga',
'title': 'md5:855b2756f0ee10f6723fa00b16266f8d',
'description': 'md5:fe512a5e94136ad260c80bde00ea4eef',
'thumbnail': 'md5:2218f27dfe517bb5ac16c47d0aebac59',
'uploader': 'Blocked and Reported',
'uploader_id': '500230',
},
},
{
'url': 'https://www.skimag.com/video/ski-people-1980/',
'info_dict': {
'id': 'ski-people-1980',
'title': 'Ski People (1980)',
},
'playlist_count': 1,
'playlist': [{
'md5': '022a7e31c70620ebec18deeab376ee03',
'info_dict': {
'id': 'YTmgRiNU',
'ext': 'mp4',
'title': '1980 Ski People',
'timestamp': 1610407738,
'description': 'md5:cf9c3d101452c91e141f292b19fe4843',
'thumbnail': 'https://cdn.jwplayer.com/v2/media/YTmgRiNU/poster.jpg?width=720',
'duration': 5688.0,
'upload_date': '20210111',
}
}]
},
{
'note': 'Rumble embed',
'url': 'https://rumble.com/vdmum1-moose-the-dog-helps-girls-dig-a-snow-fort.html',
'md5': '53af34098a7f92c4e51cf0bd1c33f009',
'info_dict': {
'id': 'vb0ofn',
'ext': 'mp4',
'timestamp': 1612662578,
'uploader': 'LovingMontana',
'channel': 'LovingMontana',
'upload_date': '20210207',
'title': 'Winter-loving dog helps girls dig a snow fort ',
'channel_url': 'https://rumble.com/c/c-546523',
'thumbnail': 'https://sp.rmbl.ws/s8/1/5/f/x/x/5fxxb.OvCc.1-small-Moose-The-Dog-Helps-Girls-D.jpg',
'duration': 103,
}
},
{
'note': 'Rumble JS embed',
'url': 'https://therightscoop.com/what-does-9-plus-1-plus-1-equal-listen-to-this-audio-of-attempted-kavanaugh-assassins-call-and-youll-get-it',
'md5': '4701209ac99095592e73dbba21889690',
'info_dict': {
'id': 'v15eqxl',
'ext': 'mp4',
'channel': 'Mr Producer Media',
'duration': 92,
'title': '911 Audio From The Man Who Wanted To Kill Supreme Court Justice Kavanaugh',
'channel_url': 'https://rumble.com/c/RichSementa',
'thumbnail': 'https://sp.rmbl.ws/s8/1/P/j/f/A/PjfAe.OvCc-small-911-Audio-From-The-Man-Who-.jpg',
'timestamp': 1654892716,
'uploader': 'Mr Producer Media',
'upload_date': '20220610',
}
},
{
'note': 'JSON LD with multiple @type',
'url': 'https://www.nu.nl/280161/video/hoe-een-bladvlo-dit-verwoestende-japanse-onkruid-moet-vernietigen.html',
'md5': 'c7949f34f57273013fb7ccb1156393db',
'info_dict': {
'id': 'ipy2AcGL',
'ext': 'mp4',
'description': 'md5:6a9d644bab0dc2dc06849c2505d8383d',
'thumbnail': r're:https://media\.nu\.nl/m/.+\.jpg',
'title': 'Hoe een bladvlo dit verwoestende Japanse onkruid moet vernietigen',
'timestamp': 1586577474,
'upload_date': '20200411',
'age_limit': 0,
'duration': 111.0,
}
},
]
def report_following_redirect(self, new_url):
@ -3017,6 +3116,7 @@ class GenericIE(InfoExtractor):
wistia_urls = WistiaIE._extract_urls(webpage)
if wistia_urls:
playlist = self.playlist_from_matches(wistia_urls, video_id, video_title, ie=WistiaIE.ie_key())
playlist['entries'] = list(playlist['entries'])
for entry in playlist['entries']:
entry.update({
'_type': 'url_transparent',
@ -3036,6 +3136,11 @@ class GenericIE(InfoExtractor):
# Don't set the extractor because it can be a track url or an album
return self.url_result(burl)
# Check for Substack custom domains
substack_url = SubstackIE._extract_url(webpage, url)
if substack_url:
return self.url_result(substack_url, SubstackIE)
# Look for embedded Vevo player
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//(?:cache\.)?vevo\.com/.+?)\1', webpage)
@ -3756,6 +3861,11 @@ class GenericIE(InfoExtractor):
if ruutu_urls:
return self.playlist_from_matches(ruutu_urls, video_id, video_title)
# Look for Tiktok embeds
tiktok_urls = TikTokIE._extract_urls(webpage)
if tiktok_urls:
return self.playlist_from_matches(tiktok_urls, video_id, video_title)
# Look for HTML5 media
entries = self._parse_html5_media_entries(url, webpage, video_id, m3u8_id='hls')
if entries:
@ -3865,15 +3975,10 @@ class GenericIE(InfoExtractor):
json_ld = self._search_json_ld(webpage, video_id, default={})
if json_ld.get('url') not in (url, None):
self.report_detected('JSON LD')
if determine_ext(json_ld['url']) == 'm3u8':
json_ld['formats'], json_ld['subtitles'] = self._extract_m3u8_formats_and_subtitles(
json_ld['url'], video_id, 'mp4')
json_ld.pop('url')
self._sort_formats(json_ld['formats'])
else:
json_ld['_type'] = 'url_transparent'
json_ld['url'] = smuggle_url(json_ld['url'], {'force_videoid': video_id, 'to_generic': True})
return merge_dicts(json_ld, info_dict)
return merge_dicts({
'_type': 'url_transparent',
'url': smuggle_url(json_ld['url'], {'force_videoid': video_id, 'to_generic': True}),
}, json_ld, info_dict)
def check_video(vurl):
if YoutubeIE.suitable(vurl):


@ -276,3 +276,59 @@ class GoogleDriveIE(InfoExtractor):
'automatic_captions': self.extract_automatic_captions(
video_id, subtitles_id, hl),
}
class GoogleDriveFolderIE(InfoExtractor):
IE_NAME = 'GoogleDrive:Folder'
_VALID_URL = r'https?://(?:docs|drive)\.google\.com/drive/folders/(?P<id>[\w-]{28,})'
_TESTS = [{
'url': 'https://drive.google.com/drive/folders/1dQ4sx0-__Nvg65rxTSgQrl7VyW_FZ9QI',
'info_dict': {
'id': '1dQ4sx0-__Nvg65rxTSgQrl7VyW_FZ9QI',
'title': 'Forrest'
},
'playlist_count': 3,
}]
_BOUNDARY = '=====vc17a3rwnndj====='
_REQUEST = "/drive/v2beta/files?openDrive=true&reason=102&syncType=0&errorRecovery=false&q=trashed%20%3D%20false%20and%20'{folder_id}'%20in%20parents&fields=kind%2CnextPageToken%2Citems(kind%2CmodifiedDate%2CmodifiedByMeDate%2ClastViewedByMeDate%2CfileSize%2Cowners(kind%2CpermissionId%2Cid)%2ClastModifyingUser(kind%2CpermissionId%2Cid)%2ChasThumbnail%2CthumbnailVersion%2Ctitle%2Cid%2CresourceKey%2Cshared%2CsharedWithMeDate%2CuserPermission(role)%2CexplicitlyTrashed%2CmimeType%2CquotaBytesUsed%2Ccopyable%2CfileExtension%2CsharingUser(kind%2CpermissionId%2Cid)%2Cspaces%2Cversion%2CteamDriveId%2ChasAugmentedPermissions%2CcreatedDate%2CtrashingUser(kind%2CpermissionId%2Cid)%2CtrashedDate%2Cparents(id)%2CshortcutDetails(targetId%2CtargetMimeType%2CtargetLookupStatus)%2Ccapabilities(canCopy%2CcanDownload%2CcanEdit%2CcanAddChildren%2CcanDelete%2CcanRemoveChildren%2CcanShare%2CcanTrash%2CcanRename%2CcanReadTeamDrive%2CcanMoveTeamDriveItem)%2Clabels(starred%2Ctrashed%2Crestricted%2Cviewed))%2CincompleteSearch&appDataFilter=NO_APP_DATA&spaces=drive&pageToken={page_token}&maxResults=50&supportsTeamDrives=true&includeItemsFromAllDrives=true&corpora=default&orderBy=folder%2Ctitle_natural%20asc&retryCount=0&key={key} HTTP/1.1"
_DATA = f'''--{_BOUNDARY}
content-type: application/http
content-transfer-encoding: binary
GET %s
--{_BOUNDARY}
'''
def _call_api(self, folder_id, key, data, **kwargs):
response = self._download_webpage(
'https://clients6.google.com/batch/drive/v2beta',
folder_id, data=data.encode('utf-8'),
headers={
'Content-Type': 'text/plain;charset=UTF-8;',
'Origin': 'https://drive.google.com',
}, query={
'$ct': f'multipart/mixed; boundary="{self._BOUNDARY}"',
'key': key
}, **kwargs)
return self._search_json('', response, 'api response', folder_id, **kwargs) or {}
def _get_folder_items(self, folder_id, key):
page_token = ''
while page_token is not None:
request = self._REQUEST.format(folder_id=folder_id, page_token=page_token, key=key)
page = self._call_api(folder_id, key, self._DATA % request)
yield from page['items']
page_token = page.get('nextPageToken')
def _real_extract(self, url):
folder_id = self._match_id(url)
webpage = self._download_webpage(url, folder_id)
key = self._search_regex(r'"(\w{39})"', webpage, 'key')
folder_info = self._call_api(folder_id, key, self._DATA % f'/drive/v2beta/files/{folder_id} HTTP/1.1', fatal=False)
return self.playlist_from_matches(
self._get_folder_items(folder_id, key), folder_id, folder_info.get('title'),
ie=GoogleDriveIE, getter=lambda item: f'https://drive.google.com/file/d/{item["id"]}')
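
Note on the hunk above: `_get_folder_items` follows a token-based pagination loop. It starts with an empty token, yields each page's items, and stops when the response carries no `nextPageToken`. The sketch below reproduces that loop against an in-memory fake. `FAKE_PAGES`, `fetch_page` and `iter_items` are illustrative names, not part of the extractor or of any Google API.

```python
# Hypothetical responses keyed by the page token that requests them.
FAKE_PAGES = {
    '': {'items': [{'id': 'a'}, {'id': 'b'}], 'nextPageToken': 'p2'},
    'p2': {'items': [{'id': 'c'}]},  # no nextPageToken: this is the last page
}


def fetch_page(page_token):
    return FAKE_PAGES[page_token]


def iter_items(fetch):
    page_token = ''
    while page_token is not None:
        page = fetch(page_token)
        yield from page['items']
        # dict.get returns None when the key is absent, which ends the loop.
        page_token = page.get('nextPageToken')


item_ids = [item['id'] for item in iter_items(fetch_page)]
```

Like the original, this sketch indexes `page['items']` directly, so a response without that key would raise `KeyError`.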


@ -1,23 +1,19 @@
from .common import InfoExtractor
from ..utils import (
determine_ext,
int_or_none,
strip_or_none,
xpath_attr,
xpath_text,
)
from ..utils import unified_strdate
class InaIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:www|m)\.)?ina\.fr/(?:video|audio)/(?P<id>[A-Z0-9_]+)'
_VALID_URL = r'https?://(?:(?:www|m)\.)?ina\.fr/(?:[^/]+/)?(?:video|audio)/(?P<id>\w+)'
_TESTS = [{
'url': 'http://www.ina.fr/video/I12055569/francois-hollande-je-crois-que-c-est-clair-video.html',
'md5': 'a667021bf2b41f8dc6049479d9bb38a3',
'url': 'https://www.ina.fr/video/I12055569/francois-hollande-je-crois-que-c-est-clair-video.html',
'md5': 'c5a09e5cb5604ed10709f06e7a377dda',
'info_dict': {
'id': 'I12055569',
'ext': 'mp4',
'title': 'François Hollande "Je crois que c\'est clair"',
'description': 'md5:3f09eb072a06cb286b8f7e4f77109663',
'description': 'md5:08201f1c86fb250611f0ba415d21255a',
'upload_date': '20070712',
'thumbnail': 'https://cdn-hub.ina.fr/notice/690x517/3c4/I12055569.jpeg',
}
}, {
'url': 'https://www.ina.fr/video/S806544_001/don-d-organes-des-avancees-mais-d-importants-besoins-video.html',
@ -31,53 +27,37 @@ class InaIE(InfoExtractor):
}, {
'url': 'http://m.ina.fr/video/I12055569',
'only_matching': True,
}, {
'url': 'https://www.ina.fr/ina-eclaire-actu/video/cpb8205116303/les-jeux-electroniques',
'md5': '4b8284a9a3a184fdc7e744225b8251e7',
'info_dict': {
'id': 'CPB8205116303',
'ext': 'mp4',
'title': 'Les jeux électroniques',
'description': 'md5:e09f7683dad1cc60b74950490127d233',
'upload_date': '19821204',
'duration': 657,
'thumbnail': 'https://cdn-hub.ina.fr/notice/690x517/203/CPB8205116303.jpeg',
}
}]
def _real_extract(self, url):
video_id = self._match_id(url)
info_doc = self._download_xml(
'http://player.ina.fr/notices/%s.mrss' % video_id, video_id)
item = info_doc.find('channel/item')
title = xpath_text(item, 'title', fatal=True)
media_ns_xpath = lambda x: self._xpath_ns(x, 'http://search.yahoo.com/mrss/')
content = item.find(media_ns_xpath('content'))
video_id = self._match_id(url).upper()
webpage = self._download_webpage(url, video_id)
get_furl = lambda x: xpath_attr(content, media_ns_xpath(x), 'url')
formats = []
for q, w, h in (('bq', 400, 300), ('mq', 512, 384), ('hq', 768, 576)):
q_url = get_furl(q)
if not q_url:
continue
formats.append({
'format_id': q,
'url': q_url,
'width': w,
'height': h,
})
if not formats:
furl = get_furl('player') or content.attrib['url']
ext = determine_ext(furl)
formats = [{
'url': furl,
'vcodec': 'none' if ext == 'mp3' else None,
'ext': ext,
}]
api_url = self._html_search_regex(
r'asset-details-url\s*=\s*["\'](?P<api_url>[^"\']+)',
webpage, 'api_url').replace(video_id, f'{video_id}.json')
thumbnails = []
for thumbnail in content.findall(media_ns_xpath('thumbnail')):
thumbnail_url = thumbnail.get('url')
if not thumbnail_url:
continue
thumbnails.append({
'url': thumbnail_url,
'height': int_or_none(thumbnail.get('height')),
'width': int_or_none(thumbnail.get('width')),
})
api_response = self._download_json(api_url, video_id)
return {
'id': video_id,
'formats': formats,
'title': title,
'description': strip_or_none(xpath_text(item, 'description')),
'thumbnails': thumbnails,
'url': api_response['resourceUrl'],
'ext': {'video': 'mp4', 'audio': 'mp3'}.get(api_response.get('type')),
'title': api_response.get('title'),
'description': api_response.get('description'),
'upload_date': unified_strdate(api_response.get('dateOfBroadcast')),
'duration': api_response.get('duration'),
'thumbnail': api_response.get('resourceThumbnail'),
}


@ -410,7 +410,7 @@ class InstagramIE(InstagramBaseIE):
if nodes:
return self.playlist_result(
self._extract_nodes(nodes, True), video_id,
format_field(username, template='Post by %s'), description)
format_field(username, None, 'Post by %s'), description)
video_url = self._og_search_video_url(webpage, secure=False)


@ -37,7 +37,7 @@ def md5_text(text):
return hashlib.md5(text.encode('utf-8')).hexdigest()
class IqiyiSDK(object):
class IqiyiSDK:
def __init__(self, target, ip, timestamp):
self.target = target
self.ip = ip
@ -131,7 +131,7 @@ class IqiyiSDK(object):
self.target = self.digit_sum(self.timestamp) + chunks[0] + compat_str(sum(ip))
class IqiyiSDKInterpreter(object):
class IqiyiSDKInterpreter:
def __init__(self, sdk_code):
self.sdk_code = sdk_code
@ -610,7 +610,7 @@ class IqIE(InfoExtractor):
preview_time = traverse_obj(
initial_format_data, ('boss_ts', (None, 'data'), ('previewTime', 'rtime')), expected_type=float_or_none, get_all=False)
if traverse_obj(initial_format_data, ('boss_ts', 'data', 'prv'), expected_type=int_or_none):
self.report_warning('This preview video is limited%s' % format_field(preview_time, template=' to %s seconds'))
self.report_warning('This preview video is limited%s' % format_field(preview_time, None, ' to %s seconds'))
# TODO: Extract audio-only formats
for bid in set(traverse_obj(initial_format_data, ('program', 'video', ..., 'bid'), expected_type=str_or_none, default=[])):


@ -1,3 +1,4 @@
import itertools
import re
import urllib
@ -171,37 +172,70 @@ class IwaraUserIE(IwaraBaseIE):
IE_NAME = 'iwara:user'
_TESTS = [{
'url': 'https://ecchi.iwara.tv/users/CuteMMD',
'note': 'number of all videos page is just 1 page. less than 40 videos',
'url': 'https://ecchi.iwara.tv/users/infinityyukarip',
'info_dict': {
'id': 'CuteMMD',
'title': 'Uploaded videos from Infinity_YukariP',
'id': 'infinityyukarip',
'uploader': 'Infinity_YukariP',
'uploader_id': 'infinityyukarip',
},
'playlist_mincount': 198,
'playlist_mincount': 39,
}, {
# urlencoded
'url': 'https://ecchi.iwara.tv/users/%E5%92%95%E5%98%BF%E5%98%BF',
'note': 'no even all videos page. probably less than 10 videos',
'url': 'https://ecchi.iwara.tv/users/mmd-quintet',
'info_dict': {
'id': '咕嘿嘿',
'title': 'Uploaded videos from mmd quintet',
'id': 'mmd-quintet',
'uploader': 'mmd quintet',
'uploader_id': 'mmd-quintet',
},
'playlist_mincount': 141,
'playlist_mincount': 6,
}, {
'note': 'has paging. more than 40 videos',
'url': 'https://ecchi.iwara.tv/users/theblackbirdcalls',
'info_dict': {
'title': 'Uploaded videos from TheBlackbirdCalls',
'id': 'theblackbirdcalls',
'uploader': 'TheBlackbirdCalls',
'uploader_id': 'theblackbirdcalls',
},
'playlist_mincount': 420,
}, {
'note': 'foreign chars in URL. there must be foreign characters in URL',
'url': 'https://ecchi.iwara.tv/users/ぶた丼',
'info_dict': {
'title': 'Uploaded videos from ぶた丼',
'id': 'ぶた丼',
'uploader': 'ぶた丼',
'uploader_id': 'ぶた丼',
},
'playlist_mincount': 170,
}]
def _entries(self, playlist_id, base_url, webpage):
def _entries(self, playlist_id, base_url):
webpage = self._download_webpage(
f'{base_url}/users/{playlist_id}', playlist_id)
videos_url = self._search_regex(r'<a href="(/users/[^/]+/videos)(?:\?[^"]+)?">', webpage, 'all videos url', default=None)
if not videos_url:
yield from self._extract_playlist(base_url, webpage)
return
page_urls = re.findall(
r'class="pager-item"[^>]*>\s*<a[^<]+href="([^"]+)', webpage)
videos_url = urljoin(base_url, videos_url)
for n, path in enumerate(page_urls, 2):
for n in itertools.count(1):
page = self._download_webpage(
videos_url, playlist_id, note=f'Downloading playlist page {n}',
query={'page': str(n - 1)} if n > 1 else {})
yield from self._extract_playlist(
base_url, self._download_webpage(
urljoin(base_url, path), playlist_id, note=f'Downloading playlist page {n}'))
base_url, page)
if f'page={n}' not in page:
break
def _real_extract(self, url):
playlist_id, base_url = self._match_valid_url(url).group('id', 'base_url')
playlist_id = urllib.parse.unquote(playlist_id)
webpage = self._download_webpage(
f'{base_url}/users/{playlist_id}/videos', playlist_id)
return self.playlist_result(
self._entries(playlist_id, base_url, webpage), playlist_id)
self._entries(playlist_id, base_url), playlist_id)
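
Note on the hunk above: the new `_entries` no longer scrapes pager links up front. It counts pages with `itertools.count(1)`, requests page `n` (sending `page=n-1` in the query from the second page on), and stops once the page contains no `page={n}` link to a following page. The sketch below mimics that stop condition. `FAKE_SITE`, `fetch` and `iter_pages` are assumptions for illustration.

```python
import itertools

# Hypothetical pages keyed by the 1-based counter n. Page n links to
# "?page=n" (the site's zero-based index of the following page) when one exists.
FAKE_SITE = {
    1: '<a href="/videos/1"></a><a href="/users/u/videos?page=1">next</a>',
    2: '<a href="/videos/2"></a><a href="/users/u/videos?page=2">next</a>',
    3: '<a href="/videos/3"></a>',
}


def fetch(n):
    return FAKE_SITE[n]


def iter_pages(fetch):
    for n in itertools.count(1):
        page = fetch(n)
        yield page
        if f'page={n}' not in page:
            break  # no link to a following page


pages = list(iter_pages(fetch))
```

The check is a plain substring test, so `page=1` would also match a link to `page=10`. In this sketch that can at worst cause one extra request.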


@ -0,0 +1,84 @@
import base64
from .common import InfoExtractor
from ..utils import (
ExtractorError,
get_element_by_id,
int_or_none,
js_to_json,
str_or_none,
traverse_obj,
)
class IxiguaIE(InfoExtractor):
_VALID_URL = r'https?://(?:\w+\.)?ixigua\.com/(?:video/)?(?P<id>\d+).+'
_TESTS = [{
'url': 'https://www.ixigua.com/6996881461559165471',
'info_dict': {
'id': '6996881461559165471',
'ext': 'mp4',
'title': '盲目涉水风险大,亲身示范高水位行车注意事项',
'description': 'md5:8c82f46186299add4a1c455430740229',
'tags': ['video_car'],
'like_count': int,
'dislike_count': int,
'view_count': int,
'uploader': '懂车帝原创',
'uploader_id': '6480145787',
'thumbnail': r're:^https?://.+\.(avif|webp)',
'timestamp': 1629088414,
'duration': 1030,
}
}]
def _get_json_data(self, webpage, video_id):
js_data = get_element_by_id('SSR_HYDRATED_DATA', webpage)
if not js_data:
if self._cookies_passed:
raise ExtractorError('Failed to get SSR_HYDRATED_DATA')
raise ExtractorError('Cookies (not necessarily logged in) are needed', expected=True)
return self._parse_json(
js_data.replace('window._SSR_HYDRATED_DATA=', ''), video_id, transform_source=js_to_json)
def _media_selector(self, json_data):
for path, override in (
(('video_list', ), {}),
(('dynamic_video', 'dynamic_video_list'), {'acodec': 'none'}),
(('dynamic_video', 'dynamic_audio_list'), {'vcodec': 'none', 'ext': 'm4a'}),
):
for media in traverse_obj(json_data, (..., *path, lambda _, v: v['main_url'])):
yield {
'url': base64.b64decode(media['main_url']).decode(),
'width': int_or_none(media.get('vwidth')),
'height': int_or_none(media.get('vheight')),
'fps': int_or_none(media.get('fps')),
'vcodec': media.get('codec_type'),
'format_id': str_or_none(media.get('quality_type')),
'filesize': int_or_none(media.get('size')),
'ext': 'mp4',
**override,
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
json_data = self._get_json_data(webpage, video_id)['anyVideo']['gidInformation']['packerData']['video']
formats = list(self._media_selector(json_data.get('videoResource')))
self._sort_formats(formats)
return {
'id': video_id,
'title': json_data.get('title'),
'description': json_data.get('video_abstract'),
'formats': formats,
'like_count': json_data.get('video_like_count'),
'duration': int_or_none(json_data.get('duration')),
'tags': [json_data.get('tag')],
'uploader_id': traverse_obj(json_data, ('user_info', 'user_id')),
'uploader': traverse_obj(json_data, ('user_info', 'name')),
'view_count': json_data.get('video_watch_count'),
'dislike_count': json_data.get('video_unlike_count'),
'timestamp': int_or_none(json_data.get('video_publish_time')),
}
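
Note on the file above: `_media_selector` base64-decodes each `main_url` and builds a format dict in which `**override` comes last, so override values replace the defaults listed before it. The sketch below shows both steps on made-up data. `build_format` and the sample URL are illustrative assumptions.

```python
import base64


def build_format(media, override):
    return {
        'url': base64.b64decode(media['main_url']).decode(),
        'vcodec': media.get('codec_type'),
        'ext': 'mp4',
        **override,  # unpacked last, so its keys win over the ones above
    }


# Hypothetical media entry; a real page supplies the encoded URL.
media = {
    'main_url': base64.b64encode(b'https://example.com/audio.m4a').decode(),
    'codec_type': 'h264',
}

video_format = build_format(media, {})
audio_format = build_format(media, {'vcodec': 'none', 'ext': 'm4a'})
```

With an empty override the defaults survive. The audio override mirrors the `dynamic_audio_list` entry in the file above.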


@ -70,7 +70,7 @@ class JojIE(InfoExtractor):
r'(\d+)[pP]\.', format_url, 'height', default=None)
formats.append({
'url': format_url,
'format_id': format_field(height, template='%sp'),
'format_id': format_field(height, None, '%sp'),
'height': int(height),
})
if not formats:


@ -5,7 +5,7 @@ from ..utils import unsmuggle_url
class JWPlatformIE(InfoExtractor):
_VALID_URL = r'(?:https?://(?:content\.jwplatform|cdn\.jwplayer)\.com/(?:(?:feed|player|thumb|preview)s|jw6|v2/media)/|jwplatform:)(?P<id>[a-zA-Z0-9]{8})'
_VALID_URL = r'(?:https?://(?:content\.jwplatform|cdn\.jwplayer)\.com/(?:(?:feed|player|thumb|preview|manifest)s|jw6|v2/media)/|jwplatform:)(?P<id>[a-zA-Z0-9]{8})'
_TESTS = [{
'url': 'http://content.jwplatform.com/players/nPripu9l-ALJ3XQCI.js',
'md5': 'fa8899fa601eb7c83a64e9d568bdf325',
@@ -37,6 +37,9 @@ class JWPlatformIE(InfoExtractor):
webpage)
if ret:
return ret
+ mobj = re.search(r'<div\b[^>]* data-video-jw-id="([a-zA-Z0-9]{8})"', webpage)
+ if mobj:
+ return [f'jwplatform:{mobj.group(1)}']
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
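
Reviewer note: the two JWPlatform changes can be checked in isolation with the standard `re` module. The sketch below copies both patterns from the hunks above; `find_jw_embed`, the sample markup and the manifest URL are made up for illustration and are not part of the change. Unlike the patched method, the stand-in returns an empty list when nothing matches.

```python
import re

# Pattern copied from the lines added in the second JWPlatform hunk.
JW_EMBED_RE = r'<div\b[^>]* data-video-jw-id="([a-zA-Z0-9]{8})"'

# _VALID_URL as it reads after the change ('manifest' added to the alternation).
JW_VALID_URL = (
    r'(?:https?://(?:content\.jwplatform|cdn\.jwplayer)\.com/'
    r'(?:(?:feed|player|thumb|preview|manifest)s|jw6|v2/media)/|jwplatform:)'
    r'(?P<id>[a-zA-Z0-9]{8})')


def find_jw_embed(webpage):
    """Hypothetical stand-in for the new fallback branch (returns [] on no match)."""
    mobj = re.search(JW_EMBED_RE, webpage)
    return [f'jwplatform:{mobj.group(1)}'] if mobj else []


# Made-up markup; only the data-video-jw-id attribute matters here.
sample = '<div class="player" data-video-jw-id="nPripu9l"></div>'
urls = find_jw_embed(sample)

# The internal jwplatform: URL is accepted by _VALID_URL and yields the media ID.
media_id = re.match(JW_VALID_URL, urls[0]).group('id')

# An illustrative manifest-style URL, which the widened alternation now accepts.
manifest_match = re.match(JW_VALID_URL, 'https://cdn.jwplayer.com/manifests/nPripu9l.m3u8')
```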

View File

@@ -382,5 +382,5 @@ class KalturaIE(InfoExtractor):
'duration': info.get('duration'),
'timestamp': info.get('createdAt'),
'uploader_id': format_field(info, 'userId', ignore=('None', None)),
- 'view_count': info.get('plays'),
+ 'view_count': int_or_none(info.get('plays')),
}

View File

@@ -68,7 +68,7 @@ class KeezMoviesIE(InfoExtractor):
video_url, title, 32).decode('utf-8')
formats.append({
'url': format_url,
- 'format_id': format_field(height, template='%dp'),
+ 'format_id': format_field(height, None, '%dp'),
'height': height,
'tbr': tbr,
})

View File

@@ -0,0 +1,55 @@
from .common import InfoExtractor
from .dailymotion import DailymotionIE


class KickerIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)kicker\.(?:de)/(?P<id>[\w-]+)/video'
    _TESTS = [{
        'url': 'https://www.kicker.de/pogba-dembel-co-die-top-11-der-abloesefreien-spieler-905049/video',
        'info_dict': {
            'id': 'km04mrK0DrRAVxy2GcA',
            'title': 'md5:b91d145bac5745ac58d5479d8347a875',
            'ext': 'mp4',
            'duration': 350,
            'description': 'md5:a5a3dd77dbb6550dbfb997be100b9998',
            'uploader_id': 'x2dfupo',
            'timestamp': 1654677626,
            'like_count': int,
            'uploader': 'kicker.de',
            'view_count': int,
            'age_limit': 0,
            'thumbnail': r're:https://s\d+\.dmcdn\.net/v/T-x741YeYAx8aSZ0Z/x1080',
            'tags': ['published', 'category.InternationalSoccer'],
            'upload_date': '20220608'
        }
    }, {
        'url': 'https://www.kicker.de/ex-unioner-in-der-bezirksliga-felix-kroos-vereinschallenge-in-pankow-902825/video',
        'info_dict': {
            'id': 'k2omNsJKdZ3TxwxYSFJ',
            'title': 'md5:72ec24d7f84b8436fe1e89d198152adf',
            'ext': 'mp4',
            'uploader_id': 'x2dfupo',
            'duration': 331,
            'timestamp': 1652966015,
            'thumbnail': r're:https?://s\d+\.dmcdn\.net/v/TxU4Z1YYCmtisTbMq/x1080',
            'tags': ['FELIX KROOS', 'EINFACH MAL LUPPEN', 'KROOS', 'FSV FORTUNA PANKOW', 'published', 'category.Amateurs', 'marketingpreset.Spreekick'],
            'age_limit': 0,
            'view_count': int,
            'upload_date': '20220519',
            'uploader': 'kicker.de',
            'description': 'md5:0c2060c899a91c8bf40f578f78c5846f',
            'like_count': int,
        }
    }]

    def _real_extract(self, url):
        video_slug = self._match_id(url)

        webpage = self._download_webpage(url, video_slug)
        dailymotion_video_id = self._search_regex(
            r'data-dmprivateid\s*=\s*[\'"](?P<video_id>\w+)', webpage,
            'video id', group='video_id')

        return self.url_result(
            f'https://www.dailymotion.com/video/{dailymotion_video_id}',
            ie=DailymotionIE, video_title=self._html_extract_title(webpage))
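
Reviewer note: the new Kicker extractor only does two things, namely pull the slug out of the page URL and find the Dailymotion ID in the page markup. A minimal sketch of both steps with the standard `re` module follows. The two patterns are copied from the file above; `dailymotion_url_from_page` and the sample markup are hypothetical and stand in for the `_search_regex` / `url_result` calls.

```python
import re

# Both patterns are copied verbatim from the new extractor.
KICKER_VALID_URL = r'https?://(?:www\.)kicker\.(?:de)/(?P<id>[\w-]+)/video'
DM_ID_RE = r'data-dmprivateid\s*=\s*[\'"](?P<video_id>\w+)'


def dailymotion_url_from_page(webpage):
    """Hypothetical helper: build the Dailymotion URL the extractor delegates to."""
    mobj = re.search(DM_ID_RE, webpage)
    if not mobj:
        return None
    return f'https://www.dailymotion.com/video/{mobj.group("video_id")}'


# First test URL from _TESTS; the slug is what _match_id would return.
page_url = 'https://www.kicker.de/pogba-dembel-co-die-top-11-der-abloesefreien-spieler-905049/video'
slug = re.match(KICKER_VALID_URL, page_url).group('id')

# Made-up markup carrying the attribute the extractor looks for.
sample = '<div class="video" data-dmprivateid="km04mrK0DrRAVxy2GcA"></div>'
target = dailymotion_url_from_page(sample)
```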

28
yt_dlp/extractor/kth.py Normal file
View File

@@ -0,0 +1,28 @@
from .common import InfoExtractor
from ..utils import smuggle_url


class KTHIE(InfoExtractor):
    _VALID_URL = r'https?://play\.kth\.se/(?:[^/]+/)+(?P<id>[a-z0-9_]+)'
    _TEST = {
        'url': 'https://play.kth.se/media/Lunch+breakA+De+nya+aff%C3%A4rerna+inom+Fordonsdalen/0_uoop6oz9',
        'md5': 'd83ada6d00ca98b73243a88efe19e8a6',
        'info_dict': {
            'id': '0_uoop6oz9',
            'ext': 'mp4',
            'title': 'md5:bd1d6931facb6828762a33e6ce865f37',
            'thumbnail': 're:https?://.+/thumbnail/.+',
            'duration': 3516,
            'timestamp': 1647345358,
            'upload_date': '20220315',
            'uploader_id': 'md5:0ec23e33a89e795a4512930c8102509f',
        }
    }

    def _real_extract(self, url):
        video_id = self._match_id(url)
        result = self.url_result(
            smuggle_url('kaltura:308:%s' % video_id, {
                'service_url': 'https://api.kaltura.nordu.net'}),
            'Kaltura')
        return result

View File

@@ -15,7 +15,7 @@ class LastFMPlaylistBaseIE(InfoExtractor):
for page_number in range(start_page_number, (last_page_number or start_page_number) + 1):
webpage = self._download_webpage(
url, playlist_id,
- note='Downloading page %d%s' % (page_number, format_field(last_page_number, template=' of %d')),
+ note='Downloading page %d%s' % (page_number, format_field(last_page_number, None, ' of %d')),
query={'page': page_number})
page_entries = [
self.url_result(player_url, 'Youtube')

View File

@@ -192,10 +192,11 @@ class LBRYIE(LBRYBaseIE):
claim_id, is_live = result['signing_channel']['claim_id'], True
headers = {'referer': 'https://player.odysee.live/'}
live_data = self._download_json(
- f'https://api.live.odysee.com/v1/odysee/live/{claim_id}', claim_id,
+ 'https://api.odysee.live/livestream/is_live', claim_id,
+ query={'channel_claim_id': claim_id},
note='Downloading livestream JSON metadata')['data']
- streaming_url = final_url = live_data.get('url')
- if not final_url and not live_data.get('live'):
+ streaming_url = final_url = live_data.get('VideoURL')
+ if not final_url and not live_data.get('Live'):
self.raise_no_formats('This stream is not live', True, claim_id)
else:
raise UnsupportedError(url)

View File

@@ -34,7 +34,7 @@ class LineLiveBaseIE(InfoExtractor):
'timestamp': int_or_none(item.get('createdAt')),
'channel': channel.get('name'),
'channel_id': channel_id,
- 'channel_url': format_field(channel_id, template='https://live.line.me/channels/%s'),
+ 'channel_url': format_field(channel_id, None, 'https://live.line.me/channels/%s'),
'duration': int_or_none(item.get('archiveDuration')),
'view_count': int_or_none(item.get('viewerCount')),
'comment_count': int_or_none(item.get('chatCount')),

View File

@@ -116,7 +116,7 @@ class MedalTVIE(InfoExtractor):
author = try_get(
hydration_data, lambda x: list(x['profiles'].values())[0], dict) or {}
author_id = str_or_none(author.get('id'))
- author_url = format_field(author_id, template='https://medal.tv/users/%s')
+ author_url = format_field(author_id, None, 'https://medal.tv/users/%s')
return {
'id': video_id,

View File

@@ -20,7 +20,7 @@ class MediasetIE(ThePlatformBaseIE):
(?:
mediaset:|
https?://
- (?:(?:www|static3)\.)?mediasetplay\.mediaset\.it/
+ (?:\w+\.)+mediaset\.it/
(?:
(?:video|on-demand|movie)/(?:[^/]+/)+[^/]+_|
player/index\.html\?.*?\bprogramGuid=
@@ -159,6 +159,9 @@ class MediasetIE(ThePlatformBaseIE):
}, {
'url': 'https://www.mediasetplay.mediaset.it/movie/herculeslaleggendahainizio/hercules-la-leggenda-ha-inizio_F305927501000102',
'only_matching': True,
+ }, {
+ 'url': 'https://mediasetinfinity.mediaset.it/video/braveandbeautiful/episodio-113_F310948005000402',
+ 'only_matching': True,
}]
@staticmethod
@@ -286,7 +289,7 @@ class MediasetShowIE(MediasetIE):
_VALID_URL = r'''(?x)
(?:
https?://
- (?:(?:www|static3)\.)?mediasetplay\.mediaset\.it/
+ (\w+\.)+mediaset\.it/
(?:
(?:fiction|programmi-tv|serie-tv|kids)/(?:.+?/)?
(?:[a-z-]+)_SE(?P<id>\d{12})
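
Reviewer note: the effect of loosening the Mediaset host pattern can be seen by matching only the host portion of the old and new `_VALID_URL` against the two `only_matching` test URLs from the MediasetIE hunk. This is a sketch with the standard `re` module; `OLD_HOST` and `NEW_HOST` are excerpts of the full verbose patterns (non-capturing variant), not the patterns themselves.

```python
import re

# Host portion of MediasetIE._VALID_URL before and after the change.
OLD_HOST = r'https?://(?:(?:www|static3)\.)?mediasetplay\.mediaset\.it/'
NEW_HOST = r'https?://(?:\w+\.)+mediaset\.it/'

# Both URLs are taken from the _TESTS entries shown in the diff.
urls = [
    'https://www.mediasetplay.mediaset.it/movie/herculeslaleggendahainizio/hercules-la-leggenda-ha-inizio_F305927501000102',
    'https://mediasetinfinity.mediaset.it/video/braveandbeautiful/episodio-113_F310948005000402',
]

# Map each URL to (matches old host pattern, matches new host pattern).
results = {
    url: (bool(re.match(OLD_HOST, url)), bool(re.match(NEW_HOST, url)))
    for url in urls
}

for url, (old_ok, new_ok) in results.items():
    print(f'old={old_ok!s:5} new={new_ok!s:5} {url}')
```

The new pattern accepts any subdomain chain under `mediaset.it`, but it still requires at least one subdomain label, so a bare `https://mediaset.it/...` URL would not match.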

Some files were not shown because too many files have changed in this diff