System 1 package
Submodules
yt_audio_collector.system_1.fetch_youtube_data module
- class yt_audio_collector.system_1.fetch_youtube_data.FetchValidYouTubeData
Bases: object
yt_audio_collector.system_1.valid_transcript module
- yt_audio_collector.system_1.valid_transcript.is_valid_hindi_transcript(transcript: List[dict], video_id: str) -> bool
Checks whether the given transcript is valid. A transcript is valid only if it meets both conditions:
1. It is in Hindi.
2. It covers the full video and contains no empty text.
Parameters:
- transcript: List[dict]
The transcript entries of the video, as a list of dictionaries.
- video_id: str
The ID of the video.
Return:
- bool
True if the transcript is valid, False otherwise.
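The second condition (full coverage, no empty text) can be illustrated with a minimal sketch. It assumes each transcript entry is a dictionary with `text`, `start`, and `duration` keys; the source does not state that layout. The helper names, the `video_duration` argument, and the `tolerance` value are hypothetical and not part of the package. The Hindi language check is not shown.

```python
from typing import List


def has_no_empty_text(transcript: List[dict]) -> bool:
    """Return True if the transcript is non-empty and every entry has non-blank text."""
    return bool(transcript) and all(
        entry.get("text", "").strip() for entry in transcript
    )


def covers_full_video(
    transcript: List[dict], video_duration: float, tolerance: float = 5.0
) -> bool:
    """Return True if the entries span the video from start to end.

    Assumption: entries are ordered by start time, and `tolerance` (seconds)
    is a hypothetical allowance for silence at either end of the video.
    """
    if not transcript:
        return False
    first_start = transcript[0]["start"]
    last_end = transcript[-1]["start"] + transcript[-1]["duration"]
    return first_start <= tolerance and last_end >= video_duration - tolerance
```

A transcript would pass this part of the validation only when both helpers return True.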
yt_audio_collector.system_1.video_to_audio module
This module contains functions to convert YouTube videos to audio files, detect the language of the audio, store the audio files, and check whether the audio is in Hindi.
- yt_audio_collector.system_1.video_to_audio.convert_video_to_audio(video_id: str) -> Path
Converts a video to an audio file.
Parameters:
- video_id: str
The ID of the video.
Return:
- Path
The path of the converted audio file.
- yt_audio_collector.system_1.video_to_audio.duration_of_video(video_id: str) -> int
Gets the duration of the video in seconds using the pytube library.
Parameters:
- video_id: str
The ID of the video.
Return:
- int
The duration of the video in seconds.
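As a rough illustration of a pytube-based duration lookup, the sketch below builds the watch URL from a video ID and reads the `length` property of pytube's `YouTube` object. The function names `watch_url` and `duration_in_seconds` are hypothetical, and the package's actual implementation may differ.

```python
def watch_url(video_id: str) -> str:
    """Build the standard YouTube watch URL for a video ID."""
    return f"https://www.youtube.com/watch?v={video_id}"


def duration_in_seconds(video_id: str) -> int:
    """Look up the video length in seconds via pytube (needs network access)."""
    # Imported lazily so that watch_url() works without pytube installed.
    from pytube import YouTube

    return YouTube(watch_url(video_id)).length
```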
- yt_audio_collector.system_1.video_to_audio.get_audio_language(audio_path: str) -> str
Detects the language of an audio file using the Whisper model.
Parameters:
- audio_path: str
The path of the audio file.
Return:
- str
The language of the given audio.
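Language detection with Whisper might look like the following sketch, which follows the usage pattern from the openai-whisper README. The choice of the `base` model and the function names are assumptions; the source does not say which model size the package loads or whether it returns a language code or a language name.

```python
from typing import Dict


def most_probable_language(probs: Dict[str, float]) -> str:
    """Return the language code with the highest probability."""
    return max(probs, key=probs.get)


def detect_audio_language(audio_path: str) -> str:
    """Detect the spoken language of an audio file with openai-whisper."""
    # Lazy import: requires the openai-whisper package and ffmpeg.
    import whisper

    model = whisper.load_model("base")  # assumption: model size is not given in the source
    # Whisper detects the language from a 30-second window of audio.
    audio = whisper.pad_or_trim(whisper.load_audio(audio_path))
    mel = whisper.log_mel_spectrogram(audio).to(model.device)
    _, probs = model.detect_language(mel)
    return most_probable_language(probs)
```

Whisper reports languages as short codes, so Hindi audio would be reported as `"hi"` under this sketch.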
- yt_audio_collector.system_1.video_to_audio.has_hindi_audio(video_id: str, query: str) -> bool
Converts the video to an audio file and detects its audio language. If the audio is in Hindi, stores the audio file and returns True; otherwise, removes the audio file and returns False.
Parameters:
- video_id: str
The ID of the video.
- query: str
Represents the data you need.
Return:
- bool
True if the audio language is Hindi, False otherwise.
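The keep-or-discard step described above can be sketched as follows. The helper `keep_if_hindi` is hypothetical, the language detector is passed in as a callable so the logic can be exercised without downloading a model, and the comparison against the code `"hi"` assumes a Whisper-style language code. Storing the file under a query-specific location is not shown.

```python
from pathlib import Path
from typing import Callable


def keep_if_hindi(audio_path: Path, detect_language: Callable[[str], str]) -> bool:
    """Keep the audio file if it is detected as Hindi; delete it otherwise."""
    # Assumption: the detector returns a Whisper-style code such as "hi".
    if detect_language(str(audio_path)) == "hi":
        return True
    audio_path.unlink(missing_ok=True)  # discard non-Hindi audio (Python 3.8+)
    return False
```

Passing the detector as an argument keeps the file-handling logic independent of the model, which makes it easy to check with a stub such as `lambda path: "hi"`.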