System 1 package

Submodules

yt_audio_collector.system_1.fetch_youtube_data module

class yt_audio_collector.system_1.fetch_youtube_data.FetchValidYouTubeData

Bases: object

get_valid_video_ids(query: str) → List[str]

Finds the ids of YouTube videos that have both Hindi audio and a Hindi transcript.

Parameters:

query: str: The search query.
status_container: Container: A container to store the status of the program.

Return:

List[str]: A list of valid video ids.
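
The filtering described above can be sketched as follows. This is a minimal illustration rather than the package's implementation: `filter_valid_ids` and its two predicate parameters are hypothetical names, and the order in which the real method applies the two checks is an assumption.

```python
from typing import Callable, Iterable, List


def filter_valid_ids(
    video_ids: Iterable[str],
    has_hindi_transcript: Callable[[str], bool],
    has_hindi_audio: Callable[[str], bool],
) -> List[str]:
    """Keep only the ids that pass both checks (hypothetical helper)."""
    valid = []
    for video_id in video_ids:
        # Assumption: the transcript check is the cheaper one, so it runs
        # first and the audio check is skipped for videos that fail it.
        if has_hindi_transcript(video_id) and has_hindi_audio(video_id):
            valid.append(video_id)
    return valid
```

Passing the checks in as callables keeps the sketch independent of network access; in the package they would presumably be backed by the transcript and audio functions documented below.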

get_video_ids(query: str) → List[str]

Fetches all video ids corresponding to the given query from YouTube.

Parameters:

query: str: The search query.

Return:

List[str]: A list of video ids fetched from YouTube based on the given query.

yt_audio_collector.system_1.valid_transcript module

yt_audio_collector.system_1.valid_transcript.is_valid_hindi_transcript(transcript: List[dict], video_id: str) → bool

Checks whether the given transcript is valid. A transcript is valid if:

1. It is in Hindi.
2. It covers the full video and contains no empty text.

Parameters:

transcript: List[dict]: A list of transcript segments for the video.
video_id: str: The ID of the video.

Return:

bool: True if the transcript is valid, False otherwise.
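
The two validity rules can be illustrated with a short sketch. Everything here is an assumption made for illustration: the function name, the segment shape (dicts with `text`, `start`, and `duration` keys), the use of Devanagari characters as a crude test for Hindi, and the `tolerance` threshold. The sketch also takes the video length directly instead of a video id.

```python
from typing import Dict, List


def looks_like_valid_hindi_transcript(
    transcript: List[Dict], video_seconds: float, tolerance: float = 10.0
) -> bool:
    """Hypothetical validity check; rules and thresholds are assumptions."""
    if not transcript:
        return False
    for segment in transcript:
        text = segment.get("text", "").strip()
        # Rule 2 (first half): no segment may have empty text.
        if not text:
            return False
        # Rule 1 (crude heuristic): each segment must contain at least one
        # character from the Devanagari Unicode block (U+0900 to U+097F).
        if not any("\u0900" <= ch <= "\u097f" for ch in text):
            return False
    # Rule 2 (second half): the last segment must end close to the video's end.
    last = transcript[-1]
    covered_until = last["start"] + last["duration"]
    return covered_until >= video_seconds - tolerance
```

A script-based test cannot distinguish Hindi from other languages written in Devanagari, so a real check would more likely rely on the language code reported with the transcript.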

yt_audio_collector.system_1.video_to_audio module

This module contains functions to convert YouTube videos to audio files, detect the language of the audio, store the audio files, and check whether the audio is in Hindi.

yt_audio_collector.system_1.video_to_audio.convert_video_to_audio(video_id: str) → Path

Converts a video to an audio file.

Parameters:

video_id: str: The id of the video.

Return:

Path: The path of the converted audio file.

yt_audio_collector.system_1.video_to_audio.duration_of_video(video_id: str) → int

Gets the duration of the video in seconds using the pytube library.

Parameters:

video_id: str: The id of the video.

Return:

int: The duration of the video in seconds.

yt_audio_collector.system_1.video_to_audio.get_audio_language(audio_path: str) → str

Detects the language of an audio file using the Whisper model.

Parameters:

audio_path: str: The path of the audio file.

Return:

str: The language of the given audio.

yt_audio_collector.system_1.video_to_audio.has_hindi_audio(video_id: str, query: str) → bool

Converts the video to an audio file and detects its language. If the audio is in Hindi, stores the audio file and returns True; otherwise, removes the audio file and returns False.

Parameters:

video_id: str: The id of the video.
query: str: The search query describing the data you need.

Return:

bool: True if the audio language is Hindi, and False otherwise.
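
The convert, detect, then store-or-remove flow can be sketched as below. This is an illustration under stated assumptions, not the package's code: `keep_if_hindi` is a hypothetical name, the three steps are passed in as callables standing in for `convert_video_to_audio`, `get_audio_language`, and `store_audio`, and the language detector is assumed to return the ISO 639-1 code `"hi"` for Hindi.

```python
from pathlib import Path
from typing import Callable


def keep_if_hindi(
    video_id: str,
    query: str,
    convert: Callable[[str], Path],
    detect_language: Callable[[str], str],
    store: Callable[[str, Path], None],
    hindi_code: str = "hi",  # assumption: detector returns ISO 639-1 codes
) -> bool:
    """Hypothetical sketch of the flow described for has_hindi_audio."""
    audio_path = convert(video_id)
    if detect_language(str(audio_path)) == hindi_code:
        # Hindi audio: hand the file over for storage.
        store(query, audio_path)
        return True
    # Any other language: delete the converted file so it does not pile up.
    audio_path.unlink(missing_ok=True)
    return False
```

Injecting the steps makes the branch logic easy to exercise with stand-ins, without downloading a video or loading a speech model.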

yt_audio_collector.system_1.video_to_audio.store_audio(query: str, audio_path: Path) → None

Stores the audio file in a separate audio folder.

Parameters:

query: str: The search query describing the data you need.
audio_path: Path: The path of the audio file.
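
One way such storage could work is sketched below. The folder layout (`audio/<query>/`), the `root` parameter, and the name `store_audio_sketch` are assumptions for illustration; the documentation above does not say where files go or how the query is used. Unlike the documented function, this sketch returns the new location so the result is easy to inspect.

```python
import shutil
from pathlib import Path


def store_audio_sketch(
    query: str, audio_path: Path, root: Path = Path("audio")
) -> Path:
    """Move audio_path into root/<query>/ and return the new location."""
    # Assumption: the query doubles as a folder name, with spaces replaced.
    target_dir = root / query.strip().replace(" ", "_")
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / audio_path.name
    shutil.move(str(audio_path), str(destination))
    return destination
```

Grouping files by query keeps the audio collected for different searches apart, which is one plausible reason for the function to take the query at all.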

Module contents