System 1 package

Submodules

yt_audio_collector.system_1.fetch_youtube_data module

class yt_audio_collector.system_1.fetch_youtube_data.FetchValidYouTubeData

Bases: object

get_valid_video_ids(query: str) → List[str]

Finds the IDs of YouTube videos that have both Hindi audio and a Hindi transcript.

Parameters:

query: str

The search query.

status_container: Container

A container to store the status of the program.

Return:

List[str]

A list of video IDs that have both Hindi audio and a Hindi transcript.
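The filtering that get_valid_video_ids describes can be sketched as a loop over candidate IDs that keeps only those passing both checks. This is a minimal illustration, not the package's implementation: the function name get_valid_video_ids_sketch and the injected callables (get_video_ids, has_valid_transcript, has_hindi_audio) are hypothetical stand-ins so that the example runs without network access, and the status_container parameter is omitted.

```python
from typing import Callable, List


def get_valid_video_ids_sketch(
    query: str,
    get_video_ids: Callable[[str], List[str]],       # hypothetical: search step
    has_valid_transcript: Callable[[str], bool],     # hypothetical: transcript check
    has_hindi_audio: Callable[[str, str], bool],     # hypothetical: audio check
) -> List[str]:
    """Keep only the video IDs that pass both the transcript and the audio check."""
    valid_ids: List[str] = []
    for video_id in get_video_ids(query):
        # The transcript check runs first; `and` short-circuits, so the more
        # expensive audio download and language detection run only for videos
        # whose transcript has already passed.
        if has_valid_transcript(video_id) and has_hindi_audio(video_id, query):
            valid_ids.append(video_id)
    return valid_ids
```

Ordering the checks this way also matters because has_hindi_audio stores the audio file as a side effect: running it only after the transcript passes avoids keeping audio for videos that are later rejected.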

get_video_ids(query: str) → List[str]

Fetches all video IDs corresponding to the given search query from YouTube.

Parameters:

query: str

The search query.

Return:

List[str]

A list of video IDs fetched from YouTube for the given query.
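Collecting "all" IDs for a query typically means paging through search results until no more arrive. The sketch below assumes a paginated search source, modeled by a hypothetical fetch_page(query, page) callable, so that it runs offline; it is not the module's actual code. Duplicates across pages are dropped while preserving first-seen order, and max_pages is an assumed safety cap.

```python
from typing import Callable, List


def collect_all_video_ids(
    query: str,
    fetch_page: Callable[[str, int], List[str]],  # hypothetical paginated search
    max_pages: int = 50,                          # assumed cap on pages fetched
) -> List[str]:
    """Gather video IDs page by page until an empty page or the page cap."""
    seen = set()
    ids: List[str] = []
    for page in range(max_pages):
        batch = fetch_page(query, page)
        if not batch:  # an empty page means the results are exhausted
            break
        for video_id in batch:
            if video_id not in seen:  # skip IDs already returned on earlier pages
                seen.add(video_id)
                ids.append(video_id)
    return ids
```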

yt_audio_collector.system_1.valid_transcript module

yt_audio_collector.system_1.valid_transcript.is_valid_hindi_transcript(transcript: List[dict], video_id: str) → bool

Checks whether the given transcript is valid. A transcript is valid if:
1. it is in Hindi, and
2. it covers the full length of the video and contains no empty text.

Parameters:

transcript: List[dict]

The transcript of a video, as a list of segment dictionaries.

video_id: str

The ID of the video.

Return:

bool

True if the transcript is valid, False otherwise.
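The two validity rules can be illustrated with a self-contained sketch. It assumes each segment is a dict with "text", "start", and "duration" keys (the shape returned by youtube-transcript-api), and it swaps in two assumptions of its own: "in Hindi" is approximated by checking that most letters are Devanagari (U+0900 to U+097F), and the video length is passed in directly as video_duration instead of being looked up from video_id. The names is_devanagari_text, is_valid_hindi_transcript_sketch, and the tolerance value are hypothetical.

```python
from typing import Any, Dict, List

DEVANAGARI_START, DEVANAGARI_END = 0x0900, 0x097F


def is_devanagari_text(text: str, min_ratio: float = 0.5) -> bool:
    """Assumed heuristic: treat text as Hindi if most letters are Devanagari."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False
    deva = sum(1 for ch in letters if DEVANAGARI_START <= ord(ch) <= DEVANAGARI_END)
    return deva / len(letters) >= min_ratio


def is_valid_hindi_transcript_sketch(
    transcript: List[Dict[str, Any]],
    video_duration: float,   # assumption: seconds, supplied by the caller
    tolerance: float = 10.0, # assumed slack (seconds) at the start and end
) -> bool:
    if not transcript:
        return False
    texts = [str(seg.get("text", "")).strip() for seg in transcript]
    # Rule 2a: no segment may have empty text.
    if any(not t for t in texts):
        return False
    # Rule 1: the transcript must be in Hindi (Devanagari heuristic).
    if not is_devanagari_text(" ".join(texts)):
        return False
    # Rule 2b: segments must span (nearly) the whole video.
    first_start = min(seg["start"] for seg in transcript)
    last_end = max(seg["start"] + seg["duration"] for seg in transcript)
    return first_start <= tolerance and last_end >= video_duration - tolerance
```

A script-based check is cheap but cannot tell Hindi from other languages written in Devanagari, such as Marathi or Nepali; a language-identification model would be stricter.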

yt_audio_collector.system_1.video_to_audio module

This module contains functions to convert YouTube videos to audio files, detect the language of the audio, store the audio files, and check whether the audio is in Hindi.

yt_audio_collector.system_1.video_to_audio.convert_video_to_audio(video_id: str) → Path

Converts a video to an audio file.

Parameters:

video_id: str

The ID of the video.

Return:

Path

The path of the converted audio file.
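One possible way to obtain an audio file from a video ID with pytube (the library this module already uses for durations) is to download an audio-only stream. This is a sketch under assumptions, not the module's code: convert_video_to_audio_sketch, watch_url, and the default output_dir are hypothetical, and the sketch saves the stream in its native container (for example mp4 or webm) rather than transcoding to another format. pytube is imported inside the function so the definitions load even where it is not installed.

```python
from pathlib import Path


def watch_url(video_id: str) -> str:
    """Build the standard YouTube watch URL for a video ID."""
    return f"https://www.youtube.com/watch?v={video_id}"


def convert_video_to_audio_sketch(video_id: str, output_dir: Path = Path("tmp_audio")) -> Path:
    from pytube import YouTube  # third-party; imported lazily

    # Pick the first audio-only stream YouTube offers for this video.
    stream = YouTube(watch_url(video_id)).streams.filter(only_audio=True).first()
    if stream is None:
        raise ValueError(f"No audio-only stream found for video {video_id}")
    # Name the file after the video ID, keeping the stream's own extension.
    downloaded = stream.download(
        output_path=str(output_dir), filename=f"{video_id}.{stream.subtype}"
    )
    return Path(downloaded)
```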

yt_audio_collector.system_1.video_to_audio.duration_of_video(video_id: str) → int

Gets the duration of the video in seconds using the pytube library.

Parameters:

video_id: str

The ID of the video.

Return:

int

The duration of the video in seconds.

yt_audio_collector.system_1.video_to_audio.get_audio_language(audio_path: str) → str

Detects the spoken language of an audio file using the Whisper model.

Parameters:

audio_path: str

The path of the audio file.

Return:

str

The language of the given audio.
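Language detection with the openai-whisper package usually follows the pattern from its README: load the audio, pad or trim it to 30 seconds, compute a log-Mel spectrogram, and ask the model for per-language probabilities. The sketch below assumes that pattern and the "base" model size; get_audio_language_sketch and most_probable_language are hypothetical names, and whisper is imported inside the function so the helper can be used without it. Whisper reports languages as codes, so Hindi comes back as "hi".

```python
from typing import Dict


def most_probable_language(probs: Dict[str, float]) -> str:
    """Return the language code with the highest probability."""
    return max(probs, key=probs.get)


def get_audio_language_sketch(audio_path: str, model_name: str = "base") -> str:
    import whisper  # third-party (openai-whisper); imported lazily

    model = whisper.load_model(model_name)
    # Whisper's language detection looks at a 30-second window.
    audio = whisper.pad_or_trim(whisper.load_audio(audio_path))
    mel = whisper.log_mel_spectrogram(audio).to(model.device)
    _, probs = model.detect_language(mel)  # probs: {language code: probability}
    return most_probable_language(probs)
```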

yt_audio_collector.system_1.video_to_audio.has_hindi_audio(video_id: str, query: str) → bool

Converts the video to an audio file and detects its language. If the audio is in Hindi, stores the audio file and returns True; otherwise, removes the audio file and returns False.

Parameters:

video_id: str

The ID of the video.

query: str

The search query describing the data being collected.

Return:

bool

True if the audio language is Hindi, False otherwise.
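The control flow described for has_hindi_audio can be sketched as follows. To keep the example self-contained, the three steps are passed in as hypothetical callables standing in for convert_video_to_audio, get_audio_language, and store_audio, and the comparison against the code "hi" is an assumption about what the language detector returns.

```python
from pathlib import Path
from typing import Callable


def has_hindi_audio_sketch(
    video_id: str,
    query: str,
    convert_video_to_audio: Callable[[str], Path],  # hypothetical stand-in
    get_audio_language: Callable[[str], str],       # hypothetical stand-in
    store_audio: Callable[[str, Path], None],       # hypothetical stand-in
    hindi_code: str = "hi",                         # assumed language code
) -> bool:
    audio_path = convert_video_to_audio(video_id)
    if get_audio_language(str(audio_path)) == hindi_code:
        # Hindi audio: keep the file under the query's storage location.
        store_audio(query, audio_path)
        return True
    # Not Hindi: discard the downloaded file so it does not accumulate on disk.
    audio_path.unlink(missing_ok=True)
    return False
```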

yt_audio_collector.system_1.video_to_audio.store_audio(query: str, audio_path: Path) → None

Stores the audio file in a separate audio folder.

Parameters:

query: str

The search query describing the data being collected.

audio_path: Path

The path of the audio file.
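A minimal sketch of storing the audio file in a separate folder, assuming a layout of one subfolder per query under an audio root directory. The function name store_audio_sketch, the root parameter, and the rule of replacing spaces with underscores in the folder name are all assumptions for illustration; unlike the documented function, the sketch returns the destination path so callers can inspect it.

```python
import shutil
from pathlib import Path


def store_audio_sketch(query: str, audio_path: Path, root: Path = Path("audio")) -> Path:
    # Assumed layout: <root>/<query with spaces replaced by underscores>/<file name>
    folder = root / query.strip().replace(" ", "_")
    folder.mkdir(parents=True, exist_ok=True)
    destination = folder / audio_path.name
    # Move rather than copy, so the temporary download does not linger.
    shutil.move(str(audio_path), str(destination))
    return destination
```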

Module contents