Subtitle File Processing (2) - Converting Between Subtitle Formats
Published: 2019-06-14


Summary

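In the previous article we implemented conversion between numeric values and time strings. Common subtitle file formats include WebVTT, SRT, and TTML. Some systems require VTT and others accept only TTML, while a finished subtitle file often arrives as SRT. Subtitle files therefore have to be converted from one format to another.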

The sample code in this article converts VTT to SRT and SRT to VTT. It can also convert VTT or SRT to TTML.
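SRT and WebVTT differ only in small details:

- WebVTT starts with a "WEBVTT" line and puts a dot before the milliseconds.
- SRT numbers every cue and uses a comma before the milliseconds.

To make that concrete, here is a minimal sketch of a direct SRT-to-WebVTT text rewrite. It assumes well-formed SRT input with hh:mm:ss,mmm timings. It is a plain text transform, not the object-based approach this article uses, and the class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: rewrites SRT text as WebVTT without building caption objects
public static class SubtitleTextConversion
{
    // An SRT timing line, e.g. "00:00:04,480 --> 00:00:07,430"
    private static readonly Regex SrtTiming = new Regex(
        @"^(\d+:\d{2}:\d{2}),(\d{3}) --> (\d+:\d{2}:\d{2}),(\d{3})",
        RegexOptions.Multiline);

    public static string SrtToVtt(string srt)
    {
        // Normalise Windows line endings so that "^" matches at every line start
        string body = srt.Replace("\r\n", "\n");
        // SRT puts a comma before the milliseconds; WebVTT requires a dot
        body = SrtTiming.Replace(body, "$1.$2 --> $3.$4");
        // A WebVTT file must start with "WEBVTT" and a blank line; the SRT cue numbers
        // are left in place, since WebVTT accepts them as cue identifiers
        return "WEBVTT\n\n" + body.TrimStart('\n');
    }
}
```

The reverse direction (VTT to SRT) is not a mirror image, because WebVTT cues may carry non-numeric identifiers and cue settings that SRT has no place for. That is one reason the article goes through caption objects instead.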

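As before, the regular expression that matches a cue timing line is:

"([0-9]+:)?([0-9]+):([0-9]+)([.,][0-9]+)? --> ([0-9]+:)?([0-9]+):([0-9]+)([.,][0-9]+)?"

The hour field is optional. The class [.,] accepts either the dot that WebVTT uses or the comma that SRT uses before the fractional seconds.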

 

The caption object is:

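using System.Text;

class ClosedCaption
{
    public string StartPoint { get; set; }   // start time as a string
    public string EndPoint { get; set; }     // end time as a string
    public string Transcript { get; set; }   // caption text, possibly several lines

    // Renders the cue as "start --> end" followed by the text; the result ends with a newline
    public override string ToString()
    {
        StringBuilder sb = new StringBuilder();
        sb.AppendLine(string.Format("{0} --> {1}", StartPoint, EndPoint));
        sb.AppendLine(Transcript);
        return sb.ToString();
    }
}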

Reading caption objects from a file:

// Reads an SRT or WebVTT file into the static List<ClosedCaption> Captions of the enclosing class.
// Requires System, System.Collections.Generic, System.IO and System.Text.RegularExpressions;
// TimeFormat.ToDouble / TimeFormat.ToHHMMSS are the helpers from the previous article.
public static void ReadTranscript(string filePath)
{
    // Matches timing lines such as "0:0:4.480 --> 0:0:7.430" (VTT) or "00:00:04,480 --> 00:00:07,430" (SRT)
    string timePattern = @"([0-9]+:)?([0-9]+):([0-9]+)([.,][0-9]+)? --> ([0-9]+:)?([0-9]+):([0-9]+)([.,][0-9]+)?";
    string fileContent;
    using (var reader = new StreamReader(filePath))
    {
        fileContent = reader.ReadToEnd();
    }

    // Pass 1: collect the start and end time of every cue, normalised to the "t1" format
    MatchCollection cues = Regex.Matches(fileContent, timePattern);
    Captions = new List<ClosedCaption>();
    foreach (Match cue in cues)
    {
        string[] timeInfo = cue.Value.Split(new string[] { "-->" }, StringSplitOptions.RemoveEmptyEntries);
        if (timeInfo.Length == 2)
        {
            string startInfo = TimeFormat.ToHHMMSS(TimeFormat.ToDouble(timeInfo[0].Trim()), "t1");
            string endInfo = TimeFormat.ToHHMMSS(TimeFormat.ToDouble(timeInfo[1].Trim()), "t1");
            Captions.Add(new ClosedCaption { StartPoint = startInfo, EndPoint = endInfo });
        }
    }

    // Pass 2: replace each timing line with "-->" and split on it;
    // part 0 is the header (or the first SRT index) and part i is the text of cue i - 1
    string newContent = Regex.Replace(fileContent, timePattern, "-->");
    string[] splitParts = newContent.Split(new string[] { "-->" }, StringSplitOptions.RemoveEmptyEntries);
    if (splitParts.Length - 1 == Captions.Count)
    {
        for (int i = 1; i < splitParts.Length; i++)
        {
            string text = splitParts[i].Trim('\r', '\n');
            // Every part except the last may end with the next cue's number (SRT index or numeric VTT id);
            // drop that last line only when it consists of digits alone, so text such as "Call 911" survives
            if (i < splitParts.Length - 1)
            {
                int lastBreak = text.LastIndexOf('\n');
                string lastLine = text.Substring(lastBreak + 1).Trim();
                if (Regex.IsMatch(lastLine, "^[0-9]+$"))
                {
                    text = lastBreak >= 0 ? text.Substring(0, lastBreak) : string.Empty;
                }
            }
            Captions[i - 1].Transcript = text.Trim('\r', '\n').Trim();
        }
    }
}
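The reader above finds cues in three steps:

1. It replaces every timing line with "-->".
2. It splits the text on that marker.
3. It trims the next cue's number off the end of each piece.

A common alternative is swapped in below; it is not the author's code. It splits the file on blank lines and treats each block as one cue. In each block, the timing line is the first or second line, and everything after it is the caption text.

This is a sketch under some assumptions:

- The input is well-formed SRT or WebVTT.
- The ParsedCue and BlockParser names are hypothetical.
- Times are stored as TimeSpan instead of strings, so the TimeFormat helpers are not needed.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical cue type with parsed times
public sealed class ParsedCue
{
    public TimeSpan Start { get; set; }
    public TimeSpan End { get; set; }
    public string Text { get; set; }
}

public static class BlockParser
{
    // [hh:]mm:ss[.fff] --> [hh:]mm:ss[.fff]; ',' (SRT) or '.' (VTT) before the fraction; trailing VTT settings are ignored
    private static readonly Regex Timing = new Regex(
        @"^(?:(\d+):)?(\d{1,2}):(\d{1,2})(?:[.,](\d{1,3}))?\s*-->\s*(?:(\d+):)?(\d{1,2}):(\d{1,2})(?:[.,](\d{1,3}))?");

    // first is the index of the hours group: 1 for the start time, 5 for the end time
    private static TimeSpan ToTime(Match m, int first)
    {
        int h = m.Groups[first].Success ? int.Parse(m.Groups[first].Value, CultureInfo.InvariantCulture) : 0;
        int min = int.Parse(m.Groups[first + 1].Value, CultureInfo.InvariantCulture);
        int s = int.Parse(m.Groups[first + 2].Value, CultureInfo.InvariantCulture);
        // Pad ".4" to "400" so the fraction is read as milliseconds
        int ms = m.Groups[first + 3].Success
            ? int.Parse(m.Groups[first + 3].Value.PadRight(3, '0'), CultureInfo.InvariantCulture)
            : 0;
        return new TimeSpan(0, h, min, s, ms);
    }

    public static List<ParsedCue> Parse(string content)
    {
        var cues = new List<ParsedCue>();
        // Normalise line endings and drop a UTF-8 BOM if present
        string text = content.Replace("\r\n", "\n").Replace('\r', '\n').TrimStart('\uFEFF');
        // Cues are separated by one or more blank lines
        foreach (string block in Regex.Split(text, @"\n\s*\n"))
        {
            string[] lines = block.Trim('\n').Split('\n');
            // The timing line is the first line, or the second after an SRT index / VTT identifier;
            // blocks without one (the WEBVTT header, NOTE blocks) are skipped
            for (int i = 0; i < lines.Length && i < 2; i++)
            {
                Match m = Timing.Match(lines[i]);
                if (!m.Success) continue;
                cues.Add(new ParsedCue
                {
                    Start = ToTime(m, 1),
                    End = ToTime(m, 5),
                    Text = string.Join("\n", lines, i + 1, lines.Length - i - 1).Trim()
                });
                break;
            }
        }
        return cues;
    }
}
```

In this layout, the index or identifier sits before the timing line inside the same block. Nothing has to be trimmed from the end of the text, so a caption whose last line is a number is kept whole.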

Generating VTT, SRT, and TTML from the caption objects:

public static void Write2VTT(string vtt)
{
    if (Captions.Count > 0)
    {
        StringBuilder sb = new StringBuilder();
        sb.AppendLine("WEBVTT");   // required file signature
        sb.AppendLine();           // blank line after the header
        foreach (var item in Captions)
        {
            sb.AppendLine(item.ToString()); // ToString() already ends with a newline, so this leaves a blank line between cues
        }
        using (StreamWriter writer = new StreamWriter(vtt, false))
        {
            writer.Write(sb.ToString()); // disposing the writer flushes and closes it
        }
    }
}

public static void Write2SRT(string srt)
{
    if (Captions.Count > 0)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Captions.Count; i++)
        {
            sb.AppendLine((i + 1).ToString()); // SRT cues are numbered from 1
            // SRT expects a comma before the milliseconds ("00:00:04,480"); this assumes the "t1" format uses a dot
            sb.AppendLine(string.Format("{0} --> {1}", Captions[i].StartPoint.Replace('.', ','), Captions[i].EndPoint.Replace('.', ',')));
            sb.AppendLine(Captions[i].Transcript);
            sb.AppendLine(); // blank line between cues
        }
        using (StreamWriter writer = new StreamWriter(srt, false))
        {
            writer.Write(sb.ToString());
        }
    }
}

public static void Write2TTML(string ttml)
{
    // ttSample1.txt is a TTML template whose body holds a {0} placeholder for the cues;
    // it must contain no other braces, or string.Format will throw
    string template;
    using (StreamReader sr = new StreamReader("ttSample1.txt"))
    {
        template = sr.ReadToEnd();
    }
    if (Captions.Count > 0)
    {
        StringBuilder sbContent = new StringBuilder();
        sbContent.AppendLine("<div>");
        for (int i = 0; i < Captions.Count; i++)
        {
            double beginTime = TimeFormat.ToDouble(Captions[i].StartPoint);
            double endTime = TimeFormat.ToDouble(Captions[i].EndPoint);
            string begin = TimeFormat.ToHHMMSS(beginTime, "t1");
            string end = TimeFormat.ToHHMMSS(endTime, "t1");
            string text = HttpUtility.HtmlEncode(Captions[i].Transcript); // escape &, <, > and quotes for XML
            sbContent.AppendLine(string.Format("<p xml:id=\"{0}\" begin=\"{1}\" end=\"{2}\">{3}</p>", "p" + i, begin, end, text));
        }
        sbContent.AppendLine("</div>");
        string output = string.Format(template, sbContent.ToString());
        using (StreamWriter writer = new StreamWriter(ttml, false))
        {
            writer.Write(output);
        }
    }
}
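The three writers differ mainly in the timestamp separator and the markup around each cue. The following self-contained sketch renders the same cues as SRT, as WebVTT, and as a single TTML <p> element.

It rests on these assumptions:

- The names are hypothetical.
- Times are held as TimeSpan.
- Line endings are "\n".
- It stands apart from the ClosedCaption and TimeFormat pair used above.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Security;
using System.Text;

// Hypothetical helper: renders (Start, End, Text) cues in each target format
public static class CueWriterSketch
{
    // hh:mm:ss plus milliseconds; SRT wants ',' before the milliseconds, WebVTT and TTML want '.'
    public static string FormatTime(TimeSpan t, char separator)
    {
        int hours = (int)t.TotalHours; // not wrapped at 24, so long media still works
        return string.Format(CultureInfo.InvariantCulture, "{0:00}:{1:00}:{2:00}{3}{4:000}",
            hours, t.Minutes, t.Seconds, separator, t.Milliseconds);
    }

    public static string ToSrt(IList<(TimeSpan Start, TimeSpan End, string Text)> cues)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < cues.Count; i++)
        {
            sb.Append(i + 1).Append('\n'); // SRT cues are numbered from 1
            sb.Append(FormatTime(cues[i].Start, ',')).Append(" --> ")
              .Append(FormatTime(cues[i].End, ',')).Append('\n');
            sb.Append(cues[i].Text).Append("\n\n"); // a blank line ends the cue
        }
        return sb.ToString();
    }

    public static string ToVtt(IList<(TimeSpan Start, TimeSpan End, string Text)> cues)
    {
        var sb = new StringBuilder("WEBVTT\n\n"); // signature line plus blank line
        foreach (var cue in cues)
        {
            sb.Append(FormatTime(cue.Start, '.')).Append(" --> ")
              .Append(FormatTime(cue.End, '.')).Append('\n');
            sb.Append(cue.Text).Append("\n\n");
        }
        return sb.ToString();
    }

    // One TTML <p> element; SecurityElement.Escape encodes &, <, >, " and ',
    // and line breaks inside the text become <br/> elements
    public static string ToTtmlParagraph(string id, (TimeSpan Start, TimeSpan End, string Text) cue)
    {
        string body = SecurityElement.Escape(cue.Text).Replace("\n", "<br/>");
        return string.Format(CultureInfo.InvariantCulture,
            "<p xml:id=\"{0}\" begin=\"{1}\" end=\"{2}\">{3}</p>",
            id, FormatTime(cue.Start, '.'), FormatTime(cue.End, '.'), body);
    }
}
```

A single formatting function that takes the separator as a parameter keeps the SRT comma and the VTT/TTML dot from drifting apart. SecurityElement.Escape stands in here for HttpUtility.HtmlEncode because it is intended for XML text and needs no System.Web dependency.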

 

A conversion example is available on GitHub.

Please credit the source when reposting.

Reposted from: https://www.cnblogs.com/qixue/p/5498396.html
