Manga ripper with performance and blocking issues


I decided to write a C# program that downloads all chapters of a manga, given a URL. HTML parsing is done with HtmlAgilityPack.

Issues I have yet to work out: the whole program blocks; GetPagesLink() is rather slow because it calls LoadHtmlCode(), which uses WebClient heavily (one WebClient object for every page in a chapter, multiplied by the number of chapters); and memory use grows continuously. It starts at ~14 MB but keeps increasing without bound. Apart from that, everything works.

    using HtmlAgilityPack;
    using System;
    using System.Collections.Generic;
    using System.ComponentModel;
    using System.Data;
    using System.Diagnostics;
    using System.Drawing;
    using System.IO;
    using System.Linq;
    using System.Net;
    using System.Text;
    using System.Threading;
    using System.Threading.Tasks;
    using System.Web;
    using System.Windows.Forms;
    using Treasure;

    namespace MangaRipper
    {
        public partial class Form1 : Form
        {
            #region Properties

            private Uri _Uri;

            public Uri Uri
            {
                get { return _Uri; }
                set { _Uri = value; }
            }

            private List<Tuple<string, string, string>> _Chapters = new List<Tuple<string, string, string>>();

            public List<Tuple<string, string, string>> Chapters
            {
                get { return _Chapters; }
                set { _Chapters = value; }
            }

            private string _MangaName;

            public string MangaName
            {
                get { return _MangaName; }
                set { _MangaName = value; }
            }

            #endregion Properties

            public Form1()
            {
                InitializeComponent();
            }

            private void exitToolStripMenuItem_Click(object sender, EventArgs e)
            {
                Application.Exit();
            }

            private string LoadHtmlCode(string url)
            {
                using (WebClient client = new WebClient())
                {
                    try
                    {
                        // Avoid too many connection requests at once to prevent website from blocking us
                        System.Threading.Thread.Sleep(200);
                        client.Encoding = Encoding.UTF8;
                        client.Proxy = null;
                        return client.DownloadString(url);
                    }
                    catch (Exception ex)
                    {
                        Logger.Log(ex.Message);
                        throw;
                    }
                }
            }

            private void btnLoad_Click(object sender, EventArgs e)
            {
                // Multiple mangas are delimited by a semicolon..
                string t = txtURL.Text;
                string[] split = t.Split(';');

                foreach (var item in split)
                {
                    CreateDirectory(Path.GetFileNameWithoutExtension(item));
                    MangaName = Path.GetFileNameWithoutExtension(item);
                    Uri tempUri = new Uri(item);
                    Uri = tempUri;
                    try
                    {
                        using (WebClient client = new WebClient())
                        {
                            string htmlCode = LoadHtmlCode(Uri.AbsoluteUri);
                            LoadAllChapters(htmlCode);
                            Download();
                        }
                    }
                    catch (Exception ex)
                    {
                        Logger.Log(ex.Message);
                        Logger.Log(ex.StackTrace);
                        Logger.Log(ex.InnerException.ToString());
                    }
                }
            }

            private void CreateDirectory(string dirName)
            {
                if (!Directory.Exists(dirName))
                {
                    Directory.CreateDirectory(dirName);
                }
            }

            private void Download()
            {
                foreach (var chapter in Chapters)
                {
                    string bla = chapter.Item2;
                    string chapterName = bla.Replace("?", "%3F").Replace(":", "%3A");
                    // Skip this chapter if it already exists based on chapter name
                    // TODO: Find better way to determine this. Incomplete downloads to a folder would be marked as completed...
                    if (Directory.Exists(string.Format("{0}/{1} - {2}", MangaName, chapter.Item3, chapterName)))
                    {
                        continue;
                    }
                    else
                    {
                        Directory.CreateDirectory(string.Format("{0}/{1} - {2}", MangaName, chapter.Item3, chapterName));
                    }
                    List<Tuple<string, int>> temp = new List<Tuple<string, int>>();
                    foreach (var item in GetPagesLink(chapter.Item1))
                    {
                        temp.Add(new Tuple<string, int>(GetImageLink(item.Item1), item.Item2));
                    }

                    foreach (var img in temp)
                    {
                        WebClient webClient = new WebClient();
                        webClient.DownloadProgressChanged += new DownloadProgressChangedEventHandler(ProgressChanged);
                        webClient.DownloadFileAsync(new Uri(img.Item1), string.Format("{0}/{1} - {2}/{3}.jpg", MangaName, chapter.Item3, chapterName, img.Item2)); // TODO: Find image type and replace hardcoded jpg
                        System.Threading.Thread.Sleep(150);
                    }
                    temp.Clear();
                    txtDebug.AppendText("Finished chapter " + chapter.Item3 + "\r\n");
                }
                Chapters.Clear();
            }

            private void ProgressChanged(object sender, DownloadProgressChangedEventArgs e)
            {
                progressBar1.Value = e.ProgressPercentage;
            }

            /// <summary>
            /// Return a list containing tuples with the direct url to all pages of a chapter
            /// </summary>
            /// <param name="url"></param>
            /// <returns></returns>
            private List<Tuple<string, int>> GetPagesLink(string url)
            {
                List<Tuple<string, int>> pages = new List<Tuple<string, int>>();
                HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
                htmlDoc.LoadHtml(LoadHtmlCode(url));
                int counterPage = 1;

                foreach (HtmlNode node in htmlDoc.DocumentNode.SelectNodes("//select[@id='pageMenu']//option"))
                {
                    //Console.WriteLine("Value=" + node.Attributes["value"].Value);
                    //Console.WriteLine("InnerText=" + node.InnerText);

                    pages.Add(new Tuple<string, int>("http://" + Uri.Host + node.Attributes["value"].Value, counterPage));
                    counterPage++;
                }
                return pages;
            }

            /// <summary>
            /// Extract direct download link of an image by given url
            /// </summary>
            /// <param name="url"></param>
            /// <returns></returns>
            private string GetImageLink(string url)
            {
                HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
                htmlDoc.LoadHtml(LoadHtmlCode(url));

                return htmlDoc.GetElementbyId("img").GetAttributeValue("src", "not found");
            }

            /// <summary>
            /// Load all chapter urls into Chapters property
            /// </summary>
            /// <param name="htmlCode"></param>
            private void LoadAllChapters(string htmlCode)
            {
                HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
                htmlDoc.LoadHtml(htmlCode);

                var chapterLink = htmlDoc.DocumentNode.SelectNodes(@"//div[@id='chapterlist']//a/@href");
                var chapterName = htmlDoc.DocumentNode.SelectNodes(@"//div[@id='chapterlist']//a/@href/following-sibling::text()[1]").Reverse().ToList();
                for (int i = 0; i < chapterLink.Count; i++)
                {
                    var link = "http://" + Uri.Host + chapterLink[i].GetAttributeValue("href", "not found");
                    var name = chapterName[i].OuterHtml.Replace(" : ", "");
                    var number = chapterLink[i].InnerText;
                    Chapters.Add(new Tuple<string, string, string>(link, name, number));

                    checkedListBox1.Items.Add(link);
                }
            }
        }
    }

For logging I wrote a logger in a separate class and namespace:

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;
    using System.Windows.Forms;

    namespace Treasure
    {
        /// <summary>
        /// Logger class. Creates per-day logs, saved into "Log" directory
        /// </summary>
        public class Logger
        {
            /// <summary>
            /// Logs the specified log message
            /// </summary>
            /// <param name="LogMsg">The message to be logged.</param>
            /// <param name="args">Arguments like {1}</param>
            public static void Log(string LogMsg, params object[] args)
            {
                if (!Directory.Exists(Environment.CurrentDirectory + @"\Log"))
                {
                    try
                    {
                        Directory.CreateDirectory(Environment.CurrentDirectory + @"\Log");
                        using (StreamWriter sw = new StreamWriter(Environment.CurrentDirectory + @"\Log\" + DateTime.Now.ToString("dd-MM-yy") + ".txt", true))
                        {
                            sw.WriteLine(DateTime.Now.ToLongTimeString() + " " + string.Format(LogMsg, args));
                        }
                    }
                    catch (Exception e)
                    {
    #if DEBUG
                        Console.WriteLine(e.Message + e.StackTrace);
    #endif

    #if RELEASE
                        MessageBox.Show("Could not create Log directory.", "Error");
                        Log(e.Message);
    #endif
                    }
                }
                else
                {
                    try
                    {
                        // Creates per-day log file with current date as file name
                        using (StreamWriter sw = new StreamWriter(Environment.CurrentDirectory + @"\Log\" + DateTime.Now.ToString("dd-MM-yy") + ".txt", true))
                        {
                            sw.WriteLine(DateTime.Now.ToLongTimeString() + " " + string.Format(LogMsg, args));
                        }
                    }
                    catch (Exception e)
                    {
                        Console.WriteLine(e.Message + e.StackTrace);
                        Log(e.Message + "\n" + e.StackTrace);
                    }
                }
            }
        }
    }
  
       
       

Answers

Best answer

A large part of the problem is doing Thread.Sleep on the UI thread. You could mark the methods async and use await Task.Delay instead if you want a non-blocking sleep.

    public async Task MethodThatRunsOnUIThread()
    {
        //Do stuff
        //Wait
        await Task.Delay(150);
        //Do more stuff
    }

But if I designed this I would do the entire workload asynchronously. That would involve a little bit more work. But you could start with something simple like this:

  1. Disable part of the UI.
  2. await a Task.Run on the entire workload.
  3. Re-enable part of the UI to allow a second run.

Something like this:

    private async void btnLoad_Click(object sender, EventArgs e)
    {
        btnLoad.Enabled = false;
        string url = txtURL.Text;
        await Task.Run(() => DoLoad(url));
        btnLoad.Enabled = true;
    }
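One thing the handler above does not cover is the workload throwing: the button would then stay disabled. A minimal sketch of the same three steps with a try/finally follows. The `UiGuard` class and `RunDisabledAsync` name are hypothetical, not part of any library, and the control is passed in as a setter so the helper itself has no WinForms dependency.

```csharp
using System;
using System.Threading.Tasks;

public static class UiGuard
{
    // Hypothetical helper: disables something via setEnabled, awaits the work,
    // and re-enables it afterwards even if the work throws.
    public static async Task RunDisabledAsync(Action<bool> setEnabled, Func<Task> work)
    {
        setEnabled(false);          // step 1: disable part of the UI
        try
        {
            await work();           // step 2: await the entire workload
        }
        finally
        {
            setEnabled(true);       // step 3: re-enable, on success or failure
        }
    }
}
```

From the click handler this could be called as `await UiGuard.RunDisabledAsync(v => btnLoad.Enabled = v, () => Task.Run(() => DoLoad(url)));`. Any exception from the workload still propagates to the caller after the button is re-enabled, so it should be caught and logged there.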

Then while performing work asynchronously you can marshal back to the UI thread to update status using BeginInvoke.
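One way to arrange this, sketched under assumptions: the background workload takes a reporting callback instead of touching controls directly, and the form supplies a callback that calls BeginInvoke. The `ChapterWorker` class and its `Process` method are hypothetical stand-ins for the real download loop and do no downloading.

```csharp
using System;
using System.Collections.Generic;

public static class ChapterWorker
{
    // Hypothetical stand-in for the real workload. It never touches a control;
    // every status message goes through the report callback.
    public static void Process(IEnumerable<string> chapterNumbers, Action<string> report)
    {
        foreach (string number in chapterNumbers)
        {
            // ... download the pages of this chapter here ...
            report("Finished chapter " + number + "\r\n");
        }
    }
}
```

Inside `Task.Run`, the form could pass `msg => BeginInvoke((Action)(() => txtDebug.AppendText(msg)))` as the callback, so the `AppendText` call is queued to the UI thread rather than executed on the worker thread.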

I'm not sure why you're creating a million WebClient instances. Sometimes not even using them:

    using (WebClient client = new WebClient())
    {
        string htmlCode = LoadHtmlCode(Uri.AbsoluteUri);
        LoadAllChapters(htmlCode);
        Download();
    }

There's nothing stopping you from sharing a WebClient instance.
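As a rough sketch of what sharing could look like, the class below keeps a single WebClient for all page loads and disposes it once at the end. The `PageLoader` name is hypothetical; the encoding and proxy settings are copied from the question's LoadHtmlCode(). It assumes requests are made one after another, since a single WebClient instance does not support concurrent operations.

```csharp
using System;
using System.Net;
using System.Text;

public sealed class PageLoader : IDisposable
{
    // One WebClient reused for every request instead of one per page.
    private readonly WebClient client = new WebClient
    {
        Encoding = Encoding.UTF8,
        Proxy = null
    };

    // Sequential use only: do not call this from several threads at once.
    public string Load(string url)
    {
        return client.DownloadString(url);
    }

    public void Dispose()
    {
        client.Dispose();
    }
}
```

The form could create one `PageLoader` per run inside a `using` block and pass it to the methods that currently call LoadHtmlCode(), which also means the client gets disposed exactly once.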

 
 
