Forum PHP.pl > [PHP] Newline character in PHP Simple HTML DOM

Full version: [PHP] Newline character in PHP Simple HTML DOM

mk321

22.07.2011, 21:41:16

After passing a document through PHP Simple HTML DOM, the output no longer contains any newlines.

For example, without the library the newlines are preserved:

[PHP]
<?php
// Read the whole input file into a string.
$nazwa_pliku = "wejscie.html";
$plik = fopen($nazwa_pliku, "rt");
$dane = fread($plik, filesize($nazwa_pliku));
fclose($plik);

echo $dane;

// Write the unchanged contents to the output file.
$plik = fopen("wyjscie.html", "wt");
fwrite($plik, $dane);
fclose($plik);
?>
[/PHP]

But after using Simple DOM they are gone:

[PHP]
<?php
include('simplehtmldom/simple_html_dom.php');

// Read the whole input file into a string.
$nazwa_pliku = "wejscie.html";
$plik = fopen($nazwa_pliku, "rt");
$dane = fread($plik, filesize($nazwa_pliku));
fclose($plik);

$dane = str_get_html($dane); // the newlines are lost at this point
// other DOM operations would go here; they don't matter, the effect is the same
echo $dane;

// Write the parsed document back out; it is converted to a string on output.
$plik = fopen("wyjscie.html", "wt");
fwrite($plik, $dane);
fclose($plik);
?>
[/PHP]

I tried loading the file in other ways (readfile(), file_get_contents() together with Simple DOM), using other read and write modes ("t" and "b"), and using regular expressions to replace "\r" with "\r\n". The result is always the same. I'm out of ideas...
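When comparing the attempts listed above, it can help to count the line endings in the data before and after each step rather than judging by eye. The following is a minimal sketch using only built-in string functions; the helper name count_line_endings() and the sample strings are made up for illustration and are not part of Simple HTML DOM.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): counts each kind of
// line ending so that the input and the output can be compared.
function count_line_endings(string $text): array
{
    $crlf = substr_count($text, "\r\n");
    return [
        'crlf' => $crlf,
        // Bare "\n" and "\r" are the ones not already counted inside "\r\n".
        'lf'   => substr_count($text, "\n") - $crlf,
        'cr'   => substr_count($text, "\r") - $crlf,
    ];
}

// Made-up sample data: markup with line endings, and the same markup
// with every line-ending character replaced by a space.
$before = "<div>\r\n<p>one</p>\r\n</div>\n";
$after  = "<div>  <p>one</p>  </div> ";

print_r(count_line_endings($before));
print_r(count_line_endings($after));
```

If all three counts are already zero right after the parsing step, no later replacement of "\r" or "\n" can bring the line breaks back, which would match the result described above.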

peter13135

22.07.2011, 21:46:01

nl2br?

mk321

22.07.2011, 21:53:12

Unfortunately that doesn't help. When I used it without the DOM, it inserted the br tags as expected. With the DOM it did nothing, which means that all newline characters are already removed when the document is loaded into the DOM.
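The observation about nl2br() can be reproduced without the library. This small sketch uses made-up strings; the second one stands in for text whose newlines were already replaced before nl2br() sees it.

```php
<?php
// Made-up sample strings for illustration.
$withNewlines    = "first line\nsecond line";
$withoutNewlines = "first line second line"; // newlines already gone

// nl2br() inserts a <br /> before each newline it finds.
echo nl2br($withNewlines), PHP_EOL;

// With no newline characters left, the string comes back unchanged.
echo nl2br($withoutNewlines), PHP_EOL;
```

So nl2br() leaving the text untouched is a sign that the newline characters were removed earlier, not that nl2br() itself failed.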

Besides, my point is that the newlines are missing from the source code, not from how the HTML page looks when opened.

Thanks for your interest.

//edit

In the source code of simple_html_dom.php I found this note:

Quote

* all affected sections have comments starting with "PaperG"
*
* Paperg - Added case insensitive testing of the value of the selector.
* Paperg - Added tag_start for the starting index of tags - NOTE: This works but not accurately.
* This tag_start gets counted AFTER \r\n have been crushed out, and after the remove_noice calls so it will not reflect the REAL position of the tag in the source,
* it will almost always be smaller by some amount.
* We use this to determine how far into the file the tag in question is. This "percentage will never be accurate as the $dom->size is the "real" number of bytes the dom was created from.
* but for most purposes, it's a really good estimation.
* Paperg - Added the forceTagsClosed to the dom constructor. Forcing tags closed is great for malformed html, but it CAN lead to parsing errors.
* Allow the user to tell us how much they trust the html.
* Paperg add the text and plaintext to the selectors for the find syntax. plaintext implies text in the innertext of a node. text implies that the tag is a text node.
* This allows for us to find tags based on the text they contain.
* Create find_ancestor_tag to see if a tag is - at any level - inside of another specific tag.
* Paperg: added parse_charset so that we know about the character set of the source document.
* NOTE: If the user's system has a routine called get_last_retrieve_url_contents_content_type availalbe, we will assume it's returning the content-type header from the
* last transfer or curl_exec, and we will parse that and use it in preference to any other method of charset detection.

Does this mean that stripping \r\n while loading into the DOM is simply normal behavior and that I'm not doing anything wrong?
And if I want to process the document while keeping the newlines, do I have to look for a different tool?

//edit
I found a solution.

Simple DOM does seem to have a $stripRN=false setting... but it has no effect. Once I removed from the class all the code responsible for the stripping, it worked.
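To illustrate what a switch like $stripRN would be expected to control, here is a minimal sketch. It is not the code from simple_html_dom.php: the function prepare_markup() and the sample markup are hypothetical, and the library may strip line breaks in more than one place, which could be why the setting alone had no effect here.

```php
<?php
// Hypothetical stand-in for a parser's pre-processing step. This is NOT the
// code from simple_html_dom.php, only an illustration of the idea.
function prepare_markup(string $html, bool $stripRN = true): string
{
    if ($stripRN) {
        // Replace every CR and LF with a space, "crushing out" line breaks.
        $html = str_replace(["\r", "\n"], ' ', $html);
    }
    return $html;
}

// Made-up sample markup with Windows line endings.
$source = "<ul>\r\n<li>a</li>\r\n</ul>";

var_dump(prepare_markup($source));        // line breaks replaced by spaces
var_dump(prepare_markup($source, false)); // returned exactly as passed in
```

Removing the stripping code from the class, as described above, corresponds to always taking the second path in this sketch.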

I just don't know whether this will have any negative side effects ;/

//edit
OK, everything works as it should. Problem solved.
