Admin Tools

Editing file: parser.cpython-36.pyc

� \AE������������������@���s����d�Z�ddlZddlZddlZddlmZ�dgZejd�Zejd�Z	ejd�Z
ejd�Zejd	�Zejd
�Z
ejd�Zejd�Zejd
�Zejdej�Zejd
�Zejd�ZG�dd��dej�ZdS�)zA parser for HTML and XHTML.�����N)�unescape�
HTMLParserz[&<]z
&[a-zA-Z#]z%&([a-zA-Z][-.a-zA-Z0-9]*)[^a-zA-Z0-9]z)&#(?:[0-9]+|[xX][0-9a-fA-F]+)[^0-9a-fA-F]z	<[a-zA-Z]�>z--\s*>z+([a-zA-Z][^\t\n\r\f />\x00]*)(?:\s|/(?!>))*z]((?<=[\'"\s/])[^\s/>][^\s/=>]*)(\s*=+\s*(\'[^\']*\'|"[^"]*"|(?![\'"])[^>\s]*))?(?:\s|/(?!>))*aF��
  <[a-zA-Z][^\t\n\r\f />\x00]*       # tag name
  (?:[\s/]*                          # optional whitespace before attribute name
    (?:(?<=['"\s/])[^\s/>][^\s/=>]*  # attribute name
      (?:\s*=+\s*                    # value indicator
        (?:'[^']*'                   # LITA-enclosed value
          |"[^"]*"                   # LIT-enclosed value
          |(?!['"])[^>\s]*           # bare value
         )
         (?:\s*,)*                   # possibly followed by a comma
       )?(?:\s|/(?!>))*
     )*
   )?
  \s*                                # trailing whitespace
z#</\s*([a-zA-Z][-.a-zA-Z0-9:_]*)\s*>c���������������@���s����e�Zd�ZdZd:Zdd�dd�Zdd	��Zd
d��Zdd
��ZdZ	dd��Z
dd��Zdd��Zdd��Z
dd��Zd;dd�Zdd��Zdd��Zd d!��Zd"d#��Zd$d%��Zd&d'��Zd(d)��Zd*d+��Zd,d-��Zd.d/��Zd0d1��Zd2d3��Zd4d5��Zd6d7��Zd8d9��ZdS�)<r���aE��Find tags and other markup and call handler functions.

Usage:
        p = HTMLParser()
        p.feed(data)
        ...
        p.close()

Start tags are handled by calling self.handle_starttag() or
    self.handle_startendtag(); end tags by self.handle_endtag().  The
    data between tags is passed from the parser to the derived class
    by calling self.handle_data() with the data as argument (the data
    may be split up in arbitrary chunks).  If convert_charrefs is
    True the character references are converted automatically to the
    corresponding Unicode character (and self.handle_data() is no
    longer split in chunks), otherwise they are passed by calling
    self.handle_entityref() or self.handle_charref() with the string
    containing respectively the named or numeric reference as the
    argument.
    �script�styleT)�convert_charrefsc������������C���s���||�_�|�j���dS�)z�Initialize and reset this instance.

If convert_charrefs is True (the default), all character references
        are automatically converted to the corresponding Unicode characters.
        N)r����reset)�selfr�����r
����#/usr/lib64/python3.6/html/parser.py�__init__W���s����zHTMLParser.__init__c�������������C���s(���d|�_�d|�_t|�_d|�_tjj|���dS�)z1Reset this instance.  Loses all unprocessed data.��z???N)�rawdata�lasttag�interesting_normal�interesting�
cdata_elem�_markupbase�
ParserBaser���)r	���r
���r
���r���r���`���s
����zHTMLParser.resetc�������������C���s���|�j�|�|�_�|�jd��dS�)z�Feed data to the parser.

Call this as often as you want, with as little or as much text
        as you want (may include '\n').
        r���N)r����goahead)r	����datar
���r
���r����feedh���s����zHTMLParser.feedc�������������C���s���|�j�d��dS�)zHandle any buffered data.����N)r���)r	���r
���r
���r����closeq���s����zHTMLParser.closeNc�������������C���s���|�j�S�)z)Return full source of start tag: '<...>'.)�_HTMLParser__starttag_text)r	���r
���r
���r����get_starttag_textw���s����zHTMLParser.get_starttag_textc�������������C���s$���|j���|�_tjd|�j�tj�|�_d�S�)Nz</\s*%s\s*>)�lowerr����re�compile�Ir���)r	����elemr
���r
���r����set_cdata_mode{���s����
zHTMLParser.set_cdata_modec�������������C���s���t�|�_d�|�_d�S�)N)r���r���r���)r	���r
���r
���r����clear_cdata_mode���s����zHTMLParser.clear_cdata_modec�������������C���sZ��|�j�}d}t|�}�x�||k��r�|�jr||�j�r||jd|�}|dk�r�|jdt||d���}|dkrvtjd�j	||��rvP�|}n(|�j
j	||�}|r�|j��}n|�jr�P�|}||k�r�|�jr�|�j�r�|�jt
|||�����n|�j|||����|�j||�}||kr�P�|j}|d|��rLtj||��r&|�j|�}	n�|d|��r>|�j|�}	nl|d|��rV|�j|�}	nT|d|��rn|�j|�}	n<|d	|��r�|�j|�}	n$|d
�|k��r�|�jd��|d
�}	nP�|	dk��r>|�s�P�|jd|d
��}	|	dk��r�|jd|d
��}	|	dk��r|d
�}	n|	d
7�}	|�j�r,|�j��r,|�jt
|||	�����n|�j|||	����|�j||	�}q|d|��r�tj||�}|�r�|j��d
d��}
|�j|
��|j��}	|d|	d
���s�|	d
�}	|�j||	�}qn:d||d���k�r�|�j|||d
�����|�j||d
��}P�q|d|��r�tj||�}|�rP|jd
�}
|�j|
��|j��}	|d|	d
���sB|	d
�}	|�j||	�}qtj||�}|�r�|�r�|j��||d���k�r�|j��}	|	|k�r�|}	|�j||d
��}P�n,|d
�|k��r�|�jd��|�j||d
��}nP�qdstd��qW�|�rH||k��rH|�j��rH|�j�r*|�j��r*|�jt
|||�����n|�j|||����|�j||�}||d���|�_�d�S�)Nr����<�&�"���z[\s;]z</z<!--z<?z<!r���r���z&#�����;zinteresting.search() lied���)r����lenr���r����find�rfind�maxr���r����searchr����start�handle_datar���Z	updatepos�
startswith�starttagopen�match�parse_starttag�parse_endtag�
parse_comment�parse_pi�parse_html_declaration�charref�group�handle_charref�end�	entityref�handle_entityref�
incomplete�AssertionError)r	���r;���r����i�n�jZampposr2���r0����k�namer
���r
���r���r�������s�����
�

zHTMLParser.goaheadc�������������C���s����|�j�}|||d���dks"td��|||d���dkr@|�j|�S�|||d���dkr^|�j|�S�|||d���j��d	kr�|jd
|d��}|dkr�d
S�|�j||d�|����|d�S�|�j|�S�d�S�)Nr&���z<!z+unexpected call to parse_html_declaration()����z<!--����z<![�	���z	<!doctyper���r���r(���r(���)r���r?���r5���Zparse_marked_sectionr���r*����handle_decl�parse_bogus_comment)r	���r@���r����gtposr
���r
���r���r7������s����

z!HTMLParser.parse_html_declarationr���c�������������C���s`���|�j�}|||d���dks"td��|jd|d��}|dkr>d	S�|rX|�j||d�|����|d�S�)
Nr&����<!�</z"unexpected call to parse_comment()r���r���)rK���rL���r(���r(���)r���r?���r*����handle_comment)r	���r@���Zreportr����posr
���r
���r���rI�����s����zHTMLParser.parse_bogus_commentc�������������C���sd���|�j�}|||d���dks"td��tj||d��}|s:dS�|j��}|�j||d�|����|j��}|S�)Nr&���z<?zunexpected call to parse_pi()r���r(���)r���r?����picloser-���r.����	handle_pir;���)r	���r@���r���r2���rB���r
���r
���r���r6���!��s����zHTMLParser.parse_pic�������������C���s���d�|�_�|�j|�}|dk�r|S�|�j}|||��|�_�g�}tj||d��}|sPtd��|j��}|jd�j���|�_	}x�||k��r0t
j||�}|s�P�|jddd�\}	}
}|
s�d�}n^|d�d��d��ko�|dd���kn��p�|d�d��d��ko�|dd���kn���r|dd
��}|�rt|�}|j|	j��|f��|j��}qnW�|||��j
��}|dk�r�|�j��\}
}d
|�j�k�r�|
|�j�jd
��}
t|�j��|�j�jd
��}n|t|�j���}|�j|||����|S�|jd	��r�|�j||��n"|�j||��||�jk�r�|�j|��|S�)Nr���r���z#unexpected call to parse_starttag()r&���rF����'�"r����/>�
r(���r(���r(���)r���rS���)r����check_for_whole_start_tagr����tagfind_tolerantr2���r?���r;���r9���r���r����attrfind_tolerantr����append�stripZgetpos�countr)���r+���r/����endswith�handle_startendtag�handle_starttag�CDATA_CONTENT_ELEMENTSr!���)r	���r@����endposr����attrsr2���rC����tag�mZattrname�restZ	attrvaluer;����lineno�offsetr
���r
���r���r3���-��sR����
(*

zHTMLParser.parse_starttagc�������������C���s����|�j�}tj||�}|r�|j��}|||d���}|dkr>|d�S�|dkr~|jd|�rZ|d�S�|jd|�rjd	S�||krv|S�|d�S�|dkr�d
S�|dkr�dS�||kr�|S�|d�S�td��d�S�)Nr���r����/z/>r&���r
���z6abcdefghijklmnopqrstuvwxyz=/ABCDEFGHIJKLMNOPQRSTUVWXYZzwe should not get here!r(���r(���r(���)r����locatestarttagend_tolerantr2���r;���r0���r?���)r	���r@���r���rb���rB����nextr
���r
���r���rU���`��s.����z$HTMLParser.check_for_whole_start_tagc�������������C���s2��|�j�}|||d���dks"td��tj||d��}|s:dS�|j��}tj||�}|s�|�jd�k	rr|�j|||����|S�t	j||d��}|s�|||d���dkr�|d�S�|�j
|�S�|jd�j��}|j
d|j���}|�j|��|d�S�|jd�j��}|�jd�k	�r||�jk�r|�j|||����|S�|�j|j����|�j���|S�)	Nr&���z</zunexpected call to parse_endtagr���rF���z</>r���r(���)r���r?����	endendtagr-���r;����
endtagfindr2���r���r/���rV���rI���r9���r���r*����
handle_endtagr"���)r	���r@���r���r2���rJ���Z	namematchZtagnamer ���r
���r
���r���r4������s8����

zHTMLParser.parse_endtagc�������������C���s���|�j�||��|�j|��d�S�)N)r]���rk���)r	���ra���r`���r
���r
���r���r\������s����zHTMLParser.handle_startendtagc�������������C���s���d�S�)Nr
���)r	���ra���r`���r
���r
���r���r]������s����zHTMLParser.handle_starttagc�������������C���s���d�S�)Nr
���)r	���ra���r
���r
���r���rk������s����zHTMLParser.handle_endtagc�������������C���s���d�S�)Nr
���)r	���rD���r
���r
���r���r:������s����zHTMLParser.handle_charrefc�������������C���s���d�S�)Nr
���)r	���rD���r
���r
���r���r=������s����zHTMLParser.handle_entityrefc�������������C���s���d�S�)Nr
���)r	���r���r
���r
���r���r/������s����zHTMLParser.handle_datac�������������C���s���d�S�)Nr
���)r	���r���r
���r
���r���rM������s����zHTMLParser.handle_commentc�������������C���s���d�S�)Nr
���)r	���Zdeclr
���r
���r���rH������s����zHTMLParser.handle_declc�������������C���s���d�S�)Nr
���)r	���r���r
���r
���r���rP������s����zHTMLParser.handle_pic�������������C���s���d�S�)Nr
���)r	���r���r
���r
���r����unknown_decl���s����zHTMLParser.unknown_declc�������������C���s���t�jdtdd��t|�S�)NzZThe unescape method is deprecated and will be removed in 3.5, use html.unescape() instead.r&���)�
stacklevel)�warnings�warn�DeprecationWarningr���)r	����sr
���r
���r���r������s����
zHTMLParser.unescape)r���r���)r���)�__name__�
__module__�__qualname__�__doc__r^���r���r���r���r���r���r���r!���r"���r���r7���rI���r6���r3���rU���r4���r\���r]���rk���r:���r=���r/���rM���rH���rP���rl���r���r
���r
���r
���r���r���?���s8���		z
3"()ru���r���rn���r���Zhtmlr����__all__r���r���r>���r<���r8���r1���rO���ZcommentcloserV���rW����VERBOSErg���ri���rj���r���r���r
���r
���r
���r����<module>���s(���