2024-11-14 00:45:06 +01:00
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
< html >
< head >
< meta http-equiv = "content-type" content = "text/html; charset=utf-8" >
< title > CostruzioneUtensili< / title >
< link rel = "stylesheet" type = "text/css" media = "all" charset = "utf-8" href = "acaro/css/common.css" >
< link rel = "stylesheet" type = "text/css" media = "screen" charset = "utf-8" href = "acaro/css/screen.css" >
< link rel = "stylesheet" type = "text/css" media = "print" charset = "utf-8" href = "acaro/css/print.css" >
< style type = "text/css" >
ul.pagetitle{
display: inline;
margin: 0;
padding: 0;
font-size: 1.5em;
}
li.pagetitle{
display: inline;
margin: 0;
}
td.noborder {
border: 0;
}
< / style >
< / head >
< body >
< table >
< tr >
< td class = "noborder" >
< img src = "logo.png" width = "85" height = "85" >
< / td >
< td class = "noborder" >
< ul class = "pagetitle" >
< li class = "pagetitle" > < a class = "backlink" > CostruzioneUtensili< / a >
< / ul >
< br > < br >
[< a href = "FrontPage.html" > FrontPage< / a > ]
< / td >
< / tr >
< / table >
< hr >
< div id = "page" >
< div dir = "ltr" id = "content" lang = "it" > < span class = "anchor" id = "top" > < / span >
< span class = "anchor" id = "line-1-6" > < / span > < span class = "anchor" id = "line-2" > < / span > < span class = "anchor" id = "line-3" > < / span > < p class = "line867" >
< h1 id = "Costruzione_Utensili" > Tool Building< / h1 >
< span class = "anchor" id = "line-4" > < / span > < span class = "anchor" id = "line-5" > < / span > < p class = "line874" > Culture is our Nature: we are hunters and gatherers in a world of information. < span class = "anchor" id = "line-6" > < / span > < span class = "anchor" id = "line-7" > < / span > < p class = "line867" >
< h2 id = "Prerequisiti" > Prerequisites< / h2 >
< span class = "anchor" id = "line-8" > < / span > < ul > < li > A rough idea of HTML < span class = "anchor" id = "line-9" > < / span > < / li > < li > Being able to write, or even just read, any programming language < span class = "anchor" id = "line-10" > < / span > < span class = "anchor" id = "line-11" > < / span > < / li > < / ul > < p class = "line867" >
< h2 id = "Programma" > Programme< / h2 >
< span class = "anchor" id = "line-12" > < / span > < span class = "anchor" id = "line-13" > < / span > < p class = "line874" > A series of afternoons of free experimentation, followed by a workshop open to the public. < span class = "anchor" id = "line-14" > < / span > < span class = "anchor" id = "line-15" > < / span > < p class = "line867" >
< h2 id = "Temi" > Topics< / h2 >
< span class = "anchor" id = "line-16" > < / span > < span class = "anchor" id = "line-17" > < / span > < p class = "line874" > Still to be defined, but broadly: < span class = "anchor" id = "line-18" > < / span > < ul > < li > Finding your way around the browser inspector < span class = "anchor" id = "line-19" > < / span > < / li > < li > Basics of web scraping with Python: < span class = "anchor" id = "line-20" > < / span > < ul > < li > GET requests and a fake user agent with requests < span class = "anchor" id = "line-21" > < / span > < / li > < li > Beautiful Soup and/or lxml for parsing pages < span class = "anchor" id = "line-22" > < / span > < / li > < li > Web spiders with Scrapy < span class = "anchor" id = "line-23" > < / span > < / li > < / ul > < / li > < li > wget and a bit of bash? < span class = "anchor" id = "line-24" > < / span > < span class = "anchor" id = "line-25" > < / span > < / li > < / ul > < p class = "line867" >
< h2 id = "Riferimenti_Sparsi" > Assorted References< / h2 >
< span class = "anchor" id = "line-26" > < / span > < ul > < li > < p class = "line891" > < a class = "https" href = "https://elitedatascience.com/python-web-scraping-libraries" > https://elitedatascience.com/python-web-scraping-libraries< / a > < span class = "anchor" id = "line-27" > < / span > < / li > < li > < p class = "line891" > < a class = "https" href = "https://first-web-scraper.readthedocs.io/en/latest/" > https://first-web-scraper.readthedocs.io/en/latest/< / a > < span class = "anchor" id = "line-28" > < / span > < / li > < li > < p class = "line891" > < a class = "https" href = "https://medium.com/@kaismh/extracting-data-from-websites-using-scrapy-e1e1e357651a" > https://medium.com/@kaismh/extracting-data-from-websites-using-scrapy-e1e1e357651a< / a > < span class = "anchor" id = "line-29" > < / span > < / li > < li > < p class = "line891" > < a class = "https" href = "https://deshmukhsuraj.wordpress.com/2015/03/08/anonymous-web-scraping-using-python-and-tor/" > https://deshmukhsuraj.wordpress.com/2015/03/08/anonymous-web-scraping-using-python-and-tor/< / a > < span class = "anchor" id = "line-30" > < / span > < span class = "anchor" id = "line-31" > < / span > < / li > < / ul > < p class = "line867" >
< h2 id = "Terminale" > Terminal< / h2 >
< span class = "anchor" id = "line-32" > < / span > < span class = "anchor" id = "line-33" > < / span > < p class = "line867" >
< h3 id = "curl" > curl< / h3 >
< span class = "anchor" id = "line-34" > < / span > < p class = "line867" > < span class = "anchor" id = "line-35" > < / span > < span class = "anchor" id = "line-36" > < / span > < pre > < span class = "anchor" id = "line-1" > < / span > curl "http://www.example.com"< / pre > < span class = "anchor" id = "line-37" > < / span > < p class = "line874" > performs a GET request and prints the response to standard output < span class = "anchor" id = "line-38" > < / span > < span class = "anchor" id = "line-39" > < / span > < p class = "line867" > < span class = "anchor" id = "line-40" > < / span > < span class = "anchor" id = "line-41" > < / span > < pre > < span class = "anchor" id = "line-1-1" > < / span > curl -o out.html "http://www.example.com"< / pre > < span class = "anchor" id = "line-42" > < / span > < p class = "line862" > now the output is saved to the file < em > out.html< / em > < span class = "anchor" id = "line-43" > < / span > < span class = "anchor" id = "line-44" > < / span > < p class = "line867" >
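< p class = "line874" > For comparison, a minimal Python sketch of what < em > curl -o< / em > does, using only the standard library. The function name < em > save_url< / em > is made up for this example and is not part of any library.

```python
import shutil
import urllib.request


def save_url(url, path):
    """Fetch url with a GET request and write the response body to path.

    save_url is a hypothetical helper name used only in this sketch.
    """
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        # Stream the body to disk instead of loading it all into memory.
        shutil.copyfileobj(response, out)
```

< p class = "line874" > Calling < em > save_url("http://www.example.com", "out.html")< / em > should leave the page in < em > out.html< / em > , much like the curl command above.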
< h3 id = "wget" > wget< / h3 >
< span class = "anchor" id = "line-45" > < / span > < p class = "line867" > < span class = "anchor" id = "line-46" > < / span > < span class = "anchor" id = "line-47" > < / span > < pre > < span class = "anchor" id = "line-1-2" > < / span > wget "http://www.example.com/index.html"< / pre > < span class = "anchor" id = "line-48" > < / span > < p class = "line862" > saves the content to < em > index.html< / em > < span class = "anchor" id = "line-49" > < / span > < span class = "anchor" id = "line-50" > < / span > < p class = "line867" > < span class = "anchor" id = "line-51" > < / span > < span class = "anchor" id = "line-52" > < / span > < pre > < span class = "anchor" id = "line-1-3" > < / span > wget -r "http://www.example.com/"< / pre > < span class = "anchor" id = "line-53" > < / span > < p class = "line862" > saves < strong > all< / strong > the content of the site reachable by following links (up to wget's default recursion depth of 5) under the current directory, in a folder named after the host < span class = "anchor" id = "line-54" > < / span > < span class = "anchor" id = "line-55" > < / span > < p class = "line867" >
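< p class = "line874" > A recursive download like < em > wget -r< / em > works by finding the links in each page and following the ones that stay on the same host. The link-discovery step might be sketched as follows with the standard library only. The names < em > LinkCollector< / em > and < em > same_host_links< / em > and the sample markup are invented for illustration, and this is not how wget itself is implemented.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Turn relative links such as "/about.html" into absolute URLs.
                self.links.append(urljoin(self.base_url, href))


def same_host_links(page_html, base_url):
    """Return the links in page_html that stay on the host of base_url."""
    collector = LinkCollector(base_url)
    collector.feed(page_html)
    host = urlparse(base_url).netloc
    return [link for link in collector.links if urlparse(link).netloc == host]


# Made-up page fragment: one internal link, one external link.
sample = (
    '<a href="/about.html">About</a> '
    '<a href="http://other.example.org/">Elsewhere</a>'
)
print(same_host_links(sample, "http://www.example.com/index.html"))
```

< p class = "line874" > A crawler would repeat this step on every page it fetches, keeping a set of already visited URLs so that it does not loop.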
< h3 id = "Python" > Python< / h3 >
< span class = "anchor" id = "line-56" > < / span > < p class = "line867" > < span class = "anchor" id = "line-57" > < / span > < span class = "anchor" id = "line-58" > < / span > < pre > < span class = "anchor" id = "line-1-4" > < / span > python3 script.py< / pre > < span class = "anchor" id = "line-59" > < / span > < p class = "line874" > runs a script < span class = "anchor" id = "line-60" > < / span > < span class = "anchor" id = "line-61" > < / span > < p class = "line867" > < span class = "anchor" id = "line-62" > < / span > < span class = "anchor" id = "line-63" > < / span > < pre > < span class = "anchor" id = "line-1-5" > < / span > python3 script.py > out.txt< / pre > < span class = "anchor" id = "line-64" > < / span > < p class = "line862" > runs a script and saves its standard output to < em > out.txt< / em > < span class = "anchor" id = "line-65" > < / span > < span class = "anchor" id = "line-66" > < / span > < p class = "line867" >
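< p class = "line874" > The < em > &gt;< / em > redirection only captures standard output, so a script can keep its progress messages out of < em > out.txt< / em > by writing them to standard error. A small sketch of that split follows. The function name < em > report< / em > and the sample data are assumptions made for this example.

```python
import sys


def report(results, out=None, err=None):
    """Write one result per line to out and a short summary to err.

    report is a hypothetical helper; out and err default to the
    process's standard output and standard error.
    """
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    for item in results:
        print(item, file=out)
    # Diagnostics go to stderr, so "> out.txt" does not capture them.
    print(f"{len(results)} results written", file=err)


report(["first", "second"])
```

< p class = "line874" > Run as < em > python3 script.py &gt; out.txt< / em > , the two results end up in the file while the summary line still shows in the terminal.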
< h2 id = "Codice" > Code< / h2 >
< span class = "anchor" id = "line-67" > < / span > < span class = "anchor" id = "line-68" > < / span > < p class = "line867" >
< h3 id = "Scraping" > Scraping< / h3 >
< span class = "anchor" id = "line-69" > < / span > < p class = "line874" > Prints the list of Macao's spaces: < span class = "anchor" id = "line-70" > < / span > < span class = "anchor" id = "line-71" > < / span > < span class = "anchor" id = "line-72" > < / span > < span class = "anchor" id = "line-73" > < / span > < span class = "anchor" id = "line-74" > < / span > < span class = "anchor" id = "line-75" > < / span > < span class = "anchor" id = "line-76" > < / span > < span class = "anchor" id = "line-77" > < / span > < span class = "anchor" id = "line-78" > < / span > < span class = "anchor" id = "line-79" > < / span > < span class = "anchor" id = "line-80" > < / span > < span class = "anchor" id = "line-81" > < / span > < span class = "anchor" id = "line-82" > < / span > < span class = "anchor" id = "line-83" > < / span > < span class = "anchor" id = "line-84" > < / span > < span class = "anchor" id = "line-85" > < / span > < span class = "anchor" id = "line-1-7" > < / span > < div class = "highlight python3" > < div class = "codearea" dir = "ltr" lang = "en" >
< script type = "text/javascript" >
function isnumbered(obj) {
return obj.childNodes.length && obj.firstChild.childNodes.length && obj.firstChild.firstChild.className == 'LineNumber';
}
function nformat(num,chrs,add) {
var nlen = Math.max(0,chrs-(''+num).length), res = '';
while (nlen>0) { res += ' '; nlen-- }
return res+num+add;
}
function addnumber(did, nstart, nstep) {
var c = document.getElementById(did), l = c.firstChild, n = 1;
if (!isnumbered(c)) {
if (typeof nstart == 'undefined') nstart = 1;
if (typeof nstep == 'undefined') nstep = 1;
var n = nstart;
while (l != null) {
if (l.tagName == 'SPAN') {
var s = document.createElement('SPAN');
var a = document.createElement('A');
s.className = 'LineNumber';
a.appendChild(document.createTextNode(nformat(n,4,'')));
a.href = '#' + did + '_' + n;
s.appendChild(a);
s.appendChild(document.createTextNode(' '));
n += nstep;
if (l.childNodes.length) {
l.insertBefore(s, l.firstChild);
}
else {
l.appendChild(s);
}
}
l = l.nextSibling;
}
}
return false;
}
function remnumber(did) {
var c = document.getElementById(did), l = c.firstChild;
if (isnumbered(c)) {
while (l != null) {
if (l.tagName == 'SPAN' && l.firstChild.className == 'LineNumber') l.removeChild(l.firstChild);
l = l.nextSibling;
}
}
return false;
}
function togglenumber(did, nstart, nstep) {
var c = document.getElementById(did);
if (isnumbered(c)) {
remnumber(did);
} else {
addnumber(did,nstart,nstep);
}
return false;
}
< / script >
< script type = "text/javascript" >
document.write('< a href = "#" onclick = "return togglenumber(\'CA-9111a916e892a1f257425d0cede6cf32f9811e1c\', 1, 1);" \
class="codenumbers">Toggle line numbers< \/a>');
< / script >
< pre dir = "ltr" id = "CA-9111a916e892a1f257425d0cede6cf32f9811e1c" lang = "en" > < span class = "line" > < span class = "LineNumber" > < a href = "#CA-9111a916e892a1f257425d0cede6cf32f9811e1c_1" > 1< / a > < / span > < span class = "LineAnchor" id = "CA-9111a916e892a1f257425d0cede6cf32f9811e1c_1" > < / span > < span class = "anchor" id = "line-1-8" > < / span > < span class = "Comment" > #!/usr/bin/env python3< / span > < / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-9111a916e892a1f257425d0cede6cf32f9811e1c_2" > 2< / a > < / span > < span class = "LineAnchor" id = "CA-9111a916e892a1f257425d0cede6cf32f9811e1c_2" > < / span > < span class = "anchor" id = "line-2-1" > < / span > < span class = "ResWord" > import< / span > < span class = "ID" > requests< / span > < / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-9111a916e892a1f257425d0cede6cf32f9811e1c_3" > 3< / a > < / span > < span class = "LineAnchor" id = "CA-9111a916e892a1f257425d0cede6cf32f9811e1c_3" > < / span > < span class = "anchor" id = "line-3-1" > < / span > < span class = "ResWord" > from< / span > < span class = "ID" > bs4< / span > < span class = "ResWord" > import< / span > < span class = "ID" > BeautifulSoup< / span > < / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-9111a916e892a1f257425d0cede6cf32f9811e1c_4" > 4< / a > < / span > < span class = "LineAnchor" id = "CA-9111a916e892a1f257425d0cede6cf32f9811e1c_4" > < / span > < span class = "anchor" id = "line-4-1" > < / span > < / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-9111a916e892a1f257425d0cede6cf32f9811e1c_5" > 5< / a > < / span > < span class = "LineAnchor" id = "CA-9111a916e892a1f257425d0cede6cf32f9811e1c_5" > < / span > < span class = "anchor" id = "line-5-1" > < / span > < span class = "ID" > url< / span > = < span class = "String" > "< / span > < span class = "String" > http://www.macaomilano.org/spip.php?rubrique18< / span > < span class = "String" > "< / span > < / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-9111a916e892a1f257425d0cede6cf32f9811e1c_6" > 6< / a > < / span > < span class = "LineAnchor" id = "CA-9111a916e892a1f257425d0cede6cf32f9811e1c_6" > < / span > < span class = "anchor" id = "line-6-1" > < / span > < span class = "ID" > r< / span > = < span class = "ID" > requests< / span > .< span class = "ID" > get< / span > (< span class = "ID" > url< / span > )< / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-9111a916e892a1f257425d0cede6cf32f9811e1c_7" > 7< / a > < / span > < span class = "LineAnchor" id = "CA-9111a916e892a1f257425d0cede6cf32f9811e1c_7" > < / span > < span class = "anchor" id = "line-7-1" > < / span > < span class = "ID" > page< / span > = < span class = "ID" > r< / span > .< span class = "ID" > text< / span > < / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-9111a916e892a1f257425d0cede6cf32f9811e1c_8" > 8< / a > < / span > < span class = "LineAnchor" id = "CA-9111a916e892a1f257425d0cede6cf32f9811e1c_8" > < / span > < span class = "anchor" id = "line-8-1" > < / span > < / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-9111a916e892a1f257425d0cede6cf32f9811e1c_9" > 9< / a > < / span > < span class = "LineAnchor" id = "CA-9111a916e892a1f257425d0cede6cf32f9811e1c_9" > < / span > < span class = "anchor" id = "line-9-1" > < / span > < span class = "ID" > soup< / span > = < span class = "ID" > BeautifulSoup< / span > (< span class = "ID" > page< / span > , < span class = "String" > "< / span > < span class = "String" > html.parser< / span > < span class = "String" > "< / span > )< / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-9111a916e892a1f257425d0cede6cf32f9811e1c_10" > 10< / a > < / span > < span class = "LineAnchor" id = "CA-9111a916e892a1f257425d0cede6cf32f9811e1c_10" > < / span > < span class = "anchor" id = "line-10-1" > < / span > < / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-9111a916e892a1f257425d0cede6cf32f9811e1c_11" > 11< / a > < / span > < span class = "LineAnchor" id = "CA-9111a916e892a1f257425d0cede6cf32f9811e1c_11" > < / span > < span class = "anchor" id = "line-11-1" > < / span > < span class = "ID" > h2s< / span > = < span class = "ID" > soup< / span > .< span class = "ID" > find_all< / span > (< span class = "String" > "< / span > < span class = "String" > h2< / span > < span class = "String" > "< / span > )< / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-9111a916e892a1f257425d0cede6cf32f9811e1c_12" > 12< / a > < / span > < span class = "LineAnchor" id = "CA-9111a916e892a1f257425d0cede6cf32f9811e1c_12" > < / span > < span class = "anchor" id = "line-12-1" > < / span > < span class = "ID" > spazi< / span > = [< span class = "ID" > h2< / span > .< span class = "ID" > text< / span > < span class = "ResWord" > for< / span > < span class = "ID" > h2< / span > < span class = "ResWord" > in< / span > < span class = "ID" > h2s< / span > ]< / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-9111a916e892a1f257425d0cede6cf32f9811e1c_13" > 13< / a > < / span > < span class = "LineAnchor" id = "CA-9111a916e892a1f257425d0cede6cf32f9811e1c_13" > < / span > < span class = "anchor" id = "line-13-1" > < / span > < / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-9111a916e892a1f257425d0cede6cf32f9811e1c_14" > 14< / a > < / span > < span class = "LineAnchor" id = "CA-9111a916e892a1f257425d0cede6cf32f9811e1c_14" > < / span > < span class = "anchor" id = "line-14-1" > < / span > < span class = "ResWord" > print< / span > (< span class = "String" > "< / span > < span class = "SPChar" > \n< / span > < span class = "String" > "< / span > .< span class = "ID" > join< / span > (< span class = "ID" > spazi< / span > ))< / span >
< / pre > < / div > < / div > < span class = "anchor" id = "line-86" > < / span > < span class = "anchor" id = "line-87" > < / span > < p class = "line862" > Same harvest, but with XPath < img alt = ":)" height = "16" src = "/moin_static1911/acaro/img/smile.png" title = ":)" width = "16" / > < span class = "anchor" id = "line-88" > < / span > < span class = "anchor" id = "line-89" > < / span > < span class = "anchor" id = "line-90" > < / span > < span class = "anchor" id = "line-91" > < / span > < span class = "anchor" id = "line-92" > < / span > < span class = "anchor" id = "line-93" > < / span > < span class = "anchor" id = "line-94" > < / span > < span class = "anchor" id = "line-95" > < / span > < span class = "anchor" id = "line-96" > < / span > < span class = "anchor" id = "line-97" > < / span > < span class = "anchor" id = "line-98" > < / span > < span class = "anchor" id = "line-99" > < / span > < span class = "anchor" id = "line-100" > < / span > < span class = "anchor" id = "line-101" > < / span > < span class = "anchor" id = "line-102" > < / span > < span class = "anchor" id = "line-103" > < / span > < span class = "anchor" id = "line-104" > < / span > < span class = "anchor" id = "line-105" > < / span > < span class = "anchor" id = "line-106" > < / span > < span class = "anchor" id = "line-1-9" > < / span > < div class = "highlight python3" > < div class = "codearea" dir = "ltr" lang = "en" >
< script type = "text/javascript" >
document.write('< a href = "#" onclick = "return togglenumber(\'CA-23af50fad8100d6e8fbe6a70bdb68ba01bf3c8f9\', 1, 1);" \
class="codenumbers">Toggle line numbers< \/a>');
< / script >
< pre dir = "ltr" id = "CA-23af50fad8100d6e8fbe6a70bdb68ba01bf3c8f9" lang = "en" > < span class = "line" > < span class = "LineNumber" > < a href = "#CA-23af50fad8100d6e8fbe6a70bdb68ba01bf3c8f9_1" > 1< / a > < / span > < span class = "LineAnchor" id = "CA-23af50fad8100d6e8fbe6a70bdb68ba01bf3c8f9_1" > < / span > < span class = "anchor" id = "line-1-10" > < / span > < span class = "Comment" > #!/usr/bin/env python3< / span > < / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-23af50fad8100d6e8fbe6a70bdb68ba01bf3c8f9_2" > 2< / a > < / span > < span class = "LineAnchor" id = "CA-23af50fad8100d6e8fbe6a70bdb68ba01bf3c8f9_2" > < / span > < span class = "anchor" id = "line-2-2" > < / span > < span class = "ResWord" > from< / span > < span class = "ID" > lxml< / span > < span class = "ResWord" > import< / span > < span class = "ID" > html< / span > < / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-23af50fad8100d6e8fbe6a70bdb68ba01bf3c8f9_3" > 3< / a > < / span > < span class = "LineAnchor" id = "CA-23af50fad8100d6e8fbe6a70bdb68ba01bf3c8f9_3" > < / span > < span class = "anchor" id = "line-3-2" > < / span > < span class = "ResWord" > from< / span > < span class = "ID" > io< / span > < span class = "ResWord" > import< / span > < span class = "ID" > BytesIO< / span > < / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-23af50fad8100d6e8fbe6a70bdb68ba01bf3c8f9_4" > 4< / a > < / span > < span class = "LineAnchor" id = "CA-23af50fad8100d6e8fbe6a70bdb68ba01bf3c8f9_4" > < / span > < span class = "anchor" id = "line-4-2" > < / span > < span class = "ResWord" > import< / span > < span class = "ID" > requests< / span > < span class = "ResWord" > as< / span > < span class = "ID" > reqs< / span > < / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-23af50fad8100d6e8fbe6a70bdb68ba01bf3c8f9_5" > 5< / a > < / span > < span class = "LineAnchor" id = "CA-23af50fad8100d6e8fbe6a70bdb68ba01bf3c8f9_5" > < / span > < span class = "anchor" id = "line-5-2" > < / span > < / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-23af50fad8100d6e8fbe6a70bdb68ba01bf3c8f9_6" > 6< / a > < / span > < span class = "LineAnchor" id = "CA-23af50fad8100d6e8fbe6a70bdb68ba01bf3c8f9_6" > < / span > < span class = "anchor" id = "line-6-2" > < / span > < / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-23af50fad8100d6e8fbe6a70bdb68ba01bf3c8f9_7" > 7< / a > < / span > < span class = "LineAnchor" id = "CA-23af50fad8100d6e8fbe6a70bdb68ba01bf3c8f9_7" > < / span > < span class = "anchor" id = "line-7-2" > < / span > < span class = "Comment" > # FF XPath Plugin:< / span > < / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-23af50fad8100d6e8fbe6a70bdb68ba01bf3c8f9_8" > 8< / a > < / span > < span class = "LineAnchor" id = "CA-23af50fad8100d6e8fbe6a70bdb68ba01bf3c8f9_8" > < / span > < span class = "anchor" id = "line-8-2" > < / span > < span class = "Comment" > # https://addons.mozilla.org/en-US/firefox/addon/xpath-checker/< / span > < / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-23af50fad8100d6e8fbe6a70bdb68ba01bf3c8f9_9" > 9< / a > < / span > < span class = "LineAnchor" id = "CA-23af50fad8100d6e8fbe6a70bdb68ba01bf3c8f9_9" > < / span > < span class = "anchor" id = "line-9-2" > < / span > < / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-23af50fad8100d6e8fbe6a70bdb68ba01bf3c8f9_10" > 10< / a > < / span > < span class = "LineAnchor" id = "CA-23af50fad8100d6e8fbe6a70bdb68ba01bf3c8f9_10" > < / span > < span class = "anchor" id = "line-10-2" > < / span > < / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-23af50fad8100d6e8fbe6a70bdb68ba01bf3c8f9_11" > 11< / a > < / span > < span class = "LineAnchor" id = "CA-23af50fad8100d6e8fbe6a70bdb68ba01bf3c8f9_11" > < / span > < span class = "anchor" id = "line-11-2" > < / span > < span class = "ID" > url< / span > = < span class = "String" > "< / span > < span class = "String" > http://macaomilano.org/spip.php?rubrique18< / span > < span class = "String" > "< / span > < / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-23af50fad8100d6e8fbe6a70bdb68ba01bf3c8f9_12" > 12< / a > < / span > < span class = "LineAnchor" id = "CA-23af50fad8100d6e8fbe6a70bdb68ba01bf3c8f9_12" > < / span > < span class = "anchor" id = "line-12-2" > < / span > < span class = "ID" > r< / span > = < span class = "ID" > reqs< / span > .< span class = "ID" > get< / span > (< span class = "ID" > url< / span > )< / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-23af50fad8100d6e8fbe6a70bdb68ba01bf3c8f9_13" > 13< / a > < / span > < span class = "LineAnchor" id = "CA-23af50fad8100d6e8fbe6a70bdb68ba01bf3c8f9_13" > < / span > < span class = "anchor" id = "line-13-2" > < / span > < span class = "ID" > doc< / span > = < span class = "ID" > html< / span > .< span class = "ID" > parse< / span > (< span class = "ID" > BytesIO< / span > (< span class = "ID" > r< / span > .< span class = "ID" > content< / span > ))< / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-23af50fad8100d6e8fbe6a70bdb68ba01bf3c8f9_14" > 14< / a > < / span > < span class = "LineAnchor" id = "CA-23af50fad8100d6e8fbe6a70bdb68ba01bf3c8f9_14" > < / span > < span class = "anchor" id = "line-14-2" > < / span > < span class = "ID" > titles< / span > = < span class = "ID" > doc< / span > .< span class = "ID" > xpath< / span > (< span class = "String" > "< / span > < span class = "String" > id(< / span > < span class = "String" > '< / span > < span class = "String" > container< / span > < span class = "String" > '< / span > < span class = "String" > )/div/section/header/h2/a/span/text()< / span > < span class = "String" > "< / span > )< / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-23af50fad8100d6e8fbe6a70bdb68ba01bf3c8f9_15" > 15< / a > < / span > < span class = "LineAnchor" id = "CA-23af50fad8100d6e8fbe6a70bdb68ba01bf3c8f9_15" > < / span > < span class = "anchor" id = "line-15-1" > < / span > < span class = "ResWord" > for< / span > < span class = "ID" > t< / span > < span class = "ResWord" > in< / span > < span class = "ID" > titles< / span > :< / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-23af50fad8100d6e8fbe6a70bdb68ba01bf3c8f9_16" > 16< / a > < / span > < span class = "LineAnchor" id = "CA-23af50fad8100d6e8fbe6a70bdb68ba01bf3c8f9_16" > < / span > < span class = "anchor" id = "line-16-1" > < / span > < span class = "ResWord" > print< / span > (< span class = "ID" > t< / span > )< / span >
< / pre > < / div > < / div > < span class = "anchor" id = "line-107" > < / span > < span class = "anchor" id = "line-108" > < / span > < p class = "line867" >
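< p class = "line874" > Both scripts above need the live page. To try the same idea offline (collect the text of every < em > h2< / em > ), here is a sketch that uses only the standard library's < em > html.parser< / em > on an inline string. The class < em > HeadingCollector< / em > , the function < em > h2_texts< / em > and the sample markup are all invented for illustration; the real page may be structured differently.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect the text content of every <h2> element."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._inside_h2 = False

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._inside_h2 = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self._inside_h2 = False

    def handle_data(self, data):
        # Text nested in <a> or <span> inside the <h2> is included too.
        if self._inside_h2:
            self.headings[-1] += data


def h2_texts(page_html):
    """Return the text of each <h2> in page_html, in document order."""
    collector = HeadingCollector()
    collector.feed(page_html)
    return collector.headings


# Made-up markup, loosely shaped like the XPath expression used above.
sample = """
<div id="container"><div><section>
  <header><h2><a href="#"><span>Space one</span></a></h2></header>
  <header><h2><a href="#"><span>Space two</span></a></h2></header>
</section></div></div>
"""

print("\n".join(h2_texts(sample)))
```

< p class = "line874" > Beautiful Soup and lxml do the same job with far less code and cope better with messy markup, which is why the scripts above use them.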
< h3 id = "Fake_user-agent" > Fake user-agent< / h3 >
< span class = "anchor" id = "line-109" > < / span > < p class = "line867" > < span class = "anchor" id = "line-110" > < / span > < span class = "anchor" id = "line-111" > < / span > < span class = "anchor" id = "line-112" > < / span > < span class = "anchor" id = "line-113" > < / span > < span class = "anchor" id = "line-114" > < / span > < span class = "anchor" id = "line-1-11" > < / span > < div class = "highlight python3" > < div class = "codearea" dir = "ltr" lang = "en" >
< script type = "text/javascript" >
document.write('< a href = "#" onclick = "return togglenumber(\'CA-b6788d7319e0d92c151d0f8cbb2a4da4ad1607b4\', 1, 1);" \
class="codenumbers">Toggle line numbers< \/a>');
< / script >
< pre dir = "ltr" id = "CA-b6788d7319e0d92c151d0f8cbb2a4da4ad1607b4" lang = "en" > < span class = "line" > < span class = "LineNumber" > < a href = "#CA-b6788d7319e0d92c151d0f8cbb2a4da4ad1607b4_1" > 1< / a > < / span > < span class = "LineAnchor" id = "CA-b6788d7319e0d92c151d0f8cbb2a4da4ad1607b4_1" > < / span > < span class = "anchor" id = "line-1-12" > < / span > < span class = "ID" > headers< / span > = < span class = "ID" > requests< / span > .< span class = "ID" > utils< / span > .< span class = "ID" > default_headers< / span > ()< / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-b6788d7319e0d92c151d0f8cbb2a4da4ad1607b4_2" > 2< / a > < / span > < span class = "LineAnchor" id = "CA-b6788d7319e0d92c151d0f8cbb2a4da4ad1607b4_2" > < / span > < span class = "anchor" id = "line-2-3" > < / span > < span class = "ID" > headers< / span > .< span class = "ID" > update< / span > ({< span class = "String" > "< / span > < span class = "String" > User-Agent< / span > < span class = "String" > "< / span > : < span class = "String" > "< / span > < span class = "String" > Mozilla/5.0< / span > < span class = "String" > "< / span > })< / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-b6788d7319e0d92c151d0f8cbb2a4da4ad1607b4_3" > 3< / a > < / span > < span class = "LineAnchor" id = "CA-b6788d7319e0d92c151d0f8cbb2a4da4ad1607b4_3" > < / span > < span class = "anchor" id = "line-3-3" > < / span > < / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-b6788d7319e0d92c151d0f8cbb2a4da4ad1607b4_4" > 4< / a > < / span > < span class = "LineAnchor" id = "CA-b6788d7319e0d92c151d0f8cbb2a4da4ad1607b4_4" > < / span > < span class = "anchor" id = "line-4-3" > < / span > < span class = "ID" > r< / span > = < span class = "ID" > requests< / span > .< span class = "ID" > get< / span > (< span class = "ID" > url< / span > , < span class = "ID" > headers< / span > =< span class = "ID" > headers< / span > )< / span >
< / pre > < / div > < / div > < span class = "anchor" id = "line-115" > < / span > < span class = "anchor" id = "line-116" > < / span > < p class = "line867" >
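< p class = "line874" > The snippet above assumes that < em > requests< / em > has been imported and that < em > url< / em > is already defined, as in the scraping scripts. The same header override can be checked without sending anything over the network, for example with the standard library's < em > urllib.request< / em > . The function name < em > build_request< / em > is an assumption made for this sketch.

```python
import urllib.request


def build_request(url, user_agent="Mozilla/5.0"):
    """Return a GET request object that carries a custom User-Agent header.

    build_request is a hypothetical helper; nothing is sent until the
    request is passed to urllib.request.urlopen().
    """
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_request("http://www.example.com")
# urllib stores header names capitalized, hence "User-agent" here.
print(req.get_header("User-agent"))
```

< p class = "line874" > Inspecting the request before sending it is a quick way to confirm which user agent the server will see.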
< h3 id = "Getting_nasty" > Getting nasty< / h3 >
< span class = "anchor" id = "line-117" > < / span > < p class = "line874" > Going through Tor: < span class = "anchor" id = "line-118" > < / span > < span class = "anchor" id = "line-119" > < / span > < span class = "anchor" id = "line-120" > < / span > < span class = "anchor" id = "line-121" > < / span > < span class = "anchor" id = "line-122" > < / span > < span class = "anchor" id = "line-123" > < / span > < span class = "anchor" id = "line-124" > < / span > < span class = "anchor" id = "line-125" > < / span > < span class = "anchor" id = "line-126" > < / span > < span class = "anchor" id = "line-127" > < / span > < span class = "anchor" id = "line-128" > < / span > < span class = "anchor" id = "line-129" > < / span > < span class = "anchor" id = "line-130" > < / span > < span class = "anchor" id = "line-131" > < / span > < span class = "anchor" id = "line-132" > < / span > < span class = "anchor" id = "line-133" > < / span > < span class = "anchor" id = "line-134" > < / span > < span class = "anchor" id = "line-1-13" > < / span > < div class = "highlight python3" > < div class = "codearea" dir = "ltr" lang = "en" >
< script type = "text/javascript" >
document.write('< a href = "#" onclick = "return togglenumber(\'CA-1ec9fd47065049ca91a17443dce5f33537a9f7fb\', 1, 1);" \
class="codenumbers">Toggle line numbers< \/a>');
< / script >
< pre dir = "ltr" id = "CA-1ec9fd47065049ca91a17443dce5f33537a9f7fb" lang = "en" > < span class = "line" > < span class = "LineNumber" > < a href = "#CA-1ec9fd47065049ca91a17443dce5f33537a9f7fb_1" > 1< / a > < / span > < span class = "LineAnchor" id = "CA-1ec9fd47065049ca91a17443dce5f33537a9f7fb_1" > < / span > < span class = "anchor" id = "line-1-14" > < / span > < span class = "ResWord" > import< / span > < span class = "ID" > socks< / span > < / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-1ec9fd47065049ca91a17443dce5f33537a9f7fb_2" > 2< / a > < / span > < span class = "LineAnchor" id = "CA-1ec9fd47065049ca91a17443dce5f33537a9f7fb_2" > < / span > < span class = "anchor" id = "line-2-4" > < / span > < span class = "ResWord" > import< / span > < span class = "ID" > socket< / span > < / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-1ec9fd47065049ca91a17443dce5f33537a9f7fb_3" > 3< / a > < / span > < span class = "LineAnchor" id = "CA-1ec9fd47065049ca91a17443dce5f33537a9f7fb_3" > < / span > < span class = "anchor" id = "line-3-4" > < / span > < span class = "ResWord" > import< / span > < span class = "ID" > requests< / span > < / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-1ec9fd47065049ca91a17443dce5f33537a9f7fb_4" > 4< / a > < / span > < span class = "LineAnchor" id = "CA-1ec9fd47065049ca91a17443dce5f33537a9f7fb_4" > < / span > < span class = "anchor" id = "line-4-4" > < / span > < / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-1ec9fd47065049ca91a17443dce5f33537a9f7fb_5" > 5< / a > < / span > < span class = "LineAnchor" id = "CA-1ec9fd47065049ca91a17443dce5f33537a9f7fb_5" > < / span > < span class = "anchor" id = "line-5-3" > < / span > < span class = "Comment" > # Before: public IP without Tor< / span > < / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-1ec9fd47065049ca91a17443dce5f33537a9f7fb_6" > 6< / a > < / span > < span class = "LineAnchor" id = "CA-1ec9fd47065049ca91a17443dce5f33537a9f7fb_6" > < / span > < span class = "anchor" id = "line-6-3" > < / span > < span class = "ResWord" > print< / span > (< span class = "ID" > requests< / span > .< span class = "ID" > get< / span > (< span class = "String" > "< / span > < span class = "String" > http://icanhazip.com< / span > < span class = "String" > "< / span > ).< span class = "ID" > text< / span > )< / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-1ec9fd47065049ca91a17443dce5f33537a9f7fb_7" > 7< / a > < / span > < span class = "LineAnchor" id = "CA-1ec9fd47065049ca91a17443dce5f33537a9f7fb_7" > < / span > < span class = "anchor" id = "line-7-3" > < / span > < / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-1ec9fd47065049ca91a17443dce5f33537a9f7fb_8" > 8< / a > < / span > < span class = "LineAnchor" id = "CA-1ec9fd47065049ca91a17443dce5f33537a9f7fb_8" > < / span > < span class = "anchor" id = "line-8-3" > < / span > < span class = "ID" > socks< / span > .< span class = "ID" > setdefaultproxy< / span > (< span class = "ID" > proxy_type< / span > =< span class = "ID" > socks< / span > .< span class = "ID" > PROXY_TYPE_SOCKS5< / span > ,< / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-1ec9fd47065049ca91a17443dce5f33537a9f7fb_9" > 9< / a > < / span > < span class = "LineAnchor" id = "CA-1ec9fd47065049ca91a17443dce5f33537a9f7fb_9" > < / span > < span class = "anchor" id = "line-9-3" > < / span > < span class = "ID" > addr< / span > =< span class = "String" > "< / span > < span class = "String" > 127.0.0.1< / span > < span class = "String" > "< / span > ,< / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-1ec9fd47065049ca91a17443dce5f33537a9f7fb_10" > 10< / a > < / span > < span class = "LineAnchor" id = "CA-1ec9fd47065049ca91a17443dce5f33537a9f7fb_10" > < / span > < span class = "anchor" id = "line-10-3" > < / span > < span class = "ID" > port< / span > =< span class = "Number" > 9050< / span > )< / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-1ec9fd47065049ca91a17443dce5f33537a9f7fb_11" > 11< / a > < / span > < span class = "LineAnchor" id = "CA-1ec9fd47065049ca91a17443dce5f33537a9f7fb_11" > < / span > < span class = "anchor" id = "line-11-3" > < / span > < / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-1ec9fd47065049ca91a17443dce5f33537a9f7fb_12" > 12< / a > < / span > < span class = "LineAnchor" id = "CA-1ec9fd47065049ca91a17443dce5f33537a9f7fb_12" > < / span > < span class = "anchor" id = "line-12-3" > < / span > < span class = "ID" > socket< / span > .< span class = "ID" > socket< / span > = < span class = "ID" > socks< / span > .< span class = "ID" > socksocket< / span > < / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-1ec9fd47065049ca91a17443dce5f33537a9f7fb_13" > 13< / a > < / span > < span class = "LineAnchor" id = "CA-1ec9fd47065049ca91a17443dce5f33537a9f7fb_13" > < / span > < span class = "anchor" id = "line-13-3" > < / span > < / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-1ec9fd47065049ca91a17443dce5f33537a9f7fb_14" > 14< / a > < / span > < span class = "LineAnchor" id = "CA-1ec9fd47065049ca91a17443dce5f33537a9f7fb_14" > < / span > < span class = "anchor" id = "line-14-3" > < / span > < span class = "Comment" > # After: the same request should now go through the SOCKS5 proxy< / span > < / span >
< span class = "line" > < span class = "LineNumber" > < a href = "#CA-1ec9fd47065049ca91a17443dce5f33537a9f7fb_15" > 15< / a > < / span > < span class = "LineAnchor" id = "CA-1ec9fd47065049ca91a17443dce5f33537a9f7fb_15" > < / span > < span class = "anchor" id = "line-15-2" > < / span > < span class = "ResWord" > print< / span > (< span class = "ID" > requests< / span > .< span class = "ID" > get< / span > (< span class = "String" > "< / span > < span class = "String" > http://icanhazip.com< / span > < span class = "String" > "< / span > ).< span class = "ID" > text< / span > )< / span >
< / pre > < / div > < / div > < span class = "anchor" id = "line-135" > < / span > < span class = "anchor" id = "bottom" > < / span > < / div >
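< p class = "line874" > The listing above replaces < tt > socket.socket< / tt > for the whole process. A less invasive variant, sketched here as an assumption and not taken from the original page, passes the proxy to a single request through the < tt > proxies< / tt > argument of < tt > requests< / tt > . It needs SOCKS support installed (< tt > pip install "requests[socks]"< / tt > ). The helper names < tt > tor_proxies< / tt > and < tt > fetch_ip_via_proxy< / tt > are hypothetical; host and port are the ones used in the listing. < / p >

```python
def tor_proxies(host="127.0.0.1", port=9050):
    """Build a requests-style proxy mapping for a local SOCKS5 proxy.

    Hypothetical helper: host and port default to the values used in
    the listing above.
    """
    # "socks5h" asks the proxy to resolve hostnames as well,
    # so DNS lookups do not bypass the proxy.
    url = f"socks5h://{host}:{port}"
    return {"http": url, "https": url}


def fetch_ip_via_proxy(check_url="http://icanhazip.com", timeout=30):
    """Return the public IP as seen by check_url, asking through the proxy."""
    # Imported here so the mapping helper works without requests installed.
    import requests

    response = requests.get(check_url, proxies=tor_proxies(), timeout=timeout)
    response.raise_for_status()
    return response.text.strip()
```

< p class = "line874" > Calling < tt > fetch_ip_via_proxy()< / tt > with the proxy running should print a different address from the direct request, while every other connection in the program stays unproxied. < / p >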
< / div >
< hr >
Last change: 18-03-2017
< / body >
< / html >