Website Copying/Dumping
Hello,
I want to know whether there is any software for Linux to copy or dump websites.
I am going to use it on Fedora Core 2.
Please mention the link to download the software.
Waiting for a quick reply.
01101101 01100001 01101010 01101001 01100100
-
- Site Admin
- Posts: 5132
- Joined: Fri May 02, 2003 10:24 am
- Location: Karachi
- Contact:
My friend, I am doing a perfectly legal activity. I am trying to copy the lectures of a course, which are in HTML format. The website allows you to read them online, or to copy them one page at a time using Save As or whatever you like, but that is time-consuming, so I decided to copy them using some software.
Re:
Dear MAJID,
Salam,
No Problem, take a look at this http://www.linuxpakistan.net/forum2x/vi ... 3413#17360
Best Regards.
Farrukh Ahmed
Re:
Dear All,
LinuxFreaK wrote:
Dear MAJID,
Salam,
No Problem, take a look at this http://www.linuxpakistan.net/forum2x/vi ... 3413#17360
Best Regards.
I tried WebHTTrack Website Copier.
[root@imrant root]# rpm -q httrack
httrack-3.32.03-FC2
It placed two icons in the Internet tab:
Internet > Browse Mirrored Websites
Internet > WebHTTrack Website Copier
But the problem is that when I open the last one, it opens the Mozilla browser with the URL http://imrant.server1:8080/ already filled in. Then I tried to copy the website I intended to copy, that is
http://www.cs.sfu.ca/CourseCentral/365/li/index.html
but it simply opens it, with no option for copying. PLEASE HELP
I also tried wget -m, but did not succeed:
[root@imrant root]# wget -m http://www.cs.sfu.ca/CourseCentral/365/li/index.html
--19:11:13-- http://www.cs.sfu.ca/CourseCentral/365/li/index.html
=> `www.cs.sfu.ca/CourseCentral/365/li/index.html'
Resolving www.cs.sfu.ca... 142.58.111.29
Connecting to www.cs.sfu.ca[142.58.111.29]:80... failed: Connection timed out.
Retrying.
--19:14:34-- http://www.cs.sfu.ca/CourseCentral/365/li/index.html
(try: 2) => `www.cs.sfu.ca/CourseCentral/365/li/index.html'
Connecting to www.cs.sfu.ca[142.58.111.29]:80...
PLEASE HELP
-
- Major General
- Posts: 1024
- Joined: Thu Jul 04, 2002 5:31 pm
- Location: Karachi/Pakistan/Earth/Universe
This is not a case of a bad connection.
Majid, as you told me on MSN, the problem is that your cable internet provider is using NTLM authentication. No problem... here is how to make it work.
In an earlier post I helped you set up the ntlmaps proxy, which converts NTLM authentication to Basic authentication.
Now, all you need to do to get wget and the others working is to use that ntlmaps proxy. To do that:
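1. Open a console/terminal and issue this command (with no spaces around the = sign; otherwise the shell does not set the variable):
export http_proxy=http://username:password@127.0.0.1:5865/
Make sure you replace username and password with your own username and password.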
2. Use wget with this parameter:
--proxy=on
wget -m --proxy=on http://www.cs.sfu.ca/CourseCentral/365/li/index.html
Specify the other parameters, such as the web address, as needed.
wget fetches the proxy location from the http_proxy environment variable.
Make sure you do this while the ntlmaps main.py script is running and the proxy is active (./main.py runs the script and activates the APS proxy).
It should work 1000% if you follow the steps correctly.
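A minimal sketch of the two steps above combined into one script, assuming ntlmaps is listening on 127.0.0.1:5865 and that `username`/`password` are placeholders for your own credentials. It also adds `--no-parent`, so wget stays inside the course directory instead of climbing up into the rest of www.cs.sfu.ca, plus `--convert-links` and `--page-requisites` so the copy can be read offline. Since wget picks up `http_proxy` from the environment, the sketch relies on that instead of `--proxy=on`. The `DRY_RUN` switch is a hypothetical convenience: by default the script only prints the wget command; run it with `DRY_RUN=0` to actually download.

```shell
#!/bin/sh
# Sketch: mirror the course pages through the local ntlmaps proxy.
# Assumptions: ntlmaps listens on 127.0.0.1:5865; username/password are
# placeholders; DRY_RUN is a hypothetical print-only switch (default on).

# Shell assignments must not have spaces around '='.
http_proxy="http://username:password@127.0.0.1:5865/"
export http_proxy

# --no-parent keeps wget inside /CourseCentral/365/li/,
# --convert-links rewrites links for offline browsing,
# --page-requisites also fetches images/stylesheets the pages need.
set -- wget --mirror --no-parent --convert-links --page-requisites \
    "http://www.cs.sfu.ca/CourseCentral/365/li/index.html"

if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"   # only show the command that would run
else
    "$@"                 # actually start the download
fi
```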
Linux for Life!
Well, HTTrack is easy to configure once you know what you are going to do. It's very flexible; the only place you may have some trouble is when applying a filter for a particular website. I don't know whether wget has a filtering option, but if you want only HTML files, for example, and want to exclude everything else, you can do that in HTTrack. I love its flexibility.
It's my favorite on both Windows and Linux.
For further info, please follow these links:
Httrack FAQs
Httrack Documentation
Httrack Forum
and finally
How To Use Httrack
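For the HTML-only case mentioned above, command-line httrack accepts scan rules after the URL: a `-` rule excludes matching links and a `+` rule includes them, with a later rule overriding an earlier one. The following is a sketch under these assumptions: the lectures are plain `.html` files under the course directory, the output directory name `sfu-365-lectures` is hypothetical, and `DRY_RUN` is a hypothetical print-only switch (set `DRY_RUN=0` to really run it).

```shell
#!/bin/sh
# Sketch: copy only the .html lecture pages with command-line httrack.
# Assumptions: output directory name and DRY_RUN switch are hypothetical.
URL="http://www.cs.sfu.ca/CourseCentral/365/li/index.html"
OUT="sfu-365-lectures"

# -O sets the output directory.
# "-*" rejects everything; the following "+" rule re-allows .html files
# under the course directory.
set -- httrack "$URL" -O "$OUT" "-*" "+www.cs.sfu.ca/CourseCentral/365/li/*.html"

if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"   # only show the command that would run
else
    "$@"                 # actually start the copy
fi
```

Files with other extensions, such as `.htm` or `.pdf`, would be skipped by these rules; add another `+` rule for each type you also want.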
When EveryThing Is Meant To Be Broken , I Just Want To Know Who I Am
Thanks for the help, Dr. Sahab, but would you please tell me how to invoke this WebHTTrack software? I have installed it successfully, but when I click on the icon it opens the Mozilla browser. I do not know where to get the fancy GUI where I create a New Project, or whatever is described in the tutorial you provided. I AM UNABLE TO GET THE WEBHTTRACK GUI; only the icons are there, and they happily invoke the Mozilla browser instead of WebHTTrack. As I described before, the Internet > WebHTTrack Website Copier icon opens Mozilla with the URL http://imrant.server1:8080/ already filled in, and then I tried to copy the website I intended to copy, that is
http://www.cs.sfu.ca/CourseCentral/365/li/index.html
I cannot understand this:
[root@imrant root]# webhttrack
/usr/bin/webhttrack(3489): launching /usr/bin/mozilla
Error: No running window found.
/usr/bin/webhttrack(3489): spawning browser..
PLEASE HELP
Is there any command to invoke the WebHTTrack GUI, to get rid of the Mozilla browser opening and reopening???
PLEASE HELP
Well Majid,
I am afraid you haven't followed the links fully.
HTTrack for Linux actually opens in a browser window; the fancy GUI you are talking about is displayed inside that browser window.
If everything went fine during installation, you can start it with the command webhttrack from the Run menu,
or you can type webhttrack in any terminal, whether as root (su) or not.
But make sure you installed it without any trouble;
I don't know what went wrong.
For any proxy trouble you might face with webhttrack, use the same suggestions as for wget.
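Since the browser-based interface is what caused the confusion, here is a sketch of running plain `httrack` from a terminal instead, assuming the `httrack` command-line program was installed by the same package as `webhttrack`. It points httrack at the local ntlmaps proxy with `-P host:port`, assuming ntlmaps is listening on 127.0.0.1:5865 as in the wget instructions; if your proxy needs credentials, httrack also accepts the `user:pass@host:port` form. The output directory name and the `DRY_RUN` print-only switch are hypothetical.

```shell
#!/bin/sh
# Sketch: run httrack without any browser, through the ntlmaps proxy.
# Assumptions: ntlmaps listens on 127.0.0.1:5865; output directory name
# and DRY_RUN switch are hypothetical.
URL="http://www.cs.sfu.ca/CourseCentral/365/li/index.html"
OUT="sfu-365-lectures"
PROXY="127.0.0.1:5865"

# -O sets the output directory, -P sets the proxy (host:port).
set -- httrack "$URL" -O "$OUT" -P "$PROXY"

if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"   # only show the command that would run
else
    "$@"                 # actually start the copy
fi
```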