extract text from xml files
Thread poster: barryw
Nov 2, 2010

Dear all,
Does anyone know how to extract plain text from XML files? Is there any handy software for this?
Thank you!


 
Brand Localization
There is a workaround :) Nov 2, 2010

Hi Barryw,

The following is very useful in most cases:

1- Right-click the XML file
2- Choose Edit (with Notepad, for example)
3- When the file is open, choose "File" | "Save as"
4- In the File name field, change the file extension to ".html"
5- Save the file (it will be saved as a web page)
6- Open the web page; you'll find the plain text without the XML tags, and moreover with formatting
7- You can then copy this text and paste it into a Word file, for example, to work with the text alone
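
The steps above end with copying the text out of a browser by hand. For anyone comfortable with a little scripting, the same result can probably be had in a few lines of Python. This is only a minimal sketch, assuming the file is well-formed XML; the function name `extract_text` and the sample string are made up for illustration and are not part of any tool mentioned in this thread.

```python
import xml.etree.ElementTree as ET


def extract_text(xml_source: str) -> str:
    """Return the text content of an XML string with all tags removed."""
    root = ET.fromstring(xml_source)
    # itertext() yields every piece of text in document order.
    # Pieces are joined with a space and runs of whitespace are collapsed,
    # so a word split by an inline tag would come out as two words.
    return " ".join(" ".join(root.itertext()).split())


# Hypothetical sample document, just to show the call.
sample = "<doc><title>Hello</title><p>First <b>bold</b> line.</p></doc>"
print(extract_text(sample))
```

For a file on disk, `ET.parse(path).getroot()` can be used in place of `ET.fromstring(...)`.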

Best regards

Your Arabic Translation Team


 
FarkasAndras
Here's one Nov 2, 2010

barryw wrote:

dear all,
is there any one who knows how to extract text (plain text) from xml files? any handy software?
thank you!



Here's a script of mine:
http://www.mediafire.com/?kq9yayc1hgt2kj9

Unzip it, move your file to the tag_stripper folder and rename the file so that it has an .html extension. Double-click the .bat file and follow the instructions. It's a bit crude but it should work; do check the results, of course.
The end result should be pretty much the same as opening the file in a browser and copying the content to a .txt file, but this solution will work with large files as well, while your browser definitely won't open a 50+ MB file for you.
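
For readers who cannot run a .bat file, here is a rough Python sketch of the same idea for large files. It is not the script linked above (whose internals are not shown in this thread); it is a swapped-in technique that uses the standard library's SAX parser, which reads the input as a stream and so should not need to hold a 50+ MB file in memory. The names `TextCollector` and `strip_tags_streaming` are invented for this example, and it assumes the input is well-formed XML.

```python
import io
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Writes every run of character data to `out` and ignores the tags."""

    def __init__(self, out):
        super().__init__()
        self.out = out

    def characters(self, content):
        # Called by the parser for each chunk of text between tags.
        self.out.write(content)


def strip_tags_streaming(source, out):
    """Parse `source` (a file name or binary file object) and write its text to `out`."""
    xml.sax.parse(source, TextCollector(out))


# Small in-memory demo; a real run would pass a file name and an open output file.
demo_out = io.StringIO()
strip_tags_streaming(io.BytesIO(b"<doc><p>First <b>bold</b> line.</p></doc>"), demo_out)
print(demo_out.getvalue())
```

Unlike a browser, this keeps whitespace exactly as it appears in the file, so the output may need tidying afterwards.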

Also, I have no idea why Arabic Translation Team posted such a convoluted solution. If you want to open the file in your browser, right click it, choose Open with... and pick the browser from the list. No need to change the extension, especially not by opening the file in another program first. If the browser is not on the "open with" list, choose "other program" and pick the browser from there.

The file extension doesn't change the type of the file ("save it as a webpage"). It's just an indication to the OS; it tells the OS what software to open the file with by default. You can easily override that default through the right-click context menu.
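
To illustrate the point about extensions, here is a small Python sketch (the file names, the sample content and the helper names are all made up for this example). It writes the same content to a .xml and a .html file and parses both with the same XML parser; the extracted text is identical because the bytes are identical. Programs that pick their behaviour from the extension, as a browser may do for local files, can of course still display the two files differently.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical sample content, written unchanged to both files.
CONTENT = "<note><to>barryw</to><body>Same bytes, different name.</body></note>"


def text_of(path):
    """Parse the file as XML, whatever its name, and return its text."""
    root = ET.parse(path).getroot()
    return " ".join(" ".join(root.itertext()).split())


def compare_extensions():
    """Write CONTENT under two extensions and return the text read from each."""
    with tempfile.TemporaryDirectory() as folder:
        as_xml = Path(folder) / "sample.xml"
        as_html = Path(folder) / "sample.html"
        as_xml.write_text(CONTENT, encoding="utf-8")
        as_html.write_text(CONTENT, encoding="utf-8")
        return text_of(as_xml), text_of(as_html)


xml_text, html_text = compare_extensions()
print(xml_text == html_text)
```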

[Edited at 2010-11-02 20:48 GMT]


 
FarkasAndras
OS Nov 2, 2010

Note: the above solution only works on Windows computers. Barryw didn't specify the OS he uses, so I assume it's some flavour of Windows.

 
barryw
TOPIC STARTER
Thanks very much for your suggestions. Nov 3, 2010

Dear Arabic Translation Team and FarkasAndras,

Thanks very much for your suggestions.

Arabic Translation Team's solution works well in my case! Quite a simple solution.

Thanks, FarkasAndras, for the detailed suggestion. I haven't had time to try your link yet, but I believe it will be a good fix for dealing with large files. However, your second suggestion, opening the XML files directly via the "Open with > browser" command, doesn't seem to work in my case. Firefox just shows all the tags, while IE simply cannot open the XML file. Maybe I messed something up?

Anyway, thank you all for your contributions.


 
Dawid Wietrzyk
It works.. Jan 23, 2012

FarkasAndras - I know the post is kind of old, but your script worked perfectly for me, just what I needed. Thank you.

 
FarkasAndras
You're welcome Jan 24, 2012

Dawid Wietrzyk wrote:

FarkasAndras - I know the post is kind of old, but your script worked perfectly for me, just what I needed. Thank you.


Glad it worked. Now this script (probably a more refined version) and similar random bits and bobs are in the "grab bag" at http://sourceforge.net/projects/aligner/files/

Currently, the grab bag is at version 1.6. You'll always find the most recent version at the SourceForge URL above.

[Edited at 2012-01-24 11:29 GMT]


 

