Offline tool to compare two word lists
Thread poster: Hans Lenting
Hans Lenting
Netherlands
Member (2006)
German to Dutch
Oct 26, 2019

I'm looking for an offline tool (script, macro, ...) to compare two word lists, either case-sensitively or case-insensitively, and create a third list containing all words that are present in both lists.

Both lists contain exactly one word per line. Characters beyond the basic ASCII range (ä, ß, etc.) should be supported.
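A minimal Python sketch of what is being asked for, assuming both files are UTF-8 text with one word per line. It keeps the words of the first list that also occur in the second. The case-insensitive mode uses Python's `str.casefold()`, which also matches "Straße" with "STRASSE". The function names are illustrative and are not taken from any of the tools mentioned in this thread.

```python
from pathlib import Path


def read_words(path):
    """Read a UTF-8 word list with one word per line."""
    # "utf-8-sig" also drops a byte-order mark if an editor added one.
    text = Path(path).read_text(encoding="utf-8-sig")
    # splitlines() handles both Windows (CRLF) and Unix (LF) line breaks.
    return [line.strip() for line in text.splitlines() if line.strip()]


def common_words(list1, list2, case_sensitive=True):
    """Return the words of list1 that also occur in list2, in list1's order, without duplicates."""
    # casefold() is a more aggressive lower(): "Straße" and "STRASSE" both become "strasse".
    key = (lambda w: w) if case_sensitive else str.casefold
    keys2 = {key(w) for w in list2}
    seen = set()
    result = []
    for word in list1:
        k = key(word)
        if k in keys2 and k not in seen:
            seen.add(k)
            result.append(word)
    return result


def write_words(path, words):
    """Write one word per line as UTF-8 (text mode uses the platform's line break)."""
    Path(path).write_text("".join(w + "\n" for w in words), encoding="utf-8")
```

Usage, with hypothetical file names: `write_words("common.txt", common_words(read_words("list1.txt"), read_words("list2.txt"), case_sensitive=False))`.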


 
Tony M
France
Member
French to English
SITE LOCALIZER
Excel? Oct 26, 2019

Could you do it going via Excel?

Something like: IF (value in Column A) = (value in Column B), THEN Column C = (value in Column A), ELSE leave Column C empty.

Then, when you copy back to Word, it would be easy enough to sort the table on Column C, manually remove all the lines where C is empty, and finally re-sort alphabetically on (say) Column C if that's important.
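The IF rule above compares cells on the same row only, so it finds a shared word only when that word sits on the same row in both columns. A small Python sketch contrasting the literal rule with a lookup variant (an assumption, not part of the suggestion above) that checks whether the value in column A occurs anywhere in column B. `clean_up` mirrors removing the empty C cells and re-sorting.

```python
def rowwise_column_c(col_a, col_b):
    """Literal version of the IF rule: C = A when A equals B on the same row, else empty."""
    # zip() stops at the shorter column, i.e. only rows with both values are compared.
    return [a if a == b else "" for a, b in zip(col_a, col_b)]


def lookup_column_c(col_a, col_b):
    """Variant: C = A when A occurs anywhere in column B, else empty."""
    in_b = set(col_b)
    return [a if a in in_b else "" for a in col_a]


def clean_up(col_c):
    """Drop the empty cells and sort what is left alphabetically."""
    return sorted(c for c in col_c if c)
```

With the columns "Apfel, Baum, Haus" and "Auto, Baum, Apfel", the literal rule only catches "Baum", while the lookup variant also catches "Apfel".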


 
Jean Dimitriadis
English to French
Diff tool Oct 26, 2019

I'd look for a diff tool for text files/directories (Meld, Diffuse, Beyond Compare, etc.) or one that is specifically for Excel (ExcelMerge) if you prefer that route.

 
esperantisto
Member (2006)
English to Russian
SITE LOCALIZER
kdiff3 maybe Oct 26, 2019

https://stackoverflow.com/questions/12826132/from-a-kdiff3-file-comparison-can-i-generate-a-diff-in-unified-diff-format

 
Samuel Murray
Netherlands
Member (2006)
English to Afrikaans
Try my little glossary comparison scripts (AutoIt) Oct 26, 2019

Hans Lenting wrote:
I'm looking for an offline tool (script, macro, ...) to compare two word lists, either case-sensitively or case-insensitively, and create a third list containing all words that are present in both lists. ... Both lists contain exactly one word per line. Characters beyond the basic ASCII range (ä, ß, etc.) should be supported.


Oh, dear. Well, I may have something that you can use while you search for the perfect solution:
http://www.leuce.com/autoit/WFC%20Glossary%20Comparer.zip

Each of these two scripts attempts to compare two Wordfast Classic glossaries (which are tab-delimited files). I tried to quickly adapt one of them for comparing word lists that contain only 1 column (i.e. your scenario), but I'm afraid I'm too stoned right now.

So what you need to do is temporarily replace any existing tabs in your files with a marker, e.g. "|||", then add a tab to the end of each line (i.e. replace \n with \t\n, or replace CRLF with TAB & CRLF, or whatever), and then use the "compare column 1" script. Also type "NONE" when prompted. The readme file is your friend.
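A Python sketch of that preprocessing step, assuming UTF-8 text and the "|||" marker suggested above. The second function restores the original tabs afterwards. How the AutoIt script handles encodings is not stated in the thread, so the encoding is an assumption.

```python
def to_one_column_glossary(text, marker="|||"):
    """Prepare a plain word list for the 'compare column 1' script."""
    lines = text.splitlines()
    # Protect existing tabs with the marker, then end each line with TAB + CRLF
    # so every word becomes the first column of a tab-delimited file.
    return "".join(line.replace("\t", marker) + "\t\r\n" for line in lines)


def from_one_column_glossary(text, marker="|||"):
    """Undo the preparation: drop the trailing tab and turn the marker back into tabs."""
    # Caveat: a marker that already occurred in the original text becomes a tab here too.
    lines = text.splitlines()
    return "".join(line.rstrip("\t").replace(marker, "\t") + "\n" for line in lines)
```

When saving the prepared text, open the file with `open(path, "w", encoding="utf-8", newline="")` so that Python does not translate the `\r\n` a second time on Windows.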

The script outputs two additional files, named after the two original files. If an entry occurs in both files, it gets the word [BOTH] added in front of it. If an entry occurs in one file only, then, well, it just remains in that file.
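If you only need the entries that occur in both files, you can pull them out of either marked output file afterwards. A sketch, assuming the marker is exactly "[BOTH] " (with one trailing space) at the start of the line. Check the script's actual output and adjust the hypothetical `BOTH` constant if it differs.

```python
BOTH = "[BOTH] "  # assumption: the marker is followed by a single space


def mark_both(words, other_words):
    """Mimic the output for one file: prefix entries that also occur in the other file."""
    other = set(other_words)
    return [BOTH + w if w in other else w for w in words]


def shared_entries(marked):
    """Recover the list of shared entries: keep marked lines and strip the marker."""
    return [line[len(BOTH):] for line in marked if line.startswith(BOTH)]
```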

Look, I used these scripts during a large translation project but did not develop them beyond the point where they were useful to me at the time. These scripts are SLOW with large files, though.


[Edited at 2019-10-26 13:00 GMT]


 
Jean Lachaud
United States
English to French
Excel Oct 26, 2019

Off the top of my head:

  • Add the content of one list to the other

  • Import/copy into an Excel column

  • Sort the column (if required) ([Data Tab | Sort])

  • Remove Duplicates ([Data Tab | Remove Duplicates])
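These steps, sketched in Python for comparison. The sketch uses exact, case-sensitive matching, and Python sorts by code point rather than in Excel's locale-aware order. Note that the result holds every term from either list, not only the shared ones.

```python
def merge_sort_dedupe(list1, list2):
    """Merge both lists, sort, and remove duplicates."""
    merged = list1 + list2  # add one list to the other, in one column
    merged.sort()  # code-point order, not Excel's locale-aware order
    # Keep the first instance of each entry (exact, case-sensitive match).
    return list(dict.fromkeys(merged))
```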


Samuel Murray
Netherlands
Member (2006)
English to Afrikaans
@Jean Oct 26, 2019

Jean Lachaud wrote:
  • Add the content of one list to the other

  • Import/copy into an Excel column

  • Remove Duplicates ([Data | Data Tools | Remove Duplicates])


If you do this, then you end up with a column that contains all terms.

The way I understand it, Hans wants only terms that occur in both files. If a term occurs only in one file, then he doesn't want that term.

In other words, assuming that duplicates were already removed from each list individually (keeping one instance of each, of course), step #3 should be something like "remove non-duplicates" (i.e. remove all terms that appear only once in the list).
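One way to express "remove non-duplicates" is to count how often each term occurs in the merged list. A Python sketch, assuming exact matching. Each list is deduplicated on its own first, so a count of 2 can only mean the term was in both lists.

```python
from collections import Counter


def remove_non_duplicates(list1, list2):
    """Keep only the terms that occur in both lists."""
    # Deduplicate each list on its own first (dict.fromkeys keeps the original order);
    # otherwise a term repeated inside one list would look like a shared term.
    merged = list(dict.fromkeys(list1)) + list(dict.fromkeys(list2))
    counts = Counter(merged)
    # After per-list deduplication, a count of 2 means "present in both lists".
    return sorted(term for term, n in counts.items() if n == 2)
```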


     
Jean Lachaud
United States
English to French
My bad Oct 26, 2019

You are right.

Still, I'm pretty sure there is a quick way to do that in Excel, but I don't have time today to research it.



Samuel Murray wrote:

If you do this, then you end up with a column that contains all terms.

The way I understand it, Hans wants only terms that occur in both files. If a term occurs only in one file, then he doesn't want that term.

In other words, assuming that duplicates were already removed from each list individually (keeping one instance of each, of course), step #3 should be something like "remove non-duplicates" (i.e. remove all terms that appear only once in the list).


     
Samuel Murray
Netherlands
Member (2006)
English to Afrikaans
@Hans, here's a superfast one Oct 26, 2019

Hans Lenting wrote:
I'm looking for an offline tool [etc.]


I found an AutoIt script that can do this, cannibalized it a bit, and here you go: http://www.leuce.com/autoit/compare_two_lists.zip

It's super, super fast. It doesn't sort the files. It creates three files: one with terms that occur only in file 1, one with terms that occur only in file 2, and one with only the terms that occur in both files. Note that the script counts all instances of a term within a file as a single term: if a term occurs twice in the same file, it is counted once, because the script removes all duplicates from each file's content before comparing the two files. It leaves the original files intact.
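The behaviour described above (no sorting, per-file deduplication, three outputs, originals untouched) can be sketched in Python as follows. This is not the AutoIt script itself. The `_only.txt` and `both.txt` output names and the UTF-8 encoding are assumptions.

```python
from pathlib import Path


def three_way_split(list1, list2):
    """Return (only in list1, only in list2, in both), keeping input order, without duplicates."""
    unique1 = list(dict.fromkeys(list1))  # removes repeats, keeps first-seen order (no sorting)
    unique2 = list(dict.fromkeys(list2))
    set1, set2 = set(unique1), set(unique2)
    only1 = [w for w in unique1 if w not in set2]
    only2 = [w for w in unique2 if w not in set1]
    both = [w for w in unique1 if w in set2]
    return only1, only2, both


def _read_lines(path):
    # One entry per line; empty lines are skipped.
    return [line for line in path.read_text(encoding="utf-8").splitlines() if line]


def split_files(path1, path2):
    """Write three new files next to the inputs; the original files are only read."""
    p1, p2 = Path(path1), Path(path2)
    only1, only2, both = three_way_split(_read_lines(p1), _read_lines(p2))
    # Hypothetical output names; the AutoIt script's own naming may differ.
    outputs = [
        (p1.with_name(p1.stem + "_only.txt"), only1),
        (p2.with_name(p2.stem + "_only.txt"), only2),
        (p1.with_name("both.txt"), both),
    ]
    for out_path, words in outputs:
        out_path.write_text("".join(w + "\n" for w in words), encoding="utf-8")
```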



[Edited at 2019-10-26 15:13 GMT]


     
Luca Tutino
Italy
Member (2002)
English to Italian
Just adding a couple of variations to Jean's solution (case-sensitive) Oct 26, 2019

Before merging the lists, you should eliminate any repetition from each list separately, using Excel's Remove Duplicates command.

Then you merge them and sort the merged list as suggested by Jean.

Now you can add a case-sensitive comparison formula like this in cell B2: =EXACT(A2;A1). (EXACT is the English function name; localized versions of Excel use a translated name, and depending on the regional settings the argument separator is a comma instead of a semicolon.)

Copy cell B2 to the remaining rows of column B, and you automatically get =EXACT(A3;A2) in cell B3, and so on. The formula returns TRUE for the second occurrence of each term that appears twice, i.e. a term that originally appeared in both lists, and FALSE for all other terms, including the first occurrence of each doubled term.

Use the AutoFilter on column B to show only the TRUE rows.

Copy the filtered column A to a new worksheet, and you have your desired list.
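The same steps as a Python sketch with exact (case-sensitive) matching. Comparing each element with its predecessor plays the role of the formula copied down column B, and the final filter corresponds to the AutoFilter step. Python sorts by code point, which keeps identical strings adjacent regardless of locale.

```python
def common_by_adjacent_rows(list1, list2):
    """Dedupe each list, merge, sort, then compare each row with the row above."""
    # Remove duplicates from each list separately (exact match), then merge.
    merged = list(dict.fromkeys(list1)) + list(dict.fromkeys(list2))
    # Sort the merged column so identical terms end up next to each other.
    merged.sort()
    # Equivalent of the formula in B2 (A2 compared with A1) copied down column B:
    # True on the second occurrence of a term, i.e. a term that was in both lists.
    flags = [cur == prev for prev, cur in zip(merged, merged[1:])]
    # "Filter" on True and copy column A.
    return [term for term, flag in zip(merged[1:], flags) if flag]
```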


[Edited at 2019-10-26 16:22 GMT]

[Edited at 2019-10-26 16:24 GMT]


     
Luca Tutino
Italy
Member (2002)
English to Italian
Additional step for case insensitive Oct 26, 2019

Just enter the formula =UPPER(A1) in cell B1 and copy cell B1 to the remaining rows of column B.

Then proceed as above, but point the row-by-row comparison formula at column B instead of column A, and place it in column C instead of column B.
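A Python sketch of the case-insensitive variant. Each row pairs an upper-cased key (the helper column) with the original word; rows are sorted on the key, and neighbouring keys are compared. As an assumption of this sketch, duplicates within each list are also removed ignoring case, so a list containing both "Haus" and "haus" cannot produce a false match.

```python
def common_ignoring_case(list1, list2):
    """Compare an upper-cased helper column instead of the words themselves."""
    rows = []  # (helper column = upper-cased key, original word)
    for words in (list1, list2):
        seen = set()
        for word in words:
            key = word.upper()  # Python maps "ß" to "SS"; Excel's UPPER may differ
            if key not in seen:  # remove duplicates within each list, ignoring case
                seen.add(key)
                rows.append((key, word))
    rows.sort()  # sort the rows on the helper column (ties broken by the word)
    # Compare each key with the key in the row above; keep the word where they match.
    return [word for (prev_key, _), (key, word) in zip(rows, rows[1:]) if key == prev_key]
```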


[Edited at 2019-10-26 16:24 GMT]


     
Samuel Murray
Netherlands
Member (2006)
English to Afrikaans
Note Oct 27, 2019

Samuel Murray wrote:
I found an AutoIt script that can do this, cannibalized it a bit, and here you go: http://www.leuce.com/autoit/compare_two_lists.zip


The script assumes Windows line breaks (CRLF), so if your files have Unix line breaks, try changing CRLF to LF in the script.
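As an alternative to editing the script, the files can be converted to CRLF beforehand. A Python sketch, assuming the files are UTF-8 or another ASCII-compatible encoding (not UTF-16). `convert_file` overwrites the file in place, so keep a copy.

```python
from pathlib import Path


def to_crlf(data):
    """Convert LF (Unix) or lone CR line breaks in a bytes object to CRLF (Windows)."""
    # Normalize everything to LF first so existing CRLF pairs are not doubled.
    lf_only = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return lf_only.replace(b"\n", b"\r\n")


def convert_file(path):
    """Rewrite a file in place with CRLF line breaks."""
    p = Path(path)
    # Working on bytes leaves UTF-8 characters such as "ä" untouched,
    # because no UTF-8 multi-byte sequence contains the CR or LF byte.
    p.write_bytes(to_crlf(p.read_bytes()))
```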


     
Hans Lenting
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
Thank you Oct 27, 2019

Samuel Murray wrote:

Samuel Murray wrote:
I found an AutoIt script that can do this, cannibalized it a bit, and here you go: http://www.leuce.com/autoit/compare_two_lists.zip


The script assumes Windows line breaks (CRLF), so if your files have Unix line breaks, try changing CRLF to LF in the script.


Thank you all!

I've used the second script that Samuel provided. @Samuel, if you can find a case-insensitive solution, I'd be much obliged. @Jean: I'll test the Mac version of Beyond Compare.


     

