Linux - Want To Check For Possible Duplicate Directories (Probably RegEx Needed)





I have a directory which contains several directories as follows:

/Music/
/Music/JoeBlogs-Back_In_Black-1980
/Music/JoeBlogs-Back_In_Black-(Remastered)-2003
/Music/JoeBlogs-Back_In_Black-(ReIssue)-1987
/Music/JoeBlogs-Thunder_Man-1947

I want a script to go through and tell me when there are 'possible' duplicates. In the example above, it would pick up the following as possible duplicates from the directory list:

/Music/JoeBlogs-Back_In_Black-1980
/Music/JoeBlogs-Back_In_Black-(Remastered)-2003
/Music/JoeBlogs-Back_In_Black-(ReIssue)-1987

1) Is this possible?
2) If so please help!





1:



Follow up:

I did what I needed by writing the following Perl script.

This is my first ever Perl script (and I had to learn Perl to write it, so don't be too hard on me :).
#!/usr/bin/perl
use strict;
use warnings;

# README
#
# Checks a folder for albums that are similar, e.g.:
#   Artist-Back_In_Black-(Remastered)-2001-XXX
#   Artist-Back_In_Black-(Reissue)-2000-YYY
#
# The script prompts you for which one to "zz" (it puts "zz" in front of
# the name so you can delete it later).

# CONFIG
# Put your MP3 directory path in the $mp3dirpath variable.
my $mp3dirpath = '/data/downloads/MP3';
# END CONFIG

# Read the listing, trim each entry, and keep the sorted result
# (sort returns a new list; it does not sort in place).
my @txt = sort map { trim($_) } qx{ls $mp3dirpath};

# Skip the first word (the artist) and capture the second (the album title).
my $re = qr/.*?(?:[a-z][a-z0-9_]*).*?((?:[a-z][a-z0-9_]*))/is;

# Compare each entry with the one after it. Stop one short of the end so
# the "after" index never runs past the array.
for my $before (0 .. $#txt - 1) {
    my $after = $before + 1;
    my ($var1) = $txt[$before] =~ $re;
    my ($var2) = $txt[$after]  =~ $re;
    next unless defined $var1 && defined $var2 && $var1 eq $var2;

    print "-------------------------------------\n";
    print "$txt[$before]\n";
    print "MATCHES\n";
    print "$txt[$after]\n";
    print "Which Should I Remove?\n";
    print "[1] $txt[$before]\n";
    print "[2] $txt[$after]\n";
    print "[Any Other Key] Take No Action\n\n";

    my $answer = <STDIN>;    # Get user input, assign it to the variable
    $answer = defined $answer ? trim($answer) : '';

    if    ($answer eq '1') { zz($txt[$before]); }
    elsif ($answer eq '2') { zz($txt[$after]); }
    else                   { print "Taking No Action\n"; }
}

# Put "zz" in front of an entry's name. rename() is used instead of
# shelling out to mv, because names containing parentheses such as
# "(Remastered)" break the unquoted shell command.
sub zz {
    my ($name) = @_;
    print "ZZing $name\n";
    rename("$mp3dirpath/$name", "$mp3dirpath/zz$name")
        or warn "Could not rename $name: $!\n";
}

# Subroutine for trimming white space from variables
sub trim {
    my $string = shift;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}


2:


If your directory names follow a regular structure such as:
foo-Name_of_Interest-bar 
then you can use a simple regex to strip off the "foo-" and the "-bar" and do a direct comparison. If that's not possible, you'll have to use a much more expensive pattern-matching algorithm.

Perhaps something like longest common subsequence or Levenshtein distance.
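
As a rough illustration of the more expensive route, here is a minimal sketch of Levenshtein distance in plain Bash, using the usual two-row dynamic-programming table. The function name levenshtein is made up for this example, and it compares bytes/characters one at a time, so it will be slow on long names or large collections.

```shell
#!/usr/bin/env bash

# levenshtein A B
# Prints the edit distance (insertions, deletions, substitutions)
# between strings A and B. Hypothetical helper, not from the original post.
levenshtein() {
    local a=$1 b=$2
    local la=${#a} lb=${#b}
    local i j cost best ins sub
    local -a prev curr

    # Row 0: turning the empty string into the first j characters of b.
    for ((j = 0; j <= lb; j++)); do
        prev[j]=$j
    done

    for ((i = 1; i <= la; i++)); do
        curr[0]=$i
        for ((j = 1; j <= lb; j++)); do
            if [[ ${a:i-1:1} == "${b:j-1:1}" ]]; then
                cost=0
            else
                cost=1
            fi
            best=$(( prev[j] + 1 ))          # deletion
            ins=$(( curr[j-1] + 1 ))         # insertion
            sub=$(( prev[j-1] + cost ))      # substitution (or match)
            if (( ins < best )); then best=$ins; fi
            if (( sub < best )); then best=$sub; fi
            curr[j]=$best
        done
        prev=("${curr[@]}")
    done

    echo "${prev[lb]}"
}
```

You would then call it on the part of the name you care about, for example levenshtein "Back_In_Black" "Back_in_Black", and treat anything under some small threshold (which you would have to pick by experiment) as a possible duplicate.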

There may be other techniques that are more appropriate.

Simple matching in Bash (version 3.2 or higher) might look like this snippet:

dir='/Music/JoeBlogs-Back_In_Black-(Remastered)-2003'
regex='^([^-]*)-([^-]*)-(.*)$'
[[ $dir =~ $regex ]]    # populates BASH_REMATCH
if [[ ${BASH_REMATCH[1]} == "${prev_dir[1]}" &&    #  "/Music/JoeBlogs"
      ${BASH_REMATCH[2]} == "${prev_dir[2]}" ]]    #  "Back_In_Black"
then
    echo "we have a match"
fi

This snippet doesn't show a find ... | while read ... loop or how previous entries and lists of matches could be handled.
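
One way the missing loop and bookkeeping could look is sketched below. It is only a sketch under some assumptions: it needs Bash 4 or later for associative arrays (the snippet above only needs 3.2), it uses a glob instead of find, it applies the regex to the directory name rather than the full path, and the function name find_possible_duplicates is invented for this example.

```shell
#!/usr/bin/env bash
# Assumes Bash 4+ (associative arrays).

# find_possible_duplicates DIR
# Groups the subdirectories of DIR by their first two hyphen-separated
# fields ("Artist-Title") and prints every group with more than one member.
find_possible_duplicates() {
    local root=$1
    local regex='^([^-]*)-([^-]*)-(.*)$'
    local -A groups=()    # key "Artist-Title" -> newline-separated paths
    local -A counts=()    # key "Artist-Title" -> number of directories
    local path name key

    for path in "$root"/*/; do
        [[ -d $path ]] || continue      # also skips an unmatched glob
        name=${path%/}                  # drop the trailing slash
        name=${name##*/}                # keep only the last path component
        if [[ $name =~ $regex ]]; then
            key="${BASH_REMATCH[1]}-${BASH_REMATCH[2]}"
            groups[$key]+="$root/$name"$'\n'
            counts[$key]=$(( ${counts[$key]:-0} + 1 ))
        fi
    done

    for key in "${!groups[@]}"; do
        if (( ${counts[$key]} > 1 )); then
            printf 'Possible duplicates for %s:\n%s\n' "$key" "${groups[$key]}"
        fi
    done
}
```

Calling find_possible_duplicates /Music on the layout from the question should list the three Back_In_Black directories under one heading and leave Thunder_Man out. Groups are printed in no particular order, and names that don't have at least two hyphens are ignored.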


