Help! with regular expressions


    Okay, so the situation here is that I have about 2100 lines of code like this:
    Code:
    <tr>
          <td colspan=2 height="84" class="body">Badgers!</td>
        </tr>
    And I need to run a find-and-replace to transform it into code like this:

    Code:
    <p>Badgers!</p>
    The problem is there's no way to isolate the "Badgers!" string without regular expressions... and I have no clue how to build 'em. Can anyone help?
    cIV list: cheats
    Now watch this drive!

  • #2
    I take it you have the software needed to use them? Or is that what the problem is?

    The following Perl one-liner ought to work, depending on how exact that description of the data really was. It won't work if the stuff between the tags stretches across multiple lines, for instance.

    Code:
    perl -0pi.bak -e "s/<tr>\n\s*<td.*?>(.*)<\/td>\n\s*<\/tr>/<p>$1<\/p>/gi;" filename
    That's quoted for the Windows command prompt. In a Unix shell (bash on Linux or Mac), put single quotes around the code instead. Inside double quotes the shell replaces $1 with an empty string before Perl ever sees it.

    And if you want to know what all that means...

    The -0pi.bak -e are command-line switches:
    • The p makes Perl read a "record", run the Perl code on it and print the result.
    • The i.bak sends that printed result back into the file itself (in-place editing), after saving a backup of the original as filename.bak.
    • The 0 makes Perl read in the contents of the entire file at once. To be more specific, it sets the record separator to the null character. A plain text file contains no null characters, so the whole file ends up as one record.
      The default record separator is the newline. But since the pattern stretches across multiple lines, it would never match if Perl only ever saw one line at a time.
      If your file is really big, this might not be a good idea.
    • The e means Perl will execute the code given after it on the command line.
    • The s///gi; is the substitution. It replaces whatever matches the pattern between the first and second / with the text between the second and third /. The g makes it replace all occurrences. The i makes it match case-insensitively.
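If Perl isn't to hand, here's a rough Python sketch of what those switches add up to, assuming Python 3 is installed. The function name edit_in_place is made up for illustration. The pattern is the same one as in the one-liner, except that Python writes the back-reference as \1 instead of $1, and the / inside </td> needs no backslash.

```python
import re
import shutil
from pathlib import Path

# Same pattern as the Perl one-liner; re.IGNORECASE plays the role of the i flag.
PATTERN = re.compile(r"<tr>\n\s*<td.*?>(.*)</td>\n\s*</tr>", re.IGNORECASE)
REPLACEMENT = r"<p>\1</p>"  # \1 is Python's spelling of Perl's $1


def edit_in_place(filename: str) -> int:
    """Rough equivalent of perl -0pi.bak -e 's/.../.../gi' filename (hypothetical helper)."""
    path = Path(filename)
    # -i.bak: keep a copy of the original as filename.bak
    shutil.copyfile(path, path.with_name(path.name + ".bak"))
    # -0: the whole file is read as a single record
    text = path.read_text()
    # s///g: replace every match, not just the first; subn also counts them
    new_text, count = PATTERN.subn(REPLACEMENT, text)
    # -p together with -i: the result is written back over the original file
    path.write_text(new_text)
    return count
```

The whole file is held in memory as one string, so the same caveat about really big files applies.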


    So as far as the regular expression is concerned:
    • \n matches a newline
    • \s* matches zero or more whitespace characters
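    • .*? matches any characters (.) except a newline, but as few of them as possible (*?) while still letting the entire expression match. So <td.*?> stops at the first >, which is the end of the opening tag.
    • the parenthesised (.*) captures the part you need. The $1 in the replacement part refers to it. Since . doesn't match a newline, (.*) can't run past the end of that line.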
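To see those pieces in isolation, here's a small Python check on the sample cell. Python's regex syntax agrees with Perl's for everything used here. The only difference is the back-reference, which is \1 in Python. The variable names are just for illustration.

```python
import re

row = '<td colspan=2 height="84" class="body">Badgers!</td>'

# Lazy: .*? takes as few characters as possible, so the match
# ends at the first ">" (the end of the opening tag).
lazy_tag = re.search(r"<td.*?>", row).group()

# Greedy: .* takes as many as possible, so the match runs on
# to the last ">" on the line (the end of </td>).
greedy_tag = re.search(r"<td.*>", row).group()

# The parenthesised group captures the cell text; in the replacement,
# Perl calls it $1 and Python calls it \1.
captured = re.search(r"<td.*?>(.*)</td>", row).group(1)
replaced = re.sub(r"<td.*?>(.*)</td>", r"<p>\1</p>", row)

# "." does not match a newline, so a pattern that spans lines
# has to spell the line break out (\n) or match it with \s.
crosses_newline = re.search(r"<tr>.*<td", "<tr>\n<td>") is not None

print(lazy_tag)         # <td colspan=2 height="84" class="body">
print(greedy_tag)       # <td colspan=2 height="84" class="body">Badgers!</td>
print(captured)         # Badgers!
print(replaced)         # <p>Badgers!</p>
print(crosses_newline)  # False
```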


    Right...
    Civilization II: maps, guides, links, scenarios, patches and utilities (+ Civ2Tech and CivEngineer)



    • #3
      Add some mushrooms and a snake, and you'll be fine.
      "Compromises are not always good things. If one guy wants to drill a five-inch hole in the bottom of your life boat, and the other person doesn't, a compromise of a two-inch hole is still stupid." - chegitz guevara
      "Bill3000: The United Demesos? Boy, I was young and stupid back then.
      Jasonian22: Bill, you are STILL young and stupid."

      "is it normal to imaginne dartrh vader and myself in a tjhreee way with some hot chick? i'ts always been my fantasy" - Dis



      • #4
        Thanks, I'll see if I can't fudge that into working.

        I'm using a text editor that supports regular expressions in search and find-and-replace. But I'm willing to run the file through whatever lessens my workload.



        • #5
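          As long as it properly supports regular expressions, that text editor should work. Just use the two parts of the substitution. The pattern (between the first and second /) goes in the Find box, and the replacement (between the second and third /) goes in the Replace box. Some editors want \1 instead of $1 there.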
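Here's a Python sketch of which piece of the s/.../.../gi; goes where. FIND, REPLACE and find_and_replace_all are illustrative names, not settings of any particular editor. Check your editor's help for two things: whether it wants $1 or \1 in the Replace box, and whether it lets a search run across line breaks at all.

```python
import re

# "Find" box: the part between the first and second slashes.
FIND = r"<tr>\n\s*<td.*?>(.*)</td>\n\s*</tr>"
# "Replace" box: the part between the second and third slashes,
# with the back-reference written \1 (Python, many editors) instead of Perl's $1.
REPLACE = r"<p>\1</p>"


def find_and_replace_all(text: str) -> str:
    """Mimic an editor's Replace All with Match Case switched off (the i flag)."""
    return re.sub(FIND, REPLACE, text, flags=re.IGNORECASE)
```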



          • #6
            You may be over-engineering this.

            You can just replace <tr> with nothing, </tr> with nothing, <td colspan=2 height="84" class="body"> with <p>, and </td> with </p>.
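That plain approach, sketched in Python, uses no regular expressions at all. It is just four literal replacements in a row (replace_literally is a made-up name). It leaves the old indentation and line breaks behind. It also rewrites every </td> in the file, not only the ones in these rows.

```python
# Plain (non-regex) replacements, applied one after another in this order.
LITERAL_REPLACEMENTS = [
    ("<tr>", ""),
    ("</tr>", ""),
    ('<td colspan=2 height="84" class="body">', "<p>"),
    ("</td>", "</p>"),
]


def replace_literally(text: str) -> str:
    """Apply each literal replacement to the whole text (hypothetical helper)."""
    for old, new in LITERAL_REPLACEMENTS:
        text = text.replace(old, new)
    return text
```

For example, replace_literally("<td>12 Sett Lane</td>") comes back as "<td>12 Sett Lane</p>", so any other cell ends up with mismatched tags.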
            Blog | Civ2 Scenario League | leo.petr at gmail.com



            • #7
              I thought of that solution too, but the problem is there are other <td>s that need to be marked up as <address>.
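One way around that is to put the whole opening tag into the pattern instead of the general <td.*?>. That way no other cell is ever touched. This assumes every cell that should become a paragraph has exactly the opening tag <td colspan=2 height="84" class="body"> and sits alone in its row. Below is a Python sketch of it (BODY_ROW and convert_body_rows are illustrative names). The general pattern from the Perl one-liner would convert any single-cell row, whatever its attributes. Using \s* in place of \n\s* also tolerates Windows line endings, since \s matches the \r as well.

```python
import re

# Only rows whose cell has exactly this opening tag are converted;
# any other <td> (e.g. the ones meant to become <address>) is left alone.
BODY_ROW = re.compile(
    r'<tr>\s*<td colspan=2 height="84" class="body">(.*?)</td>\s*</tr>',
    re.IGNORECASE,
)


def convert_body_rows(html: str) -> str:
    """Turn each matching single-cell row into <p>cell text</p> (hypothetical helper)."""
    return BODY_ROW.sub(r"<p>\1</p>", html)
```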



              • #8
                I would look for something like:

                bol (beginning of line), followed by zero or more whitespace characters, followed by <tr>, followed by zero or more whitespace characters and then eol (end of line); then bol, zero or more whitespace characters, <td colspan=2 height="84" class="body">Badgers!</td>, zero or more whitespace characters, eol; then bol, zero or more whitespace characters, </tr>, zero or more whitespace characters, eol.
                I may have missed a "zero or more whitespace characters" somewhere.
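Written out as an actual pattern, that description might look like the Python sketch below (ROW_PATTERN and convert_rows are illustrative names). With re.MULTILINE, ^ and $ match at the beginning and end of every line, which is the bol/eol above. [ \t]* stands in for "zero or more whitespace characters", because \s would also match the line break itself. The literal Badgers! is swapped for a group (.*?) so the same pattern works for every row.

```python
import re

# One line of the pattern per line of markup:
#   ^ = beginning of line (bol), $ = end of line (eol),
#   [ \t]* = zero or more spaces/tabs (unlike \s, it never matches a newline)
ROW_PATTERN = re.compile(
    r"^[ \t]*<tr>[ \t]*$\n"
    r'^[ \t]*<td colspan=2 height="84" class="body">(.*?)</td>[ \t]*$\n'
    r"^[ \t]*</tr>[ \t]*$",
    re.MULTILINE,
)


def convert_rows(text: str) -> str:
    """Replace each matching three-line row with <p>cell text</p> (hypothetical helper)."""
    return ROW_PATTERN.sub(r"<p>\1</p>", text)
```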
                (\__/) 07/07/1937 - Never forget
                (='.'=) "Claims demand evidence; extraordinary claims demand extraordinary evidence." -- Carl Sagan
                (")_(") "Starting the fire from within."



                • #9
                  Help! with regular expressions
                  <------------- Happy expression

                  <------------ Sad expression

                  <---------------- Mad expression
                  Captain of Team Apolyton - ISDG 2012

                  When I was younger I thought curfews were silly, but now as the daughter of a young woman, I appreciate them. - Rah
