Programmers please read.

  • Programmers please read.

    I have an annoying problem with my latest piece of research, which is an empirical study (please don't stop reading - I'm getting to the programming part).

    I have a main dataset which attaches a postcode (like zip code) district to each individual. I then have other datasets which I will map into the main one. Unfortunately many of the postcodes (which are saved as strings) in the main set have spaces at the start, and since [(space)B1] is not equal to B1 I can't merge my sets.
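To make the merge problem concrete, here is a minimal sketch in Python. The rows, the lookup table, and the names `main_rows`, `district_info`, and `merge_on_district` are all made up for illustration, since the real datasets are not shown in the thread. It only demonstrates that a key with leading spaces fails to match until it is cleaned.

```python
# Hypothetical sample data; the real datasets are not shown in the thread.
main_rows = [
    {"person_id": 1, "district": "  B1"},
    {"person_id": 2, "district": " B12"},
    {"person_id": 3, "district": "B2"},
]
# Hypothetical lookup table keyed by the clean postcode district.
district_info = {"B1": "central", "B12": "south", "B2": "east"}


def merge_on_district(rows, lookup, clean=False):
    """Attach lookup values to rows, optionally stripping leading spaces from the key."""
    merged = []
    for row in rows:
        key = row["district"].lstrip(" ") if clean else row["district"]
        # lookup.get returns None when the key has no exact match.
        merged.append({**row, "area": lookup.get(key)})
    return merged


raw = merge_on_district(main_rows, district_info)
fixed = merge_on_district(main_rows, district_info, clean=True)
print([r["area"] for r in raw])    # rows with leading spaces stay unmatched
print([r["area"] for r in fixed])  # every row matches once the key is cleaned
```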

    Now I have converted the string to an ascii file, and of course I can remove the spaces and reconvert back into my econometric package. But there are about 100,000 offending spaces (stop laughing, it's not funny!). Glancing at the rest of the data it seems there will be further space related problems.

    Attached is a .txt file with a small sample of the observations, each on its own line between quote marks. Essentially I need an automated way to remove all spaces within each set of quote marks, so that "(space)(space)B1" becomes "B1".

    It's such a trivial problem I'm sure a programmer can suggest a solution straight away. I hope so anyway.
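One possible way to do what is asked above, sketched in Python rather than in any econometric package: find each pair of quote marks and delete the spaces between them, leaving everything outside the quotes alone. The function name `remove_spaces_in_quotes` and the sample text are assumptions for illustration, and the sketch assumes quote marks always come in matched pairs.

```python
import re

# Matches one quoted field: an opening quote, anything that is not a quote,
# then the closing quote. Assumes quotes are always paired.
QUOTED = re.compile(r'"([^"]*)"')


def remove_spaces_in_quotes(text):
    """Delete every space character that sits between a pair of double quotes."""
    return QUOTED.sub(lambda m: '"' + m.group(1).replace(" ", "") + '"', text)


# Made-up sample in the layout described in the post: one quoted code per line.
sample = '"  B1"\n" B12"\n"B2"\n'
print(remove_spaces_in_quotes(sample))
```

A file of about 2.5MB fits comfortably in memory, so the whole text could be read into one string, passed through the function, and written back out.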

  • #2
    I was going to say 'how are you going to learn if you just want solutions straight up', but then I realized I might run into a problem I can't solve at all and ask for help here, so you know... I'd be digging my own grave.
    In da butt.
    "Do not worry if others do not understand you. Instead worry if you do not understand others." - Confucius
    THE UNDEFEATED SUPERCITIZEN w:4 t:2 l:1 (DON'T ASK!)
    "God is dead" - Nietzsche. "Nietzsche is dead" - God.


    • #3
      Don't most word processors have a "Find and Replace" feature? Would that help at least a little?
      meet the new boss, same as the old boss


      • #4
        Post the entire thing with a fixed example line. I'll use Emacs's query string replace to remove the spaces. I just tested on juvos.txt and it worked.
        American by birth, smarter than the average tropical fruit by the grace of Me. -me
        I try not to break the rules but merely to test their elasticity. -- Bill Veeck | Don't listed to the Linux Satanist, people. - St. Leo | If patching security holes was the top priority of any of us(no matter the OS), we'd do nothing else. - Me, in a tired and accidental attempt to draw fire from all three sides.
        Posted with Mozilla Firebird running under Sawfish on a Slackware Linux install.:p
        XGalaga.


        • #5
          Find and replace I did think of, but there are still 3,000 different postcodes in each set. Geeslaka's comment looks more promising.

          Geeslaka: As I said, the dataset is huge (even the text file with all the observations on this variable is 2.5MB) so I can't attach it. Also, I will face more space-related problems with the sets I am mapping in, I am sure.

          Hence can you explain this method of yours? Or is it something a humble economist will not be able to do?


          • #6
            Originally posted by mrmitchell
            Don't most word processors have a "Find and Replace" feature? Would that help at least a little?
            That would be the simplest way - just import the file into Word, trim out the spaces (replace "(space) with ") and be done with it quicker than you could write a chunk of code to do the same thing.

            If this was going to be a constant problem with many new data sources, and the program had reusability, then I'd code something to clean up those files.
            When all else fails, blame brown people. | Hire a teen, while they still know it all. | Trump-Palin 2016. "You're fired." "I quit."


            • #7
              If you wish, I can write a short Pascal program to fix it.


              • #8
                Originally posted by DrSpike
                Find and replace I did think of, but there are still 3,000 different postcodes in each set. Geeslaka's comment looks more promising.

                Geeslaka: As I said, the dataset is huge (even the text file with all the observations on this variable is 2.5MB) so I can't attach it. Also, I will face more space-related problems with the sets I am mapping in, I am sure.

                Hence can you explain this method of yours? Or is it something a humble economist will not be able to do?
                What language / DB are you using?

                • #9
                  Or one could use Emacs, if one has such a thing.


                  • #10
                    M-x query-replace-regexp
                    Is each line simply " B2", or is there more after that? If you want to remove all spaces, not just the leading ones, MtG's suggestion will work.

                    • #11
                      Originally posted by MichaeltheGreat


                      What language / DB are you using?
                      I should reiterate I am an economist/econometrician.

                      What is emacs?


                      • #12
                        Originally posted by geeslaka
                        M-x query-replace-regexp
                        Is each line simple " B2", or is there more after that? If you want to remove all spaces, not just the leading ones, MtG's suggestion will work.
                        Some have 2 leading spaces, and some have 1. Others have spaces in the middle, which I may or may not want to remove; I'm not sure yet.
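The choice between removing only leading spaces and removing every space can be compared on a few values. This is a rough Python sketch; the helper names `strip_leading` and `strip_all` and the code "B1 2" with a space in the middle are invented for illustration.

```python
def strip_leading(code):
    """Remove spaces at the start of the code only."""
    return code.lstrip(" ")


def strip_all(code):
    """Remove every space, wherever it appears in the code."""
    return code.replace(" ", "")


# "B1 2" is a made-up code with a space in the middle.
for code in ["  B1", " B12", "B1 2"]:
    print(repr(code), "->", repr(strip_leading(code)), "|", repr(strip_all(code)))
```

One thing to weigh before removing middle spaces: under `strip_all`, the hypothetical "B1 2" turns into "B12" and can no longer be told apart from a code that was "B12" all along.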


                        • #13
                          It allows for very advanced and configurable search-and-replace queries.

                          But you have to have unix or some unix-like tool to use it.


                          • #14
                            Since you do not have Emacs and it would take longer to install it, email the text file to me with one line fixed. I just emptied my box so it should fit.

                            • #15
                              Geeslaka: Thanks, I will certainly take up that offer if a find and replace will not work. However there will probably be more requests later.

                              Let's assume for a second I want to remove all the spaces, no matter where they are. It has been suggested that MtG's idea would work in this case, but how do you remove the spaces with one find and replace?

                              Example:

                              "(space)(space)B1" must become "B1"
                              "(space)B12" must become "B12"

                              There may be spaces in the second/third position or both as well for some codes.

                              I can see how to replace "(space)(space)B1" with "B1", but then I have thousands of find-and-replace operations to do, right?
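A single replace can cover every code if it targets the spaces rather than each postcode. The Python sketch below shows two variants under the assumption, taken from the first post, that each observation sits on its own line between quote marks. The names `fix_leading_spaces` and `remove_every_space` are illustrative only. The first variant is a pattern-based replace in the spirit of the regexp replace mentioned earlier in the thread; the second is the plain "find a space, replace with nothing" approach.

```python
import re

# An opening quote at the start of a line followed by one or more spaces.
LEADING_SPACES = re.compile(r'^" +', flags=re.MULTILINE)


def fix_leading_spaces(text):
    """Replace the quote-plus-spaces prefix of each line with a bare quote."""
    return LEADING_SPACES.sub('"', text)


def remove_every_space(text):
    """Plain find-and-replace: every space character is replaced with nothing."""
    return text.replace(" ", "")


# The two examples from the post, plus a code that is already clean.
text = '"  B1"\n" B12"\n"B2"\n'
print(fix_leading_spaces(text))
print(remove_every_space(text))
```

Either way it is one operation for the whole file, because the pattern describes the unwanted spaces and never mentions a specific postcode.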
