****ing 2 Gig file size limit

  • #31
    In my case we have terabytes of data.

    And yeah, it is relevant.

    JM
    Jon Miller-
    I AM.CANADIAN
    GENERATION 35: The first time you see this, copy it into your sig on any forum and add 1 to the generation. Social experiment.


    • #32
      Originally posted by BlackCat
      A wild guess from me is that you are working on some four-dimensional data where there will be a lot of zeroes.
      No. It is a large sample of statistically independent events in an extremely complex probability space (dimensionality indefinite but on the order of ~3000)
      12-17-10 Mohamed Bouazizi NEVER FORGET
      Stadtluft Macht Frei
      Killing it is the new killing it
      Ultima Ratio Regum


      • #33
        Originally posted by Jon Miller
        In my case we have terabytes of data.

        And yeah, it is relevant.

        JM
        I could easily generate TB of data. I have done some clever things to cut down significantly.
        12-17-10 Mohamed Bouazizi NEVER FORGET
        Stadtluft Macht Frei
        Killing it is the new killing it
        Ultima Ratio Regum


        • #34
          Originally posted by KrazyHorse
          you, you piece of
          LOVE. IT.
          Tutto nel mondo è burla


          • #35
            My Outlook file at work hits 10GB about once a year and I need to swap it out.
            "The issue is there are still many people out there that use religion as a crutch for bigotry and hate. Like Ben."
            Ben Kenobi: "That means I'm doing something right. "


            • #36
              Originally posted by Jon Miller
              In my case we have terabytes of data.

              And yeah, it is relevant.

              JM
              I assume that it was an answer to me (pretty quick )

              Yeah, it is relevant to know the zeroes, but there is no reason to register them one by one - all you need to know is which areas/cubics (dammit, what's the right word?) they fall in - that saves space and improves performance. The only problem is that it demands an intelligent programmer.
              With or without religion, you would have good people doing good things and evil people doing evil things. But for good people to do evil things, that takes religion.

              Steven Weinberg
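The idea in the post above (record where the zeroes are instead of storing each one) can be sketched as a simple run-length encoding. This is only an illustration under assumptions: the data is treated as a flat list of numbers, and the names `encode_zero_runs` and `decode_zero_runs` are made up for this example, not anything a poster in the thread said they use.

```python
def encode_zero_runs(values):
    """Collapse each run of zeros into a ("zeros", count) marker; keep other values as-is."""
    encoded = []
    zero_run = 0
    for v in values:
        if v == 0:
            zero_run += 1
        else:
            if zero_run:
                # Flush the pending run of zeros before storing the nonzero value.
                encoded.append(("zeros", zero_run))
                zero_run = 0
            encoded.append(v)
    if zero_run:
        encoded.append(("zeros", zero_run))
    return encoded


def decode_zero_runs(encoded):
    """Expand ("zeros", count) markers back into individual zeros."""
    values = []
    for item in encoded:
        if isinstance(item, tuple) and item[0] == "zeros":
            values.extend([0] * item[1])
        else:
            values.append(item)
    return values


# Hypothetical sample data: mostly zeros with a few real values.
data = [0, 0, 0, 7, 0, 0, 5, 0, 0, 0, 0]
packed = encode_zero_runs(data)
restored = decode_zero_runs(packed)
```

The saving only shows up when the zero runs are long; for data with few zeroes the markers add overhead instead.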


              • #37
                Originally posted by KrazyHorse


                No. It is a large sample of statistically independent events in an extremely complex probability space (dimensionality indefinite but on the order of ~3000)
                The reason you can probe this space at all in a meaningful manner is that most of these degrees of freedom are almost independent of each other (while many others are not).
                12-17-10 Mohamed Bouazizi NEVER FORGET
                Stadtluft Macht Frei
                Killing it is the new killing it
                Ultima Ratio Regum


                • #38
                  Originally posted by BlackCat
                  Just out of curiosity - are all those 2+ GB of data relevant?

                  A wild guess from me is that you are working on some four-dimensional data where there will be a lot of zeroes.
                  Meh, I have files considerably bigger than 2GB and I'm not doing anything that unusual. Take fifty thousand respondents, ask them ~100 questions each, and put their answers in a dataset... at 5m distinct cells plus demographic data it can get pretty big pretty fast.

                  (Especially when people do silly things like have 60,000 variables per person due to asking a lot of different questions to each set of people... but that's a different story entirely.)
                  <Reverend> IRC is just multiplayer notepad.
                  I like your SNOOPY POSTER! - While you Wait quote.
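A rough back-of-the-envelope version of the survey arithmetic in the post above. The respondent and question counts come from the post; the 8 bytes per cell and the idea that the 60,000-variable case applies to the same fifty thousand respondents are assumptions made only for this sketch, and `dataset_bytes` is a hypothetical helper.

```python
def dataset_bytes(rows, columns, bytes_per_cell=8):
    """Estimate raw size of a dense table; bytes_per_cell=8 is an assumed cell width."""
    return rows * columns * bytes_per_cell


respondents = 50_000

# ~100 questions each: 5 million cells, as stated in the post.
narrow_cells = respondents * 100
narrow_size = dataset_bytes(respondents, 100)

# The "60,000 variables per person" case, assuming the same respondent count.
wide_size = dataset_bytes(respondents, 60_000)

two_gib = 2 * 1024**3
print(f"narrow: {narrow_cells:,} cells, about {narrow_size / 1e6:.0f} MB")
print(f"wide:   about {wide_size / 1e9:.0f} GB, over the 2 GiB limit: {wide_size > two_gib}")
```

Under these assumptions the 100-question table alone stays well under 2 GB; it is the very wide layout, or wider cells and extra demographic columns, that pushes a file past the limit.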


                  • #39
                    Originally posted by BlackCat


                    I assume that it was an answer to me (pretty quick )

                    Yeah, it is relevant to know the zeroes, but there is no reason to register them one by one - all you need to know is which areas/cubics (dammit, what's the right word?) they fall in - that saves space and improves performance. The only problem is that it demands an intelligent programmer.
                    There are all sorts of reasons you might want to keep each individual cell. For example, if you have a ton of variables, the number of potential analyses might well be more than the data itself... and you don't always know all of the potential statistics desired, so usually it's best to keep the original.

                    Now, that said, there are tons of ways to make this more compact; don't know about KH's situation so wouldn't venture to say for him. In my situation, though, as I work for clients, often I have to conform to their preferences, even if they make little sense to me.
                    <Reverend> IRC is just multiplayer notepad.
                    I like your SNOOPY POSTER! - While you Wait quote.


                    • #40
                      People also do silly things like, oh, I don't know, store movies on hard drives.


                      • #41
                        Originally posted by KrazyHorse


                        No. It is a large sample of statistically independent events in an extremely complex probability space (dimensionality indefinite but on the order of ~3000)
                        Sounds interesting - do you have a link that explains what you are doing? (I rarely go beyond 2455 in dimensionality)
                        With or without religion, you would have good people doing good things and evil people doing evil things. But for good people to do evil things, that takes religion.

                        Steven Weinberg


                        • #42
                          Originally posted by snoopy369

                          Meh, I have files considerably bigger than 2GB and I'm not doing anything that unusual. Take fifty thousand respondents, ask them ~100 questions each, and put their answers in a dataset... at 5m distinct cells plus demographic data it can get pretty big pretty fast.

                          (Especially when people do silly things like have 60,000 variables per person due to asking a lot of different questions to each set of people... but that's a different story entirely.)


                          Well, I do the same - if you mess with data about all the books (well, almost all) that are written in the world, 2 GB isn't much. The problem is that I don't have any zeroes - there may be redundant data, but compared to the relevant data it is small and rarely worth optimizing.
                          With or without religion, you would have good people doing good things and evil people doing evil things. But for good people to do evil things, that takes religion.

                          Steven Weinberg


                          • #43
                            I bet if you zipped it you'd see tremendous optimization in size.
                            "The issue is there are still many people out there that use religion as a crutch for bigotry and hate. Like Ben."
                            Ben Kenobi: "That means I'm doing something right. "


                            • #44
                              Originally posted by BlackCat


                              Sounds interesting - do you have a link that explains what you are doing ?
                              Not one which would be helpful to you.
                              12-17-10 Mohamed Bouazizi NEVER FORGET
                              Stadtluft Macht Frei
                              Killing it is the new killing it
                              Ultima Ratio Regum


                              • #45
                                Oh, in my case certainly. Unfortunately while actually USING the data, that's not an option...

                                (By optimization, I'm talking about 8GB -> 14MB ... bleck ... )
                                <Reverend> IRC is just multiplayer notepad.
                                I like your SNOOPY POSTER! - While you Wait quote.
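To get a feel for the kind of shrinkage mentioned above (8GB -> 14MB), here is a small sketch using Python's standard `zlib` module on a block of zero bytes. The zero-filled buffer is an assumed stand-in for a mostly empty dataset, not the actual file from the post, so the ratio it produces is only indicative.

```python
import zlib

# Stand-in for a highly redundant file: one million zero bytes (an assumption).
raw = bytes(1_000_000)

# Compress at the highest level and check the round trip is lossless.
packed = zlib.compress(raw, 9)
restored = zlib.decompress(packed)

ratio = len(raw) / len(packed)

# Ratio implied by the figures in the post, taking 8 GB as 8 * 1024 MB.
reported_ratio = (8 * 1024) / 14

print(f"sketch: {len(raw):,} bytes -> {len(packed):,} bytes ({ratio:.0f}:1)")
print(f"post:   roughly {reported_ratio:.0f}:1")
```

As the post says, this only helps for storage and transfer: the data has to be decompressed again before it can be used.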
