
I am a beginner in Python, and I need to use a third-party function which basically has one input: the name of a file on the hard drive. This function parses the file and then processes it.

I am generating the file contents in my code (it's a CSV file which I generate from a list) and want to skip actually creating the file. Is there any way I can achieve this and "hack" the third-party function into accepting my string without creating a file?

After some googling I found StringIO and created a file object with it, but now I am stuck on passing this object to the function (again, it accepts not a file object but a file name).
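
For reference, the in-memory step described above can be sketched with the standard `csv` and `io` modules. The sample rows and the helper name `rows_to_csv_text` are made up for illustration. The last few lines show the sticking point, assuming the third-party function opens its argument with the built-in `open()`: that call wants a path, so it rejects the `StringIO` object with a `TypeError`.

```python
import csv
import io

def rows_to_csv_text(rows):
    """Serialize a list of rows to CSV text without touching the disk (hypothetical helper)."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)  # quotes any field that contains a comma
    return buffer.getvalue()

rows = [["name", "score"], ["Jones, Julie", 42], ["Smith", 7]]  # made-up sample data
text = rows_to_csv_text(rows)
print(repr(text))

# open() needs a path; it cannot take the StringIO object itself.
try:
    open(io.StringIO(text))
except TypeError as exc:
    print("open() rejected the file object:", exc)
```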

GrayR
  • If you have access to the source of the third-party module (quite likely), an alternative solution would be to patch the third-party code and remove the restriction. The ultimate solution, of course, would be to write to the third-party developer and request that they do this for you (and for the benefit of everyone else). – Li-aung Yip Apr 02 '12 at 02:11
  • What kind of processing does it do? There is already a built-in standard library module for basic CSV handling. – Karl Knechtel Apr 02 '12 at 02:25
  • @Karl Knechtel: it does a lot of complicated computations using several machine learning algorithms, based on data from csv. – GrayR Apr 02 '12 at 02:30
  • @GrayR: So you do or you don't have access to the source of the 3rd party module? – Joel Cornett Apr 02 '12 at 05:21

3 Answers


It looks like you'll need to write your data to a file then pass the name of that file to the 3rd party library. You might want to consider using the tempfile module to create the file in a safe and easy way.
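
A minimal sketch of that approach, assuming the library only needs a path it can open. `third_party_process` and `process_rows` are hypothetical names; the first is a stand-in for the real library function. The file is created with `delete=False` so it can be reopened by name after it has been closed, and it is removed in a `finally` block so files don't pile up.

```python
import csv
import os
import tempfile

def third_party_process(filename):
    """Hypothetical stand-in for the library function: it only accepts a file name."""
    with open(filename, newline="") as f:
        return list(csv.reader(f))

def process_rows(rows):
    # delete=False: closing the file must not delete it, because the library
    # reopens it by name afterwards (reopening a still-open temp file fails on Windows).
    tmp = tempfile.NamedTemporaryFile("w", suffix=".csv", newline="", delete=False)
    try:
        with tmp:  # closing flushes the CSV data to disk
            csv.writer(tmp).writerows(rows)
        return third_party_process(tmp.name)  # the library sees an ordinary path
    finally:
        os.remove(tmp.name)  # clean up even if writing or processing raises

result = process_rows([["a", "1"], ["b", "2"]])
print(result)
```

The file only exists for the duration of the call, but this is still real file I/O.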

Whatang
  • Yes, it's the easy solution, but then my program will write several 100 kB files per second. tempfile is nice advice, thanks, but it still creates a file :( – GrayR Apr 02 '12 at 01:44

If it requires a filename, then you're going to have to create a file. (And that's poor design on the part of the library creators.)

Amber
  • It uses the file's name, not the actual file. – GrayR Apr 02 '12 at 01:20
  • Thanks for the help. Bad news for me, I think. I also found something called pywinfuse, but according to reviews it is slow. – GrayR Apr 02 '12 at 01:49
  • If you have a **lot** of data to process, or it needs to be **really** fast, you could create a RAM disk and create your files entirely in memory. (Hint: unless execution time > 1 hour, or it has to be real-time, it *doesn't* need to be that fast.) – Li-aung Yip Apr 02 '12 at 02:08
  • @Li-aung Yip: both conditions from your hint apply. Thanks for your reply; I will look into PyFilesystem as suggested here [link](http://stackoverflow.com/questions/4351048/how-can-i-create-a-ramdisk-in-python) – GrayR Apr 02 '12 at 02:28
  • You found the same links as me, which means you must have tried the same Google search. ;) Note, however, that the OS does some caching for disk I/O, and using a RAM disk may actually defeat this. If you do end up using a RAM disk, benchmark your code before and after to make sure it actually made things faster (instead of slower). – Li-aung Yip Apr 02 '12 at 04:44
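
One way to try the RAM disk idea without extra tooling, assuming a Linux system where `/dev/shm` is mounted as a memory-backed tmpfs (common, but check your own system), is to point `tempfile` at that directory. In line with the advice to benchmark before and after, this sketch times the same write/delete cycle in the default temp directory and in `/dev/shm`. The helper names and sample data are hypothetical, and the timings will depend entirely on your machine and on OS caching.

```python
import csv
import os
import tempfile
import time

def ram_backed_dir():
    """Return /dev/shm if present (often a RAM-backed tmpfs on Linux), else None."""
    return "/dev/shm" if os.path.isdir("/dev/shm") else None

def write_temp_csv(rows, directory):
    """Write rows to a named temporary CSV in `directory` (None = default temp dir)."""
    tmp = tempfile.NamedTemporaryFile("w", suffix=".csv", newline="",
                                      dir=directory, delete=False)
    with tmp:
        csv.writer(tmp).writerows(rows)
    return tmp.name

def benchmark(rows, directory, repeats=50):
    """Time write + delete cycles; the third-party call would go between the two."""
    start = time.perf_counter()
    for _ in range(repeats):
        path = write_temp_csv(rows, directory)
        os.remove(path)
    return time.perf_counter() - start

rows = [[i, i * 2, "sample"] for i in range(1000)]  # made-up sample data
print("default temp dir: %.4f s" % benchmark(rows, None))
ram_dir = ram_backed_dir()
if ram_dir is not None:
    print("%s: %.4f s" % (ram_dir, benchmark(rows, ram_dir)))
```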

You should look into the Python docs for I/O, seen here: http://docs.python.org/tutorial/inputoutput.html

Python processes files by opening them; no extra file is "created". An open file has a few methods you can use to create the output you desire, although I'm not entirely sure I understand your wording. As I understand it, you want to open a file, do some things with its contents and then create a string of some kind, right? If that's correct, you're in luck, as it's pretty easy to do.

Comma-separated values read into Python from a file are extremely easy to parse into Python-friendly formats such as lists, tuples and dictionaries.

As you've said, you want a function that takes the name of a file, looks the file up, reads it and does some processing, all without creating extra files. To do that, your code would look like this:

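import os

def file_open(filename):
    new_dictionary = {}
    # '/directory' is a placeholder: use the folder that actually holds the file.
    # The mode must be a string: 'r' opens the file for reading.
    # "with" closes the file automatically when the block ends.
    with open(os.path.join('/directory', filename), 'r') as f:
        for line in f:  # iterate through each line of comma-separated values
            # text before the first comma becomes the key, the rest becomes the value
            key, value = line.rstrip('\n').split(',', 1)
            new_dictionary[key] = value  # set the new_dictionary key to value
    return new_dictionary  # hand the newly assembled dictionary back to the caller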

As you can see, no other file is created in this process. We just open the file on the hard drive, parse it to build our dictionary, and then return the dictionary for use. To keep the dictionary it returns, assign the result of the function call to a variable. Just make sure the directory is set correctly, starting from the root of the hard drive.

CSV_dictionary = file_open(my_file)  # CSV_dictionary now holds all the parsed data

I hope this was helpful. If I've misunderstood your problem, just reply and I'll try to help you.

-Joseph

  • -1 for two reasons: 1) You haven't actually answered the OP's question - he already knows how to open files, but the third-party function he is calling will not accept file handles (only file names.) 2) Your example CSV processing code will break for commas in the middle of string literals - `"Jones, Julie"` would be read as two fields `"Jones` and `Julie"`, which is wrong. Instead, use the built-in `csv` module which solves this problem (and more.) – Li-aung Yip Apr 02 '12 at 02:18
  • Yikes, Li, I was just trying to help. If you know the solution, why don't you post an answer? The reason two fields are returned is so that one can be used for the key and the other for the value, and it's all private to the function. There's nothing 'wrong' about what I did. Feel free to post an answer about the specifics of how to use CSV to solve the OP's problem and prove me wrong; it'd be a lot more helpful than downvoting a newbie's reputation. – Joseph Daniels Apr 02 '12 at 03:15
  • We're all on the same side here. It's good that you're enthusiastic about contributing, but you *do* actually have to read the OP's question and address the *specific issue* he is having. As it stands your answer would be helpful for a different question, but not this one. – Li-aung Yip Apr 02 '12 at 04:33
  • Technical things: 1) the mode argument to `open()` needs to be quoted, `'r'` not `r`. 2) Splitting a CSV row into `key, value` on the first comma you find makes sense for CSV files with exactly two fields per line, but not so much sense for CSV files with many fields per line. My CSV files usually look something like `"Generator U22","Gas Turbine","PQ Mode",1.00,1.05,20,24,36`. It makes more sense to split this into 8 fields, and this is what the `csv` module will do. 3) Finally, note that your CSV parser assumes no duplicate keys, which may or may not be a good idea based on your data. – Li-aung Yip Apr 02 '12 at 04:37
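
To illustrate the two CSV points raised in the comments, here is a small sketch with the built-in `csv` module, using the sample lines quoted there (`io.StringIO` stands in for an open file, and `parse_csv` is a hypothetical helper name). `csv.reader` keeps the quoted comma in `"Jones, Julie"` inside a single field and splits the generator line into 8 fields, while a plain `split(",")` breaks the name in two.

```python
import csv
import io

# Sample lines quoted in the comments; io.StringIO stands in for an open file.
sample = ('"Generator U22","Gas Turbine","PQ Mode",1.00,1.05,20,24,36\n'
          '"Jones, Julie",42\n')

def parse_csv(text):
    """Parse CSV text into a list of rows (each row is a list of strings)."""
    return list(csv.reader(io.StringIO(text)))

rows = parse_csv(sample)
for row in rows:
    print(len(row), row)

# Naive splitting for comparison: the quoted comma is treated as a separator.
naive = sample.splitlines()[1].split(",")
print(len(naive), naive)
```

Note that `csv.reader` returns every field as a string (`'1.00'`, not `1.0`), so numeric columns still need converting.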