0

I am running a query using psycopg2 in python. The results of the query is saved to a list. I am trying to convert this list into a numpy array and then write to a csv file. Here is how I did that.

rows = rcursor.fetchall()
df = pd.DataFrame(np.array(rows), columns = rows("db1 db2 db3 db4 db5"))
df.to_csv('alldata.csv',sep=',')]

But when I do that i get the error :

ValueError: Must pass 2-d input

I guess I have to apply .reshape but the number of rows is huge (like 200000). The data fetched from query to list looks like this.

RealDictRow([('db1', '0001'), ('db2', 002), ('db3', 003), ('db4', '004'), ('db5', 'Hello I worked on this so far but not happening. Call my number 245-456-7892)

How can I write this to csv properly without getting the ValueError: Must pass 2-d input. Thanks in advance!

Siddharth C
  • 39
  • 1
  • 9
  • Did you copy that code here by typing it? There are soo many mistakes in it. `np.array(list)` should most likely be `np.array(rows)`, `"colname 4"` should most likely be `"colname_4"`. Further, in your last code part, `'Col2_name'` starts with a capital letter, while the other ones don't. Then, `'colname_2'` is suddenly `'col2_name'`. `'col1_value'` and `'col4_value'` are strings, while `col2_value` and `col3_value` are variables. `col5_value` is a string, but doesn't have an end. And those were just the typos. – Finomnis Jul 02 '19 at 19:33
  • Ya edited a bit before typing here. I'll rectify the typos. – Siddharth C Jul 02 '19 at 19:35
  • The only programming error that I see here is that `list("colname_1 colname_2")` is not what you think it is. It is `['c', 'o', 'l', 'n', 'a', 'm', 'e', '_', '1', ' ', 'c', 'o', 'l', 'n', 'a', 'm', 'e', '_', '2']`. I think what you were trying to write is `["colname_1", "colname_2"]`. – Finomnis Jul 02 '19 at 19:35
  • Well, don't type it here, just copy paste it instead. In case you had problems with it not being recognized as code, just write ``` or ~~~ before and after. Refer to [this](https://stackoverflow.com/editing-help#code). – Finomnis Jul 02 '19 at 19:36
  • Are you sure that the return value looks like this, and not `RealDictRow({'col1_name': 'col1_value', 'col2_name': 'col2_value', 'col3_name': 'col3_value', 'col4_name': 'col4_value', 'col5_name': 'col5_value})`? – Finomnis Jul 02 '19 at 19:47
  • The documentation of `fetchone()` says, it doesn't return a RealDictRow, but instead a tuple ... I am sorry, but nothing of what you wrote makes any sense so far ... – Finomnis Jul 02 '19 at 19:52
  • I cant copy paste exactly as it contains some sensitive information. But ya the format is the same. And yes the return value looks exactly like the one I have edited now. – Siddharth C Jul 02 '19 at 19:52
  • My bad. I had written fetchall() but later changed it to fetchone() to try a sample. But ya it is fetchall(), not fetchone(). Sorry! – Siddharth C Jul 02 '19 at 19:54
  • Wow. Please proof-read your code in future before you post it. If you would hand something like this in for a test at school, you would fail horribly, even if everything would be correct. This took way too long until we can actually start working on the problem. – Finomnis Jul 02 '19 at 19:56
  • Yes, I am very sorry! – Siddharth C Jul 02 '19 at 19:57
  • Further question. What does `rows("db1 db2 db3 db4 db5")` mean? Doesn't make any sense to me. – Finomnis Jul 02 '19 at 19:57
  • I got it from here, so just tried. Was specifying column names to write to csv https://stackoverflow.com/questions/42593104/convert-list-into-a-pandas-data-frame – Siddharth C Jul 02 '19 at 20:00
  • No, this question specifically targets the fact that `"abc"` can be converted to `['a', 'b', 'c']` with `list()`. Sure, you can do that too, but it doesn't split the string by `' '`, but every single character. Just create the list explicitely, like this: `["db1", "db2", "db3", "db4", "db5"]`. Or, if you can't change the fact that your format is `"db1 db2 db3 db4 db5"` with spaces in between, split them like this: `"db1 db2 db3 db4 db5".split()`. – Finomnis Jul 02 '19 at 20:04
  • Now, some more questions. If `rcursor.fetchall()` really returns what you claim, then your input table has two columns, the first containing the db's and the second containing the numbers. You then want to write it in a csv. And the way you wrote that, you want the csv columns to be what the rows of your table were. Does that mean, you have one header line in the csv and then one single data line? Where does the 200000 then come from? Either that, or you made horrible mistakes in your simplification. – Finomnis Jul 02 '19 at 20:06
  • The 200000 data comes with fetchall. That is the result of the query. I just gave a single rows example. If I run the query on Sql Workbench it will give me a result with 5 columns and 200000 rows. Does that make sense? – Siddharth C Jul 02 '19 at 20:11
  • It does, but why didn't you write that in your question? How should I know that you only posted a single row? ... Well, now that this is out of the way, let's continue. Why do you get a list of pairs for every row? That's because you use 'RealDictCursor' instead of a normal cursor. Which is something that you didn't mention. Why do you do that? Also, you don't get pairs, you get a dict. – Finomnis Jul 02 '19 at 20:23
  • You should really think about whether your knowledge of python already allows you to work with sensitive data ... – Finomnis Jul 02 '19 at 20:29
  • Yes I understand what you are saying. But it is what it is and I am trying to learn which is not wrong. What should I use instead of RealDictCursor? – Siddharth C Jul 02 '19 at 20:55

1 Answers1

0

You can use pandas.read_sql_query directly to convert a sql query + connection into a dataframe. See here for example.

Karthik V
  • 1,867
  • 1
  • 16
  • 23