0

My excel file looks like this:

             X_0          X_1  y
0    2917.211242  3289.522533  0
1    1888.937716   781.528356  0
2    4188.521414  1554.476261  0
3    8145.555339  9804.066728  0
4    9584.488981  6176.337189  0

I'm trying to read the data to look like this (i.e. each row in a list within a big list, don't mind the actual entries in the example):

[[3.393533211,2.331273381,0],
    [3.110073483,1.781539638,0],
    [1.343808831,3.368360954,0],
    [3.582294042,4.67917911,0],
    [2.280362439,2.866990263,0],
    [7.423436942,4.696522875,1],
    [5.745051997,3.533989803,1],
    [9.172168622,2.511101045,1],
    [7.792783481,3.424088941,1],
    [7.939820817,0.791637231,1]]

What I've tried so far is this:

import pandas as pd

df = pd.read_excel("nameOfMyDatabase.xlsx")
list1 = list(df['X_0'])
list2 = list(df['X_1'])
list3 = list(df['y'])

print(list1)
print(list2)
print(list3)

And this is the result I'm getting:

[2917.2112422149, 1888.93771568063, 4188.52141357231, 8145.55533888016, 9584.48898099062, 4039.63375666484, ....]

[3289.52253268053, 781.528355771809, 1554.47626074928, 9804.06672844197, 6176.33718886522, 167.607142550757, ....]

[0, 0, 0, 0, 0, 0, ....]

So not quite what i want yet but trial and error is not working for me. Any ideas?

Community
  • 1
  • 1
  • Have you tried making a 2-dimensional list from the 3 lists you made? Also, have you examined the contents of `df` itself? As an aside, please do not share information as images unless absolutely necessary. See: https://meta.stackoverflow.com/questions/303812/discourage-screenshots-of-code-and-or-errors, https://idownvotedbecau.se/imageofcode, https://idownvotedbecau.se/imageofanexception/. – AMC Apr 14 '20 at 18:16
  • yeah did both. df itself is just the excel file with the names of the columns and looping on the lists to merge them seemed very inefficient as the dataset is decent in size and i actually couldn't manage to get the for loop to do it. All my attempts have failed –  Apr 14 '20 at 18:17
  • @AMC I think the image is relevant in this case but thanks for sharing, i will make to consider not adding them next time ;) –  Apr 14 '20 at 18:20
  • I didn't say it's irrelevant, the data could and should also be included as text. _looping on the lists to merge them seemed very inefficient as the dataset is decent in size and i actually couldn't manage to get the for loop to do it._ Are you not using a loop to create the lists in the first place? – AMC Apr 14 '20 at 18:21
  • Does this answer your question? [Pandas DataFrame to List of Lists](https://stackoverflow.com/questions/28006793/pandas-dataframe-to-list-of-lists) – AMC Apr 14 '20 at 18:23
  • @AMC could, sure. I don't per se agree to should as i prefer a picture over text every day on the week but maybe i will change my mind once i read the links you shared. –  Apr 14 '20 at 18:23
  • _I don't per se agree to should as i prefer a picture over text every day on the week_ It hinders accessibility, amongst other things. By the way, why do you even want to make a list of lists? Why not use the DataFrame? – AMC Apr 14 '20 at 18:24
  • @AMC I agree with you ;) –  Apr 14 '20 at 18:34

4 Answers4

1

If you just want to parse data from Excel and don't care about using the pandas DataFrame itself, you should be using openpyxl.

For example, for a spreadsheet like this:

enter image description here

and assuming that your desired spreadsheet is the first/only sheet in the workbook, you could do:

import openpyxl

wb = openpyxl.load_workbook("nameOfMyDatabase.xlsx")
ws = wb.active
cell_range = 'A2:C5'
data = [[cell.value for cell in row] for row in ws[cell_range]]
print(data)
# output: [[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]
jfaccioni
  • 7,099
  • 1
  • 9
  • 25
0

You could try using

big_list = []
for a, b in zip(df["x_0"], df["x_0"]):
   list = []
   list.append(a)
   list.append(b)
   big_list = big_list.append(list)
S.Ale
  • 94
  • 1
  • 5
0

You can try:

import pandas as pd

df = pd.read_excel("nameOfMyDatabase.xlsx")

data = df.values.tolist()
chmod777
  • 41
  • 2
0

This is my suggestion using numpy and openpyxl:

import openpyxl
import numpy as np

wb = openpyxl.load_workbook('test.xlsx')
ws = wb['Sheet1']

lst = []
for v in ws.values:
    lst.append(v)
print(lst)

values =  = np.array(lst)
Henrique Branco
  • 1,778
  • 1
  • 13
  • 40