Python Pandas Data not aligned correctly

Question

I am building a dataframe from a directory of text files that contain memory readings. I am giving the column the name Memory.

But when the data gets imported there is a column of zeros, the column with the memory readings that I want, and the Memory column has an NaN by each entry (not a number, I presume):

output:

***Memory Data Frame:
            0 Memory
0   1843260.0    NaN
0   7706164.0    NaN
0   7904828.0    NaN
0   7706164.0    NaN
0   7706172.0    NaN
0   7648524.0    NaN
0   7648524.0    NaN
0   7706172.0    NaN
0   7706164.0    NaN
0   7904828.0    NaN
0   7706172.0    NaN
0   7648524.0    NaN
0   7706172.0    NaN
0  16075888.0    NaN
0   7904672.0    NaN
0   7904680.0    NaN
0   7904672.0    NaN
0   7904680.0    NaN
0  16075880.0    NaN
0   7904672.0    NaN
***

I'm not sure why the data is misaligned with a column of all zeros, why the memory readings are floats with a trailing .0, or why the Memory column is filled with NaN. Here is my most recent code.

code:

# Create the memory dataframe
column_names = ["Memory"]
memory_df = pd.DataFrame(columns = column_names)
memory_df.astype('int32').dtypes
temp_df = pd.DataFrame(columns = column_names)
temp_df.astype('int32').dtypes
print(f"Reading text files into the Memory DF")
for filename in filelist:
    print(f"Adding filename: {filename}")
    filename = text_path + filename
    temp_df = pd.read_csv(filename, delim_whitespace=True, header=None)
    temp_df.astype('int32').dtypes      
    memory_df = memory_df.append(temp_df)

How can I ingest the data with JUST one Memory column with the memory readings shown as integers with no trailing .0?

There is probably a mismatch between column names of memory_df and temp_df: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.append.html?highlight=append#pandas.DataFrame.append — Jonas, May 31 '20 at 22:36
I've updated the OP to give the two dataframes (temp_df and memory_df) the same column names. I've also tried setting the data type according to what someone responded with below. There is absolutely no change in the output with this change. — bluethundr, May 31 '20 at 22:59
When pd.read_csv is executed, it returns a completely new object, you have to set the column names after it is created: example ` temp_df.columns = ['Memory']` — Jonas, May 31 '20 at 23:08
Thanks! That gets everything lined up. But why is there still a column of zeros? Is there any way to get rid of that? https://pastebin.com/5f6sfYLh — bluethundr, May 31 '20 at 23:24
Glad it helped, the 0s are the index: https://stackoverflow.com/questions/24644656/how-to-print-pandas-dataframe-without-index — Jonas, May 31 '20 at 23:35
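A minimal sketch of why the NaN column appears, assuming the cause is the label mismatch described in the comments above. pandas aligns columns by label when frames are combined, so a frame whose only column is labelled 0 (which is what header=None produces) and a frame whose only column is labelled Memory end up side by side, with NaN filling the gaps. The two frames below are made up for illustration, and pd.concat is used in place of DataFrame.append, which was removed in pandas 2.0.

```python
import pandas as pd

# Made-up frames: one labelled "Memory", one with the default integer label 0.
named = pd.DataFrame({"Memory": [1843260]})
unnamed = pd.DataFrame({0: [7706164]})

# Columns are matched by label, so the result has BOTH columns, padded with NaN.
# NaN is a float, which is why the readings pick up a trailing .0.
mismatched = pd.concat([named, unnamed])
print(mismatched)

# Giving the second frame the same label makes the values land in one column.
unnamed.columns = ["Memory"]
aligned = pd.concat([named, unnamed], ignore_index=True)
print(aligned)
```

The leftmost numbers in the printed output are the index, not data. If they are not wanted in the printout, print(aligned.to_string(index=False)) leaves them out.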

MarianD · Accepted Answer · 2020-05-31T23:48:10.900

I can't see the structure of your .csv files, but I suppose from your output that they consist of 1 column of (integer) numbers.

I deleted the unneeded lines from your code, edited 1 line and added 1 more, so the working code (tested by myself) is:

# Create the memory dataframe

column_names = ["Memory"]
memory_df = pd.DataFrame(columns=column_names)
print(f"Reading text files into the Memory DF")
for filename in filelist:
    print(f"Adding filename: {filename}")
    filename = text_path + filename
    temp_df = pd.read_csv(filename, delim_whitespace=True, names=column_names)
    memory_df = memory_df.append(temp_df)

memory_df.Memory = memory_df.Memory.astype("int32")

The resulting memory_df dataframe:

        Memory  
0      1843260 
1      7706164 
2      7904828 
3      7706164 
4      7706172 
5      7648524 
6      7648524 
7      7706172 
8      7706164 
9      7904828 
10     7706172 
11     7648524 
12     7706172 
13    16075888 
14     7904672 
15     7904680 
16     7904672 
17     7904680 
18    16075880 
19     7904672
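For readers on pandas 2.0 or later, where DataFrame.append no longer exists, the same idea can be sketched with pd.concat. This is an adaptation, not the code tested in the answer above. The in-memory "files" are hypothetical stand-ins for the real text files, sep=r"\s+" is used in place of delim_whitespace=True (which is deprecated in recent pandas releases), and ignore_index=True is an addition that renumbers the rows 0, 1, 2, ... instead of repeating each file's own index.

```python
import io

import pandas as pd

column_names = ["Memory"]

# Hypothetical stand-ins for the text files: one reading per file.
fake_files = [
    io.StringIO("1843260\n"),
    io.StringIO("7706164\n"),
    io.StringIO("7904828\n"),
]

# Name the column while reading, so every frame carries the same label.
frames = [pd.read_csv(f, sep=r"\s+", names=column_names) for f in fake_files]

# Combine once at the end and renumber the rows.
memory_df = pd.concat(frames, ignore_index=True)

# astype returns a new object, so assign the result back.
memory_df["Memory"] = memory_df["Memory"].astype("int32")
print(memory_df)
```

Collecting the frames in a list and concatenating once also avoids copying the growing dataframe on every loop iteration.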

score 1 · Answer 2 · answered May 31 '20 at 22:31

The column of zeroes and the misalignment come from how pandas labels the data, not from the data itself. To get your desired result, one must extract all the memory values and place them in a new dataframe. To remove the trailing .0, change the type to integer. Code to do this is below:

# The unnamed column is labelled with the integer 0, not the string '0'
memory = df[0]
new_df = pd.DataFrame(memory)
# astype returns a new dataframe, so assign the result back
new_df = new_df.astype('int32')
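One detail worth spelling out, since it also affects the code in the question: astype does not change a dataframe in place. A line such as df.astype('int32').dtypes builds a converted copy, looks at its dtypes, and throws the copy away. A small sketch with made-up readings:

```python
import pandas as pd

# Made-up float readings, as they look after NaN padding forced a float dtype.
df = pd.DataFrame({"Memory": [1843260.0, 7706164.0]})

# The converted copy is discarded, so df is unchanged.
df.astype("int32")
dtype_before = str(df["Memory"].dtype)

# Assigning the result back is what actually changes the column type.
df = df.astype("int32")
dtype_after = str(df["Memory"].dtype)

print(dtype_before, dtype_after)
```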
