
I'm trying to format some data once it's printed to the console in Python (I'm also using pandas if that helps.) Here's what I'm just trying to align vertically:

print("CensusTract State County Races")
for index, row in df.iterrows():
    if row['Income'] >= 50000:
        if row['Poverty'] > 50:
            print(row['CensusTract'], row['State'], row['County'], end=" ")
            if row['Hispanic'] > 1:
                print("Hispanic:", row['Hispanic'], end=" ")
            if row['White'] > 1:
                print("White:", row['White'], end=" ")

etc. (ends with \n)

currently this code prints:

CensusTract State County Races
12071080100 Florida Lee Hispanic: 4.5 White: 74.7 Black: 20.8 
13121003500 Georgia Fulton Hispanic: 4.8 White: 32.4 Black: 57.9 Asian: 1.1 
15003008611 Hawaii Honolulu Hispanic: 9.7 White: 26.6 Asian: 2.4 Pacific: 51.6 
17097863003 Illinois Lake Hispanic: 12.9 White: 61.5 Black: 13.4 Asian: 5.0 
34023005100 New Jersey Middlesex Hispanic: 8.3 White: 60.4 Black: 7.3 Asian: 22.1 
36119981000 New York Westchester Hispanic: 19.2 White: 30.4 Black: 29.6 Asian: 19.9 
40109103602 Oklahoma Oklahoma Hispanic: 3.3 White: 60.0 Black: 29.3 

Compared to what I want:

CensusTract State      County         Races
12071080100 Florida    Lee            Hispanic: 4.5 White: 74.7 Black: 20.8 
13121003500 Georgia    Fulton         Hispanic: 4.8 White: 32.4 Black: 57.9 Asian: 1.1 
15003008611 Hawaii     Honolulu       Hispanic: 9.7 White: 26.6 Asian: 2.4 Pacific: 51.6 
17097863003 Illinois   Lake           Hispanic: 12.9 White: 61.5 Black: 13.4 Asian: 5.0 
34023005100 New Jersey Middlesex      Hispanic: 8.3 White: 60.4 Black: 7.3 Asian: 22.1 
36119981000 New York   Westchester    Hispanic: 19.2 White: 30.4 Black: 29.6 Asian: 19.9 
40109103602 Oklahoma   Oklahoma       Hispanic: 3.3 White: 60.0 Black: 29.3 
Emma E
  • If it's a pandas dataframe you can just look at it with df.head() or print(df) and it will be formatted nicely. – Chowlett2 Sep 18 '22 at 05:25
  • the df itself has many other columns, I'm only printing up to 9 of them. I can try to put the data I want into a new df and print that though! This is also how my professor wants it formatted so I don't want to go too far off from this. – Emma E Sep 18 '22 at 05:30
  • Can you either link us to the data or provide a small sample of it so I can see what it looks like and provide solution? – Chowlett2 Sep 18 '22 at 05:34
  • https://www.kaggle.com/datasets/muonneutrino/us-census-demographic-data I'm using the 2015 census tract file. – Emma E Sep 18 '22 at 05:36
  • Please provide the output of `df.head().to_dict('list')` in your question – mozway Sep 18 '22 at 05:46
  • `.rjust(maxWidth)` or ljust() https://docs.python.org/3/tutorial/inputoutput.html#manual-string-formatting seem to fit best into your code structure. – Claudio Sep 18 '22 at 06:27

2 Answers


You can save some trouble with the nested for loops and iterating through rows by writing your queries in pandas.

df[(df['Income'] >= 50000) & (df['Poverty'] > 50)][['CensusTract', 'State', 'County', 'Hispanic', 'White', 'Black', 'Asian', 'Pacific']]

The above code tells pandas to look for rows where the income is >= 50000 AND the poverty is above 50. Then we are telling it to only show us the columns that we want to see with [['column 1', 'column 2']] etc.

This is what you should see after running the line of code:

[Image: screenshot of the filtered DataFrame]
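
The boolean-mask filter described above can be sketched on a tiny frame. The sample values here are made up for illustration and the column list is shortened; `to_string(index=False)` is one way to print the result with aligned columns and no index.

```python
import pandas as pd

# Hypothetical sample rows; the values are invented for illustration only.
df = pd.DataFrame({
    "CensusTract": [1001, 1002, 1003],
    "State": ["Florida", "New Jersey", "Hawaii"],
    "County": ["Lee", "Middlesex", "Honolulu"],
    "Income": [52000, 48000, 61000],
    "Poverty": [55.0, 60.0, 12.0],
    "Hispanic": [4.5, 8.3, 9.7],
})

# Each comparison yields a boolean Series; & combines them element-wise.
mask = (df["Income"] >= 50000) & (df["Poverty"] > 50)

# .loc picks the matching rows and the listed columns in one step.
selected = df.loc[mask, ["CensusTract", "State", "County", "Hispanic"]]

# to_string(index=False) renders the frame as text without the index column.
print(selected.to_string(index=False))
```

Only the first sample row passes both conditions, so `selected` holds a single row.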

Chowlett2
  • Thank you! I only want to see races that are more than 1.0. Is there a way to format considering this? – Emma E Sep 18 '22 at 05:55
  • @EmmaE You can add another conditional that specifies that certain races must be above 1, but then you'll only get back a row if all of the conditionals are met. Is there a reason why you would want to see missing values in a cell for a given row and column? If you were to look at the top row in the output above, Asian & Pacific are below 1, but then the others are above. Then if you look at the 3rd row, Asian & Pacific are above 1, but Black is below, do you see why that would create a structural issue? – Chowlett2 Sep 18 '22 at 05:58
  • It's just what the assignment I'm working on is requiring. I've figured out how to find the data I need (the above nested for loops helps) but just need to align the data so it's readable. I used something like %4d in Java for a different project and didn't know if there was an equivalent for python. (I'm teaching myself python as I go, sorry if this is a simple question!) – Emma E Sep 18 '22 at 06:02
  • @EmmaE I'm not sure why that is the desired output, but if you want to print out multiple lines and have them aligned the same way, you can use ".format()" Here is a thread on that: https://stackoverflow.com/questions/8234445/format-output-string-right-alignment – Chowlett2 Sep 18 '22 at 06:10
  • Thank you, that thread had what I needed. I appreciate your help! And yes, I agree it's an odd way to report the data, and it results in a structural issue if I wanted to print in the dataframe. – Emma E Sep 18 '22 at 06:20
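
For readers coming from Java's `%4d`, here is a small sketch of the Python equivalents discussed in these comments. The variable names are illustrative; all three forms should produce the same padded text.

```python
n = 42
name = "Lee"

# Old-style % formatting, the closest match to Java's printf.
# %4d right-aligns in 4 characters; %-8s left-aligns in 8.
old_style = "%4d|%-8s|" % (n, name)

# str.format with the format-spec mini-language: > is right, < is left.
new_style = "{:>4d}|{:<8s}|".format(n, name)

# The same specs inside an f-string.
f_style = f"{n:>4d}|{name:<8s}|"

print(old_style)
print(new_style)
print(f_style)
```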

Formatting with f-strings or with .ljust()/.rjust() (see: https://docs.python.org/3/tutorial/inputoutput.html#manual-string-formatting) gives you full flexibility over how many columns are printed on each line. A pandas table can't do this, because it must show every column for every row:

# Width of the longest value in each text column.
maxState  = int(df['State'].str.len().max())
maxCounty = int(df['County'].str.len().max())
# ... with Hispanic, White, Black == 10 for label and 4 for number
# Pad the header with the same widths so it lines up with the rows.
print(f"CensusTract {'State':<{maxState}} {'County':<{maxCounty}} Races")
for index, row in df.iterrows():
    if row['Income'] >= 50000:
        if row['Poverty'] > 50:
            # Double quotes outside so the single-quoted keys work inside;
            # :<{width} left-aligns the value and pads it to that width.
            print(f"{row['CensusTract']} {row['State']:<{maxState}} {row['County']:<{maxCounty}}", end=" ")

CensusTract State   County   Races
12071080100 Florida Lee      Hispanic:  4.5 White:    74.7 Black:    20.8
13121003500 Georgia Fulton   Hispanic:  4.8 White:    32.4 Black:    57.9 Asian:     1.1
15003008611 Hawaii  Honolulu Hispanic:  9.7 White:    26.6 Asian:     2.4 Pacific:  51.6

# or

for index, row in df.iterrows():
    if row['Income'] >= 50000:
        if row['Poverty'] > 50:
            # .ljust() pads on the right (left-aligned); .rjust() would right-align instead.
            print(row['CensusTract'], row['State'].ljust(maxState), row['County'].ljust(maxCounty), end=" ")

For the best result, first collect all the rows you are going to print and compute each column's maximum width from those rows. Alternatively, take the maximum width over the entire column if you are not after the narrowest possible printout.
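
A minimal two-pass sketch of that idea, assuming rows are dict-like objects keyed by column name. The helper name `format_rows` and the sample Income/Poverty values are hypothetical, not taken from the dataset. The first pass selects the rows, the second pads every column to the widest selected value and lists only races above 1.

```python
def format_rows(rows, races=("Hispanic", "White", "Black", "Asian", "Pacific")):
    """Return aligned text lines for rows passing the income/poverty filter."""
    # First pass: keep only the rows that will actually be printed.
    chosen = [r for r in rows if r["Income"] >= 50000 and r["Poverty"] > 50]

    # Width per column: longest chosen value, but never narrower than the header.
    w_tract = max([len("CensusTract")] + [len(str(r["CensusTract"])) for r in chosen])
    w_state = max([len("State")] + [len(r["State"]) for r in chosen])
    w_county = max([len("County")] + [len(r["County"]) for r in chosen])

    lines = [f"{'CensusTract':<{w_tract}} {'State':<{w_state}} {'County':<{w_county}} Races"]

    # Second pass: pad each fixed column, then append the variable race list.
    for r in chosen:
        shares = " ".join(f"{race}: {r[race]}" for race in races if r[race] > 1)
        lines.append(
            f"{str(r['CensusTract']):<{w_tract}} {r['State']:<{w_state}} "
            f"{r['County']:<{w_county}} {shares}"
        )
    return lines


# Hypothetical sample rows: Income and Poverty are placeholders, and the
# third row exists only to show that the filter drops it.
rows = [
    {"CensusTract": 12071080100, "State": "Florida", "County": "Lee",
     "Income": 50001, "Poverty": 51.0,
     "Hispanic": 4.5, "White": 74.7, "Black": 20.8, "Asian": 0.0, "Pacific": 0.0},
    {"CensusTract": 34023005100, "State": "New Jersey", "County": "Middlesex",
     "Income": 50001, "Poverty": 51.0,
     "Hispanic": 8.3, "White": 60.4, "Black": 7.3, "Asian": 22.1, "Pacific": 0.0},
    {"CensusTract": 99999999999, "State": "Example", "County": "Nowhere",
     "Income": 40000, "Poverty": 60.0,
     "Hispanic": 2.0, "White": 2.0, "Black": 2.0, "Asian": 2.0, "Pacific": 2.0},
]

lines = format_rows(rows)
for line in lines:
    print(line)
```

With a DataFrame, the same helper could be fed `df.to_dict("records")`, which yields one dict per row.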

Claudio
  • I'll try this too, thank you! I appreciate the link too, it'll be helpful as I continue with this class. – Emma E Sep 18 '22 at 06:38